Python in Practice: Delaying Data Submission

Contents

- 1. Introduction
- 2. Delaying Submissions

1. Introduction

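In the previous article we disguised the program by changing its User-Agent, which is about the simplest approach available. However, if the program is a crawler that fetches web pages (for example, bulk-downloading photos), a single IP address requesting pages continuously within a short period clearly does not match normal human behavior, and it also puts considerable load on the server. A server therefore only needs to record how often each IP address makes requests: if the frequency within a unit of time exceeds a threshold, it treats that IP address as a likely crawler and can return a CAPTCHA page that asks the user to enter a verification code. A simple crawler cannot fill in the CAPTCHA, so its requests are rejected.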
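To make the server-side check described above concrete, here is a minimal sketch of a per-IP frequency counter over a sliding time window. It is an illustration only, not how any particular site works: the class name `FrequencyChecker`, the 10-second window, the limit of 5 requests, and the sample IP address are all assumptions made for this example.

```python
import time
from collections import defaultdict, deque

# Hypothetical values chosen for illustration only.
WINDOW_SECONDS = 10.0   # length of the observation window
MAX_REQUESTS = 5        # requests allowed per IP within the window


class FrequencyChecker:
    """Tracks recent request timestamps per IP (sliding window)."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_REQUESTS):
        self.window = window
        self.limit = limit
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests

    def looks_like_crawler(self, ip, now=None):
        """Record one request and return True if the IP exceeds the limit."""
        now = time.monotonic() if now is None else now
        timestamps = self.history[ip]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        timestamps.append(now)
        return len(timestamps) > self.limit


# Six requests within one second from the same IP: only the sixth is flagged.
checker = FrequencyChecker()
burst = [checker.looks_like_crawler("203.0.113.7", now=0.1 * i) for i in range(6)]

# Six requests spaced 5 seconds apart never exceed the limit.
slow_checker = FrequencyChecker()
spaced = [slow_checker.looks_like_crawler("203.0.113.7", now=5.0 * i) for i in range(6)]

print(burst)
print(spaced)
```

The second run shows why spacing requests out helps: with 5 seconds between requests, at most two timestamps ever fall inside the 10-second window, so the threshold is never reached.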
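There are currently two strategies for dealing with this:
(1) Delay submissions
(2) Use a proxy
This article focuses on delaying submissions.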

2. Delaying Submissions

Code:

import urllib.request
import urllib.parse
import json
import time

url='http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
headers = {
    'Host': "fanyi.youdao.com",
    'Referer': "http://fanyi.youdao.com/?keyfrom=dict2.index",
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
}
while True:
    # The prompt now shows the same ASCII "q!" that the check below compares against
    content = input("Please enter the content (q! to exit): ")
    if content=='q!':
        break
    data = {
        'i': content,
        'from': "AUTO",
        'to': "AUTO",
        'smartresult': "dict",
        'client': "fanyideskweb",
        'doctype': "json",
        'version': "2.1",
        'keyfrom': "fanyi.web",
        'action': "FY_BY_CLICKBUTTION",
        'typoResult': "false",
        }
    data = urllib.parse.urlencode(data).encode('utf-8')
    temp = urllib.request.Request(url,data,headers)
    response = urllib.request.urlopen(temp)
    html = response.read().decode('utf-8')
    target = json.loads(html)
    print("The result is: %s" % (target['translateResult'][0][0]['tgt']))
    # Wait 5 seconds after each successful translation
    time.sleep(5)
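The fixed `time.sleep(5)` above always waits the full 5 seconds, even when the request itself was slow. As a possible refinement, the sketch below enforces a minimum interval between requests and sleeps only for the time still remaining. The `Throttle` class, its optional `jitter` parameter (a random extra delay), and the fake clock used in the demonstration are all hypothetical names introduced for this example, not part of the original code.

```python
import random
import time


class Throttle:
    """Ensures at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter      # optional extra random delay, 0..jitter seconds
        self.clock = clock        # injectable so the demo can run instantly
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Sleep only as long as needed, then record the call time."""
        delay = 0.0
        if self.last_call is not None:
            elapsed = self.clock() - self.last_call
            target = self.min_interval + random.uniform(0.0, self.jitter)
            delay = max(0.0, target - elapsed)
        if delay > 0:
            self.sleep(delay)
        self.last_call = self.clock()
        return delay


# Demonstration with a fake clock so the example finishes immediately.
fake_now = [0.0]
slept = []


def fake_clock():
    return fake_now[0]


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(5.0, clock=fake_clock, sleep=fake_sleep)
first = throttle.wait()     # no previous call, so no delay
fake_now[0] += 2.0          # pretend the request itself took 2 seconds
second = throttle.wait()    # only the remaining 3 seconds are slept
print(first, second)
```

In the translation loop, one would create `throttle = Throttle(5.0)` before the `while` loop and call `throttle.wait()` just before each `urlopen` call instead of `time.sleep(5)`. Passing a nonzero `jitter` makes the interval less regular, which may look less mechanical to a frequency check, though that depends on how the server actually detects crawlers.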
