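I recently revisited Python web scraping and learned how to capture network traffic (with Firefox and Fiddler). I am also recording here a script template for fetching web pages with Python's requests library; the code comes from the Beijing Institute of Technology course on MOOC: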
The requests library
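    import requests

    url = "...."
    try:
        # Send a browser-like User-Agent; some sites reject the library default
        kv = {"user-agent": "Mozilla/5.0"}
        r = requests.get(url, headers=kv)
        r.raise_for_status()  # raises HTTPError on a 4xx or 5xx response
        r.encoding = r.apparent_encoding  # use the encoding guessed from the body
        print(r.text[1000:2000])
    except requests.RequestException:
        print("Fetch failed")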
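The same template could be wrapped in a function so that it can be reused across scripts. This is only a rough sketch: the name `fetch_text` and the 10-second `timeout` are my own assumptions and are not part of the course code.

```python
from typing import Optional

import requests


def fetch_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the decoded page body, or None if the request fails.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    headers = {"user-agent": "Mozilla/5.0"}
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()              # HTTPError on a 4xx or 5xx response
        r.encoding = r.apparent_encoding  # encoding guessed from the body
        return r.text
    except requests.RequestException:
        # Covers invalid URLs, connection errors, timeouts and HTTP errors
        return None


if __name__ == "__main__":
    html = fetch_text("....")  # placeholder URL, as in the template
    print(html[:1000] if html is not None else "Fetch failed")
```

Returning `None` instead of printing inside the function leaves it to the caller to decide how to handle a failure.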