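Resource overview
A QQ Zone (Qzone) crawler that collects blog posts, mood posts (shuoshuo), personal profile information, and more. It can fetch about 4 million records per day.
Code snippet and file information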
# coding=utf-8
import datetime
# import BitVector
import public_methods
class InitMessages(object):
    """Initialization: read the locally saved data and set the crawler's parameters."""
    def __init__(self):
        self.myheader = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0',
                         'Referer': 'http://ctc.qzs.qq.com/qzone/newblog/blogcanvas.html'}  # Request headers
        self.thread_num_QQ = 1  # Number of QQ accounts crawled at once; each account's requests log in with a different cookie
        self.thread_num_Blog = 2  # Number of blog posts downloaded at once per QQ account
        self.thread_num_Mood = 6  # Number of mood posts downloaded at once per QQ account
        self.blog_after_date = datetime.datetime.strptime("2014-01-01", "%Y-%m-%d")  # Crawl blog posts published after this date
        self.mood_after_date = datetime.datetime.strptime("2015-01-01", "%Y-%m-%d")  # Crawl mood posts published after this date
        self.my_qq = self.readMyQQ()  # My list of QQ accounts, used for logging in
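
The excerpt stops before showing how these settings are used. As a rough illustration only, the sketch below shows one way a date cutoff such as mood_after_date and a thread count such as thread_num_Mood could work together. The functions is_after_cutoff, fetch_mood, and crawl_moods, the post dictionaries, and the strict "later than" comparison are all assumptions made for this example; none of them come from the project, and no network access is performed.

```python
import datetime
from concurrent.futures import ThreadPoolExecutor

# Values mirror the settings in the excerpt above.
MOOD_AFTER_DATE = datetime.datetime.strptime("2015-01-01", "%Y-%m-%d")
THREAD_NUM_MOOD = 6


def is_after_cutoff(post_date, cutoff=MOOD_AFTER_DATE):
    """Return True if a 'YYYY-MM-DD' string is strictly later than the cutoff (assumed rule)."""
    return datetime.datetime.strptime(post_date, "%Y-%m-%d") > cutoff


def fetch_mood(post):
    """Hypothetical stand-in for a download; it only returns the post id."""
    return post["id"]


def crawl_moods(posts, workers=THREAD_NUM_MOOD):
    """Keep posts newer than the cutoff and process them with a bounded thread pool."""
    wanted = [p for p in posts if is_after_cutoff(p["date"])]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input list.
        return list(pool.map(fetch_mood, wanted))


if __name__ == "__main__":
    sample = [
        {"id": 1, "date": "2014-12-31"},
        {"id": 2, "date": "2015-06-01"},
    ]
    print(crawl_moods(sample))
```

Bounding the pool size keeps the number of simultaneous requests per account fixed, which is presumably the purpose of the thread_num_* settings.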
Type         Size  Date        Time   Name
---------  ------  ----------  -----  ----
Directory       0  2016-11-25  04:33  QQSpider-master\
Directory       0  2016-11-25  04:33  QQSpider-master\BitVector模塊報錯解決\
File         4049  2016-11-25  04:33  QQSpider-master\BitVector模塊報錯解決\init_messages.py
File         4670  2016-11-25  04:33  QQSpider-master\BitVector模塊報錯解決\spide_controller.py
Directory       0  2016-11-25  04:33  QQSpider-master\QQSpider1\
Directory       0  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\
File           10  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\.name
File          551  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\QQ_spiders.iml
File          159  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\encodings.xm
File          718  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\misc.xm
File          427  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\modules.xm
File        57608  2016-11-25  04:33  QQSpider-master\QQSpider1\.idea\workspace.xm
File            0  2016-11-25  04:33  QQSpider-master\QQSpider1\QQFailSpided.txt
File          155  2016-11-25  04:33  QQSpider-master\QQSpider1\QQForSpider.txt
File            0  2016-11-25  04:33  QQSpider-master\QQSpider1\QQHadSpided.txt
File         7742  2016-11-25  04:33  QQSpider-master\QQSpider1\blog_spider.py
File         1295  2016-11-25  04:33  QQSpider-master\QQSpider1\friend_spider.py
File        10315  2016-11-25  04:33  QQSpider-master\QQSpider1\information_spider.py
File          779  2016-11-25  04:33  QQSpider-master\QQSpider1\init.py
File         4043  2016-11-25  04:33  QQSpider-master\QQSpider1\init_messages.py
File         6804  2016-11-25  04:33  QQSpider-master\QQSpider1\mood_spider.py
File           33  2016-11-25  04:33  QQSpider-master\QQSpider1\myQQ.txt
File         5797  2016-11-25  04:33  QQSpider-master\QQSpider1\public_methods.py
File         4422  2016-11-25  04:33  QQSpider-master\QQSpider1\spide_controller.py
File           75  2016-11-25  04:33  QQSpider-master\QQSpider1\使用說明.txt
Directory       0  2016-11-25  04:33  QQSpider-master\QQSpider2\
File           40  2016-11-25  04:33  QQSpider-master\QQSpider2\QQForSpider.txt
File         7092  2016-11-25  04:33  QQSpider-master\QQSpider2\blog_spider.py
File         1277  2016-11-25  04:33  QQSpider-master\QQSpider2\friend_spider.py
File        10013  2016-11-25  04:33  QQSpider-master\QQSpider2\information_spider.py
File         3281  2016-11-25  04:33  QQSpider-master\QQSpider2\init_messages.py
............ 7 more file entries omitted here
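
The names QQForSpider.txt, QQHadSpided.txt, and QQFailSpided.txt in the listing suggest a to-crawl / done / failed bookkeeping scheme, though the page does not show how the project actually reads or writes them. The sketch below is only a guess at that idea, assuming one account id per line. The file format and the helpers load_ids, pending_ids, and record are hypothetical and not taken from the project.

```python
def load_ids(path):
    """Read one account id per line (assumed format), skipping blank lines.

    A missing file is treated as an empty set.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            return {line.strip() for line in fh if line.strip()}
    except FileNotFoundError:
        return set()


def pending_ids(todo_path, done_path, failed_path):
    """Return ids listed as to-do that are not yet recorded as done or failed."""
    remaining = load_ids(todo_path) - load_ids(done_path) - load_ids(failed_path)
    return sorted(remaining)


def record(path, account_id):
    """Append one id to a bookkeeping file, creating the file if needed."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(account_id + "\n")
```

Keeping the three states in separate append-only files would let an interrupted run resume by recomputing the pending set, at the cost of re-reading the files on each start.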