Resource Overview
PageRank score calculation, Python web crawler, data mining experiment, South China University of Technology
Code Snippet and File Information
# -*- coding: utf-8 -*-
import urllib
import urllib2
import re
from bs4 import BeautifulSoup
import random
import time


class soider1:
    def __init__(self):
        self.siteURL = 'http://blog.csdn.net/v_july_v/article/list/'
        self.URLHEAD = 'http://blog.csdn.net'
        self.cnt = 0
        self.Max_search = 0
        self.url_map_num = {}  # maps each URL to an integer A
        self.url_map_num_array = {}  # maps each integer A to a map index B
        # self.url_map_array = []  # maps between a URL's integer and index B

    def getPage(self, url_tail):
        url = self.URLHEAD + str(url_tail)
        self.Max_search += 1
        user_agents = [
            'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
            'Opera/9.25 (Windows NT 5.1; U; en)',
            'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
            'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5
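
The preview above is cut off before any ranking logic appears, so the author's actual PageRank code is not shown. As a rough illustration only of how a URL-to-integer mapping like `url_map_num` can feed a PageRank score calculation, here is a minimal power-iteration sketch. The names `index_urls`, `pagerank`, and `toy_links`, the damping factor of 0.85, and the iteration count are all assumptions made for this example and are not taken from `csdnspider.py`. The sketch avoids Python 2 only modules, so it should also run on Python 3.

```python
def index_urls(links):
    """Assign each URL a dense integer index, in first-seen order."""
    url_to_index = {}
    for source, targets in links.items():
        for url in [source] + list(targets):
            if url not in url_to_index:
                url_to_index[url] = len(url_to_index)
    return url_to_index


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {url: [outgoing urls]}.

    Hypothetical helper for illustration; damping and iterations are
    assumed defaults, not values from the original script.
    """
    url_to_index = index_urls(links)
    n = len(url_to_index)
    if n == 0:
        return {}

    # Outgoing edges by integer index; pages that never appear as a
    # source keep an empty list and are treated as dangling pages.
    out_edges = [[] for _ in range(n)]
    for source, targets in links.items():
        out_edges[url_to_index[source]] = [url_to_index[t] for t in targets]

    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by dangling pages is spread evenly over all pages.
        dangling = sum(ranks[i] for i in range(n) if not out_edges[i])
        new_ranks = [(1.0 - damping) / n + damping * dangling / n] * n
        for i, targets in enumerate(out_edges):
            for t in targets:
                new_ranks[t] += damping * ranks[i] / len(targets)
        ranks = new_ranks

    return {url: ranks[i] for url, i in url_to_index.items()}


# Toy link graph (made-up paths, not crawled data).
toy_links = {
    "/a": ["/b", "/c"],
    "/b": ["/c"],
    "/c": ["/a"],
}
scores = pagerank(toy_links)
```

In this toy graph `/c` receives links from both other pages, so it ends up with the highest score, and the scores sum to 1. A real crawl would build the `links` dictionary from the anchors extracted on each fetched page.
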
Attribute        Size  Date       Time   Name
----------- ---------  ---------- -----  ----
File            68618  2016-05-22 18:51  代碼及相應文件\all_keys.txt
File           105111  2016-05-22 18:51  代碼及相應文件\all_keys_url.txt
File             7350  2016-05-22 18:55  代碼及相應文件\csdnspider.py
File           994071  2016-05-22 18:51  代碼及相應文件\iterator.txt
File           233984  2016-05-22 19:17  實驗 駱明楠 201330551358.doc
Directory           0  2016-05-22 19:19  代碼及相應文件
----------- ---------  ---------- -----  ----
              1409134                    6