
  • Size: 111KB
    File type: .rar
    Release date: 2021-05-15
  • Language: Python
  • Tags: python crawler

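Resource description

Python code for a Sina crawler, plus a selection of the collected results. File list:
1. spider_try.py: the main crawler program. It fetches the HTML source of each page and parses it to extract user information; each user is parsed according to the definition of the person class.
2. person.py: defines the person class, which converts the relevant HTML tag segments into a readable form.
3. format.py: writes the final results in the standard GEXF format so they can be loaded into graph tools.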
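The source of format.py is not shown on this page. As a rough idea of what writing the results as GEXF involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes nodes are `(id, label)` pairs and that each follow relationship is a directed edge `(source_id, target_id)`; the function name `build_gexf` and its parameters are hypothetical, not the author's code.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the GEXF 1.2 format.
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges):
    """Build a GEXF 1.2 document string for a directed follow graph.

    nodes: iterable of (node_id, label) pairs
    edges: iterable of (source_id, target_id) pairs
    """
    # Passing xmlns as a plain attribute keeps every child tag unprefixed.
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "directed"})

    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})

    edges_el = ET.SubElement(graph, "edges")
    for edge_id, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(edge_id),
            "source": str(source),
            "target": str(target),
        })

    # ElementTree escapes &, <, > and quotes inside attribute values.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Using a real XML library rather than string concatenation means user names containing `&` or `"` still produce a well-formed file.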

Code snippet and file information

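# -*- coding: utf-8 -*-
"""
Created on Fri Jun  1 11:12:21 2018

@author: gaoruiyuan
"""


import re
from xml.sax.saxutils import quoteattr

# Node ids: accounts tagged "big" get 0, 1, 2, ...; crawled hosts and
# accounts tagged "normal" get 10000, 10001, ...
NORMAL_ID_OFFSET = 10000
biglist = []
normallist = []
node_data = "./html_follow_name/node.txt"
nodefile = open(node_data, "w", encoding="UTF-8")
edge_data = "./html_follow_name/edge.txt"
edgefile = open(edge_data, "w", encoding="UTF-8")
edgenum = 0


# The exact markup of the node/edge lines is not preserved on this page;
# the GEXF <node>/<edge> elements below are an assumed reconstruction.
# quoteattr() quotes and escapes each attribute value.
def write_node(node_id, label):
    nodefile.write("<node id=%s label=%s/>\n"
                   % (quoteattr(node_id), quoteattr(label)))


def write_edge(source, target):
    global edgenum
    edgefile.write("<edge id=%s source=%s target=%s/>\n"
                   % (quoteattr(str(edgenum)), quoteattr(source), quoteattr(target)))
    edgenum += 1


def file_ana(f):
    content = f.read().decode("utf-8")
    # The first "= <name>" line holds the crawled (host) user's name.
    host_name = re.findall(r"= (.+?)\r\n", content)[0]
    if host_name not in normallist:
        from_id = str(NORMAL_ID_OFFSET + len(normallist))
        normallist.append(host_name)
        write_node(from_id, host_name)
    else:
        from_id = str(normallist.index(host_name) + NORMAL_ID_OFFSET)
    # Followed accounts appear as "<name>\tbig" or "<name>\tnormal" lines.
    biglist_read = re.findall(r"\n(.+?)\tbig\r", content)
    normallist_read = re.findall(r"\n(.+?)\tnormal\r", content)
    for name in biglist_read:
        if name not in biglist:
            write_node(str(len(biglist)), name)
            biglist.append(name)
        id_to = str(biglist.index(name))
        write_edge(from_id, id_to)
    for name in normallist_read:
        if name not in normallist:
            write_node(str(NORMAL_ID_OFFSET + len(normallist)), name)
            normallist.append(name)
        # Use the same offset as from_id so "normal" ids never collide
        # with the ids of "big" accounts.
        id_to = str(normallist.index(name) + NORMAL_ID_OFFSET)
        write_edge(from_id, id_to)


for i in range(1100):
    print(i)
    file_data = "./html_follow_name/" + str(i) + "follow.txt"
    # The with-block closes each input file after it is parsed.
    with open(file_data, "rb") as f:
        file_ana(f)
nodefile.close()
edgefile.close()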
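The layout of the `*follow.txt` input files is not documented here. The sketch below infers it only from the three regular expressions in `file_ana`: a host line containing `= name` (assuming the `?` after `=` in the scraped regex stood for a space), followed by CRLF-terminated `name<TAB>big` and `name<TAB>normal` lines. It also shows an alternative design for the id bookkeeping: a dict per account type instead of `list.index()`, which rescans the whole list on every lookup. `IdAllocator` and `parse_follow_file` are hypothetical names, not part of the original project.

```python
import re

# Patterns copied from file_ana (host pattern assumes "= " before the name).
HOST_RE = re.compile(r"= (.+?)\r\n")
BIG_RE = re.compile(r"\n(.+?)\tbig\r")
NORMAL_RE = re.compile(r"\n(.+?)\tnormal\r")
NORMAL_ID_OFFSET = 10000


class IdAllocator:
    """Give each name a stable integer id, counting up from `offset`."""

    def __init__(self, offset=0):
        self.offset = offset
        self.ids = {}  # name -> id; O(1) lookup instead of list.index()

    def lookup(self, name):
        """Return (id, is_new) for `name`, allocating an id if needed."""
        if name in self.ids:
            return self.ids[name], False
        new_id = self.offset + len(self.ids)
        self.ids[name] = new_id
        return new_id, True


def parse_follow_file(text, big, normal):
    """Return (new_nodes, edges) for the text of one follow file.

    new_nodes: (id, name) pairs seen for the first time in this file
    edges: (source_id, target_id) pairs, host -> followed account
    """
    new_nodes, edges = [], []
    host = HOST_RE.findall(text)[0]
    # As in file_ana, the host is always numbered with the "normal" ids.
    from_id, is_new = normal.lookup(host)
    if is_new:
        new_nodes.append((from_id, host))
    # "big" follows first, then "normal", matching file_ana's order.
    for allocator, pattern in ((big, BIG_RE), (normal, NORMAL_RE)):
        for name in pattern.findall(text):
            to_id, is_new = allocator.lookup(name)
            if is_new:
                new_nodes.append((to_id, name))
            edges.append((from_id, to_id))
    return new_nodes, edges
```

For example, with `big, normal = IdAllocator(0), IdAllocator(NORMAL_ID_OFFSET)` and the made-up input `"name = Alice\r\nBob\tbig\r\nCarol\tnormal\r\n"`, the call returns the nodes `[(10000, "Alice"), (0, "Bob"), (10001, "Carol")]` and the edges `[(10000, 0), (10000, 10001)]`.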

Type       Size (bytes)  Date        Time   Name
---------  ------------  ----------  -----  ----
File               1958  2018-06-03  12:58  爬蟲\format.py
File               2285  2018-05-21  10:20  爬蟲\person.py
File               1344  2018-06-03  13:27  爬蟲\Readme.md
File               4780  2018-06-01  10:47  爬蟲\spider_try.py
Directory             0  2018-06-03  13:36  爬蟲
File                179  2018-06-01  10:53  single_results\10follow.txt
File                431  2018-06-01  10:53  single_results\11follow.txt
File                491  2018-06-01  10:55  single_results\12follow.txt
File                363  2018-06-01  10:55  single_results\13follow.txt
File                972  2018-06-01  10:55  single_results\14follow.txt
File                475  2018-06-01  10:56  single_results\15follow.txt
File                 80  2018-06-01  10:56  single_results\16follow.txt
File                479  2018-06-01  10:58  single_results\17follow.txt
File                158  2018-06-01  10:58  single_results\18follow.txt
File                379  2018-06-01  10:58  single_results\19follow.txt
File               2958  2018-06-01  10:48  single_results\1follow.txt
File                269  2018-06-01  10:59  single_results\20follow.txt
File                457  2018-06-01  11:00  single_results\21follow.txt
File                310  2018-06-01  11:00  single_results\22follow.txt
File                336  2018-06-01  11:01  single_results\23follow.txt
File                 48  2018-06-01  11:02  single_results\24follow.txt
File                638  2018-06-01  11:02  single_results\25follow.txt
File                413  2018-06-01  11:03  single_results\26follow.txt
File                371  2018-06-01  11:03  single_results\27follow.txt
File                155  2018-06-01  11:04  single_results\28follow.txt
File                 42  2018-06-01  11:04  single_results\29follow.txt
File               1030  2018-06-01  10:48  single_results\2follow.txt
File                 72  2018-06-01  11:05  single_results\30follow.txt
File                858  2018-06-01  11:05  single_results\31follow.txt
File                577  2018-06-01  11:06  single_results\32follow.txt

... (82 more file entries omitted here)
