
Resource description

A Python crawler that scrapes the Weibo hot search (trending topics) list.

Code snippet and file information

# -*- coding: utf-8 -*-
# @Time : 2020/12/16 14:37
# @Author : wy
# @File : spider.py
# @Software : PyCharm

'''
Implementation approach
1. Page analysis: find the page URL and locate the data in the page
2. Data fetching: get the HTML source with a GET request from the requests library
3. Data parsing: extract the required data with the XPath syntax of the lxml library
4. Data storage: write the data to a file using with open
'''

# Import the libraries; requests and lxml are third-party and must be installed

import requests                # data fetching library
from lxml import etree         # data parsing library
import time                    # standard-library time module

# Format the current date, e.g. 2020年12月28日
today = time.strftime(
    '%Y{y}%m{m}%d{d}', time.localtime()).format(y='年', m='月', d='日')
print(today)

# Data fetching
url = "https://s.weibo.com/top/summary?cate=realtimehot"     # hot search URL
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60"
}        # headers that make the request look like a browser
response = requests.get(url, headers=headers)        # send the request
# print(response.text)    # inspect the HTML source

# Data parsing
html = etree.HTML(response.text)      # convert the text into an element tree

# Find the parent tag first, then extract from each row below it in a for loop
datas = html.xpath('//*[@id="pl_top_realtimehot"]/table/tbody/tr')       # table rows at a fixed path
for data in datas:    # extract from each row in turn
    data_title = ''.join(data.xpath('td[2]/a/text()'))     # hot search title
    data_rank = ''.join(data.xpath('td[1]/text()'))        # hot search rank
    data_num = ''.join(data.xpath('td[2]/span/text()'))    # number shown next to the title
    print(data_rank, data_title, data_num)

    # Data storage: the file is named after the current date
    # (the archived script hard-coded the name "20201228'.txt'" here)
    with open('./' + today + '.txt', 'a', encoding='utf-8') as f:
        f.write("%s\t%s%s\n" % (data_rank, data_title, data_num))
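
The parsing and storage steps above can also be sketched as small functions that run without network access. This is only an illustrative sketch, not the author's code: `SAMPLE_HTML` is made-up markup that mimics the table layout the script's XPath expects, and the names `parse_hot_search`, `format_rows`, `save_rows` and `dated_filename` are hypothetical. It also opens the output file once instead of once per row, and puts a tab between all three fields.

```python
import time

from lxml import etree

# Made-up sample markup (an assumption) shaped like the table the script targets.
SAMPLE_HTML = """
<div id="pl_top_realtimehot">
  <table>
    <tbody>
      <tr><td>1</td><td><a href="#">Topic A</a><span>12345</span></td></tr>
      <tr><td>2</td><td><a href="#">Topic B</a><span>6789</span></td></tr>
    </tbody>
  </table>
</div>
"""


def parse_hot_search(html_text):
    """Return a list of (rank, title, number) tuples, one per table row."""
    tree = etree.HTML(html_text)
    rows = tree.xpath('//*[@id="pl_top_realtimehot"]/table/tbody/tr')
    results = []
    for row in rows:
        rank = ''.join(row.xpath('td[1]/text()')).strip()
        title = ''.join(row.xpath('td[2]/a/text()')).strip()
        number = ''.join(row.xpath('td[2]/span/text()')).strip()
        results.append((rank, title, number))
    return results


def format_rows(rows):
    """Render the rows as tab-separated lines."""
    return ''.join('%s\t%s\t%s\n' % row for row in rows)


def save_rows(rows, path):
    """Append all rows to the file at path, opening it only once."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(format_rows(rows))


def dated_filename():
    """Build a file name such as 20201228.txt from the current local date."""
    return time.strftime('%Y%m%d') + '.txt'
```

With a real response, `save_rows(parse_hot_search(response.text), dated_filename())` would replace the loop in the script. The XPath still depends on the page containing a `tbody` element, since lxml does not insert one on its own.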


  Attribute      Size  Date       Time   Name
----------- ---------  ---------- -----  ----
       File       184  2020-12-16 14:39  weibo\.idea\.gitignore
       File       174  2020-12-16 14:39  weibo\.idea\inspectionProfiles\profiles_settings.xml
       File       410  2020-12-16 14:39  weibo\.idea\inspectionProfiles\Project_Default.xml
       File       302  2020-12-16 14:39  weibo\.idea\misc.xml
       File       269  2020-12-16 14:39  weibo\.idea\modules.xml
       File       361  2020-12-16 14:39  weibo\.idea\weibo.iml
       File      6060  2020-12-28 23:33  weibo\.idea\workspace.xml
       File      1870  2020-12-28 23:33  weibo\20201228'.txt'
       File      1819  2020-12-28 23:33  weibo\spider.py
       File      2176  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\AUTHORS
       File      1315  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\COPYING.txt
       File         4  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\INSTALLER
       File      1447  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\LICENSE
       File      4190  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\METADATA
       File      3121  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\RECORD
       File         0  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\REQUESTED
       File         4  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\top_level.txt
       File        92  2020-12-16 14:58  weibo\venv\Lib\site-packages\beautifulsoup4-4.9.3.dist-info\WHEEL
       File     18748  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\_html5lib.py
       File     18405  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\_htmlparser.py
       File     12234  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\_lxml.py
       File     19777  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\__init__.py
       File     12476  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\__pycache__\_html5lib.cpython-39.pyc
       File     12968  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\__pycache__\_htmlparser.cpython-39.pyc
       File      9418  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\__pycache__\_lxml.cpython-39.pyc
       File     15293  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\builder\__pycache__\__init__.cpython-39.pyc
       File     34130  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\dammit.py
       File      7755  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\diagnose.py
       File     81650  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\element.py
       File      5654  2020-12-16 14:58  weibo\venv\Lib\site-packages\bs4\formatter.py

............ 1722 further file entries omitted here
