Python Crawler: Scraping the First 1000 Pages of Baidu Baike

Size: 693 B

File type: .rar

Price: 2 gold coins

Downloads: 1

Published: 2021-06-15
Language: Python
Tags: Python crawler, Baidu Baike


Description

An implementation of a Python crawler that scrapes the first 1000 pages of Baidu Baike.
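This page does not show the crawler's own code, so the following is only a minimal sketch, not the author's implementation, of how a crawler that stops after 1000 pages could be organized: a breadth-first queue of pending URLs, a set of URLs already seen, and a page limit. The function `crawl` and its `fetch` and `extract_links` parameters are hypothetical; passing the download and parsing steps in as callables lets the loop be exercised without network access.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    root_url: str,
    fetch: Callable[[str], str],
    extract_links: Callable[[str, str], Iterable[str]],
    max_pages: int = 1000,
) -> List[str]:
    """Breadth-first crawl from root_url; return the URLs fetched, in order.

    fetch(url) -> html and extract_links(url, html) -> absolute URLs are
    hypothetical hooks supplied by the caller (e.g. urllib + BeautifulSoup).
    """
    queue = deque([root_url])  # URLs waiting to be fetched (FIFO = breadth-first)
    seen = {root_url}          # every URL ever queued, so no page is fetched twice
    crawled: List[str] = []

    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception as exc:  # a failed download is skipped, not fatal
            print('failed:', url, exc)
            continue
        crawled.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


if __name__ == '__main__':
    # Hypothetical in-memory "site": each page maps to the pages it links to
    site = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['d'], 'd': []}
    print(crawl('a', fetch=lambda u: u,
                extract_links=lambda u, html: site[u], max_pages=3))
    # prints ['a', 'b', 'c']
```

In this sketch only successful fetches count toward `max_pages`, so a page that fails to download does not use up part of the 1000-page budget; whether the original crawler counts pages this way is not known.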


Code snippet and file information

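# coding: utf-8
import re

from bs4 import BeautifulSoup

# Sample page: the "three sisters" document from the BeautifulSoup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""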
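# html_doc is already a str, so from_encoding is unnecessary (bs4 ignores it for str input)
soup = BeautifulSoup(html_doc, 'html.parser')

print('get all links')
links = soup.find_all('a')  # every <a> tag in the document
for link in links:
    print(link.name, link['href'], link.get_text())

print('\nget lacie link')
# find() returns the first tag whose attribute matches the given value
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())

print('\nre')
# An attribute can also be matched with a regular expression; "ill" matches the tillie URL
link_node = soup.find('a', href=re.compile(r'ill'))
print(link_node.name, link_node['href'], link_node.get_text())

print('\np')
# class is a Python keyword, so BeautifulSoup uses class_ for the CSS class filter
p_node = soup.find('p', class_='title')
print(p_node.name, p_node.get_text())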
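The `find_all` and `re.compile` calls in the demo are the building blocks a crawler's parser would use to collect entry links from each downloaded page. The sketch below is hypothetical and rests on two assumptions not taken from the archive: that Baidu Baike entry pages link to other entries through hrefs beginning with `/item/`, and that the entry title sits in the first `<h1>`. The functions `extract_entry_links` and `extract_title` and the sample markup are illustrative only, so the site's current HTML should be checked before relying on either pattern.

```python
import re
from typing import List
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# ASSUMPTION: entry pages link to other entries via hrefs beginning with /item/
ENTRY_HREF = re.compile(r'^/item/')


def extract_entry_links(page_url: str, html: str) -> List[str]:
    """Return absolute URLs of all <a> tags whose href matches ENTRY_HREF."""
    soup = BeautifulSoup(html, 'html.parser')
    # Regex attribute filter, as in the demo's soup.find('a', href=re.compile(...))
    return [urljoin(page_url, a['href'])
            for a in soup.find_all('a', href=ENTRY_HREF)]


def extract_title(html: str) -> str:
    """Return the text of the first <h1> (ASSUMED to hold the entry title), or ''."""
    soup = BeautifulSoup(html, 'html.parser')
    h1 = soup.find('h1')
    return h1.get_text(strip=True) if h1 is not None else ''


if __name__ == '__main__':
    # Hypothetical sample markup, not a real Baidu Baike page
    sample = '''
    <h1>Python</h1>
    <a href="/item/Guido">Guido</a>
    <a href="https://example.com/about">About</a>
    <a href="/item/PyPI">PyPI</a>
    '''
    page = 'https://baike.baidu.com/item/Python'
    print(extract_title(sample))
    print(extract_entry_links(page, sample))
```

`urljoin` resolves the site-relative `/item/...` paths against the page's own URL, so the returned links can be queued directly; links to other hosts are dropped because their hrefs do not start with `/item/`.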

Attributes   Size (bytes)  Date        Time   Name
-----------  ------------  ----------  -----  -------------------
File                 1161  2016-10-30  13:31  reptile\test_bs4.py
File                    0  2016-10-30  13:20  reptile\__init__.py
Directory               0  2016-10-30  13:21  reptile
-----------  ------------  ----------  -----  -------------------
Total                1161                     3 entries

