Resource overview
Python crawler: an implementation that crawls the first 1000 pages of Baidu Baike.
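A crawler like the one described above usually runs one loop. It keeps a queue of URLs still to visit and a set of URLs already seen. It downloads a page, asks a parser for the links on that page, queues the new ones, and stops after a fixed number of pages. The crawler's own code is not shown on this page, so what follows is only a minimal sketch under stated assumptions. `crawl` and `fetch_url` are hypothetical names, the link parser is passed in as a function rather than hard-coded, and pages are assumed to be UTF-8.

```python
from collections import deque
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Hypothetical downloader: fetch one page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(
    root_url: str,
    fetch: Callable[[str], str],
    extract_links: Callable[[str, str], Iterable[str]],
    max_pages: int = 1000,
) -> Dict[str, str]:
    """Breadth-first crawl from root_url; returns {url: html} for at most max_pages pages."""
    pending = deque([root_url])  # URLs waiting to be downloaded
    seen = {root_url}            # every URL ever queued, so no page is fetched twice
    pages: Dict[str, str] = {}   # downloaded pages, kept in crawl order

    while pending and len(pages) < max_pages:
        url = pending.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError) as exc:  # network/HTTP errors, malformed URLs
            print(f"skip {url}: {exc}")
            continue
        pages[url] = html
        # The parser decides which links count as crawlable pages
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return pages
```

With network access, a call such as `crawl(start_url, fetch_url, link_parser)` would stop after 1000 pages, where `start_url` stands for a Baike entry URL and `link_parser` for any function returning the absolute links found in a page. Breadth-first order means the pages closest to the starting entry are collected first, which matches the idea of taking "the first 1000 pages".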

Code snippet and file information
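# coding: utf-8
# Requires Python 3 and Beautiful Soup 4 (pip install beautifulsoup4)
import re

from bs4 import BeautifulSoup

# Sample document: the "three sisters" example from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# html_doc is already a str, so no from_encoding argument is needed
# (Beautiful Soup ignores from_encoding for str input and emits a warning)
soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns every matching tag
print("get all links")
links = soup.find_all("a")
for link in links:
    print(link.name, link["href"], link.get_text())

# find() returns the first tag whose attribute equals the given value
print("\nget lacie link")
link_node = soup.find("a", href="http://example.com/lacie")
print(link_node.name, link_node["href"], link_node.get_text())

# A compiled regex matches any href containing "ill" (here, the Tillie link)
print("\nre")
link_node = soup.find("a", href=re.compile(r"ill"))
print(link_node.name, link_node["href"], link_node.get_text())

# class_ has a trailing underscore because "class" is a Python keyword
print("\np")
p_node = soup.find("p", class_="title")
print(p_node.name, p_node.get_text())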
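The `find_all("a")` and `href=re.compile(...)` lookups above are the pieces a crawler's parser needs. It collects every link, keeps only those whose `href` matches a pattern, and resolves relative links against the page URL. Beautiful Soup applies a compiled regex with its `search()` method, so `r"ill"` matches any href that merely contains `ill`. The sketch below repeats that logic with the standard library's `html.parser` so it runs without extra packages. `LinkCollector`, `extract_links`, and the default pattern `/item/` (assumed here to mark Baidu Baike entry URLs) are illustrative assumptions, not code from this archive.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects [href, text] for every <a> tag, similar to soup.find_all('a')."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[List[str]] = []          # each entry is [href, link text]
        self._current: Optional[List[str]] = None  # the <a> tag currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors without an href value
                self._current = [href, ""]
                self.links.append(self._current)

    def handle_data(self, data):
        if self._current is not None:  # text inside an open <a>
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


def extract_links(page_url: str, html: str, pattern: str = r"/item/") -> List[str]:
    """Absolute URLs of links whose href matches pattern (re.search semantics, like bs4)."""
    regex = re.compile(pattern)
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(page_url, href) for href, _text in collector.links if regex.search(href)]
```

For example, `extract_links(current_url, html)` returns only the matching links, already made absolute, so they can be queued for download without further processing.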
 Attributes   Size (bytes)  Date        Time   Name
 -----------  ------------  ----------  -----  ----
 File                 1161  2016-10-30  13:31  reptile\test_bs4.py
 File                    0  2016-10-30  13:20  reptile\__init__.py
 Directory               0  2016-10-30  13:21  reptile
 -----------  ------------  ----------  -----  ----
 Total                1161                     3 entries