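Resource Overview
A Python crawler that scrapes novels from 136書屋 (136book.com): beautifulsoup4.py
It uses the beautifulsoup4 package to parse HTML and XML, and urllib to open and work with URLs.
Install the beautifulsoup4 package before use; urllib ships with the Python standard library and needs no separate installation. This example uses Python 2.7.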
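The snippet on this page targets Python 2.7, where `URLopener` is imported directly from `urllib`. In Python 3 the equivalent functionality lives in `urllib.request`. The following is a minimal sketch of the fetch step for Python 3, not part of the original script; the helper name `fetch_html` and its `timeout` parameter are hypothetical and chosen only for illustration.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` decoded to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers, if any;
        # falling back to UTF-8 is an assumption, not something the site guarantees.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("http://www.136book.com/")` would return the index page as text, assuming the site is reachable. The result can be passed straight to `BeautifulSoup(text, "html.parser")`.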
Code Snippet and File Information
#coding=utf-8
from urllib import URLopener
from bs4 import BeautifulSoup as BS
import os
import sys

if __name__ == '__main__':
    # Base folder on the author's machine
    Bfolder = r"D:\LILUO\6.MyTools\12.beautifulsoup4\books"

    url = "http://www.136book.com/"
    html = URLopener().open(url)
    soup = BS(html.read(), "html.parser")

    # Collect every on-site link that carries a title: {book URL: book title}
    a = soup.find_all(name='a')
    BookDict = {}
    for each in a:
        href = each.get('href')  # None when the tag has no href attribute
        if href and "http://www.136book.com/" in href:
            if each.get('title'):
                BookDict[href] = each.get('title')
    html.close()

    # Open each collected book page and gather its links
    for burl in BookDict:
        #burl = "http://www.136book.com/zetianji/"
        bhtml = URLopener().open(burl)
        bsoup = BS(bhtml.read(), "html.parser")
        ba = bsoup.find_all(name='a')
        path = Bfol
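The link-collection step above can be tried without network access by running it against an HTML string. The following is a minimal sketch under that assumption: the function name `collect_book_links`, its `site_prefix` parameter, and the sample markup are all hypothetical, and only the filtering rule (an on-site `href` plus a non-empty `title`) comes from the snippet.

```python
from bs4 import BeautifulSoup


def collect_book_links(html, site_prefix="http://www.136book.com/"):
    """Map each on-site link that has a title attribute to that title."""
    soup = BeautifulSoup(html, "html.parser")
    books = {}
    for anchor in soup.find_all("a"):
        href = anchor.get("href")    # None when the tag has no href
        title = anchor.get("title")  # None when the tag has no title
        if href and title and site_prefix in href:
            books[href] = title
    return books


# Hypothetical markup standing in for the real index page.
sample = """
<a href="http://www.136book.com/zetianji/" title="Ze Tian Ji">Ze Tian Ji</a>
<a href="http://www.136book.com/about/">About</a>
<a href="http://example.com/" title="Elsewhere">Elsewhere</a>
<a title="No link">broken</a>
"""
print(collect_book_links(sample))
```

Checking `href` before the substring test matters because `Tag.get` returns `None` for a missing attribute, and `"..." in None` raises a `TypeError`.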