Resource overview
A simple web crawler for novels, shared for learning and reference only. Questions are welcome for discussion.
Code snippet and file information
import requests
from bs4 import BeautifulSoup
import sys
import time


class download(object):
    def __init__(self):
        self.server = 'https://www.biqukan.com'
        self.target = 'https://www.biqukan.com/1_1094/'
        self.names = []
        self.nums = 0
        self.urls = []
        self.headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
            'Connection': 'keep-alive',
            'user-agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
        }

    """
    Function description: get the download links
    Parameters:
        None
    Returns:
        None
    Modified:
        2018-12-08
    """
    def get_download_url(self):
        req = requests.get(self.target, headers=self.headers)
        html = req.text
        div_bf = BeautifulSoup(html, 'html5lib')
        div = div_bf.find_all('div', class_='listmain')
        a_bf = BeautifulSoup(str(div[0]), 'html5lib')
        a = a_bf.find_all('a')
        # The first 15 links in the list are skipped.
        self.nums = len(a[15:])
        for each in a[15:]:
            # "正文" ("main text") and "正文卷" ("main text volume") are
            # section labels, not chapters, so they are not recorded.
            if each.string == "正文" or each.string == "正文卷":
                continue
            self.names.append(each.string)
            self.urls.append(self.server + each.get('href'))

    """
    Function description: get the chapter content
    Parameters:
        target - download link (string)
    Returns:
        texts - chapter content (string)
    Modified:
        2018-12-08
    """
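
The link-collection step in get_download_url can be tried offline with a small sketch like the one below. It is not the code from this resource: it swaps BeautifulSoup and html5lib for the standard library's html.parser so that it runs without extra packages, and it joins URLs with urljoin rather than string concatenation. The sample markup, the ChapterLinkParser class, the extract_chapters function, and the skip parameter are all hypothetical names invented for illustration; the real page layout may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<div class="listmain">
  <dl>
    <dt>Latest chapters</dt>
    <dd><a href="/1_1094/3.html">Chapter 3</a></dd>
    <dt>Main text</dt>
    <dd><a href="/1_1094/1.html">Chapter 1</a></dd>
    <dd><a href="/1_1094/2.html">Chapter 2</a></dd>
    <dd><a href="/1_1094/3.html">Chapter 3</a></dd>
  </dl>
</div>
<div class="footer"><a href="/about.html">About</a></div>
"""


class ChapterLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags inside <div class="listmain">."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # <div> nesting depth, counted from the listmain div
        self.href = None   # href of the <a> tag currently being read, if any
        self.text = []     # text fragments of the current <a> tag
        self.links = []    # collected (title, href) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif "listmain" in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.href = attrs["href"]
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None


def extract_chapters(html, server, skip=0):
    """Return (title, absolute_url) pairs, ignoring the first `skip` links."""
    parser = ChapterLinkParser()
    parser.feed(html)
    return [(title, urljoin(server, href)) for title, href in parser.links[skip:]]


# skip=1 drops the single "latest chapters" link in the sample markup.
chapters = extract_chapters(SAMPLE_HTML, "https://www.biqukan.com", skip=1)
for title, url in chapters:
    print(title, url)
```

Using urljoin instead of plain concatenation keeps the result valid whether an href is relative or already absolute, which may matter if the site changes how it writes its links.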