- Size: 3KB  File type: .py  Coins: 1  Downloads: 1  Published: 2021-06-09
- Language: Python
- Tags: Baidu Wenku, automatic download, Python
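Resource description
This script automatically downloads documents from Baidu Wenku. Its drawback is that it requires an enterprise account and cannot download arbitrary documents, so it is offered only as material for learning Python scripting. Usage: run the program and enter the URL of the document you want; the script then downloads it automatically.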
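The script takes a document URL as input and derives the document ID from it with a regular expression. The following is a minimal, standalone sketch of that step only. The helper name `extract_doc_id` and the constant `DOC_ID_PATTERN` are illustrative assumptions and not part of the original file; the pattern itself is the one used in the snippet.

```python
import re
from typing import Optional

# Same pattern as in the script: the ID sits between "view/" and a trailing ".html".
DOC_ID_PATTERN = re.compile(r"view/(\S*)\.html$")


def extract_doc_id(url: str) -> Optional[str]:
    """Return the document ID from a view URL, or None if the URL does not match.

    Hypothetical helper for illustration; the original script calls
    re.search(...).group(1) directly and would raise on a non-matching URL.
    """
    match = DOC_ID_PATTERN.search(url.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # URL taken from the script's own default value.
    print(extract_doc_id("https://wenku.baidu.com/view/6b2016c49ec3d5bbfd0a742d.html"))
```

Returning `None` rather than raising lets the caller reject a mistyped URL before any network request is made. Note that, because of the `$` anchor, a URL with a query string after `.html` will not match.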
Code snippet and file information
# coding:utf-8
import io
import re
import sys
import json
import requests
from lxml import etree

# Re-wrap stdout so that printed text is encoded as UTF-8
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf8")


class BaikuSpider(object):
    """docstring for BaikuSpider"""
    def __init__(self):
        self.url = "https://wenku.baidu.com/view/6b2016c49ec3d5bbfd0a742d.html"
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
                        "Origin": "https://wenku.baidu.com",
                        "Referer": "https://wenku.baidu.com/view/c20e5ad684254b35eefd3402.html"
                        }
        self.downloadUrl = ""
        self.data = {}
        self.name = ""

    def parse_url(self, url):
        # Fetch the page and decode it as GB2312, ignoring undecodable bytes
        response = requests.get(url, headers=self.headers)
        return response.content.decode("gb2312", "ignore")

    def init_post(self, str_content):
        html = etree.HTML(str_content)
        # The document ID is the part of the URL between "view/" and ".html"
        doc_id = re.search(r'view/(\S*)\.html$', self.url).group(1)
        # Read the hidden inputs of the download form
        downloadToken = html.xpath("//form/input[@name='downloadToken']/@value")
        sz = html.xpath("//form/input[@name='sz']/@value")
        storage = html.xpath("//form/input[@name='storage']/@value")
        retType = html.xpath("//form/input[@name='retType']/@value")
        ct = html.xpath("//form/input[@name='ct']/@value")
        useTicket = html.xpath("//form/input[@name='useTicket']/@value")
        target_uticket_num = html.xpath("//form/input[@name='target_uticket_num']/@value")
        v_code = html.xpath("//form/input[@name='v_code
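The snippet above is cut off while reading the form's hidden inputs one XPath query at a time. As a hedged alternative, the sketch below gathers every named input inside a form into a dictionary in one pass. It uses the standard library's `html.parser` instead of the script's `lxml`, purely so the example has no third-party dependency. The names `FormInputCollector` and `collect_form_fields` and the sample HTML values are assumptions made for illustration; they are not from the original file or from the real page.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect name/value pairs of <input> elements that appear inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False   # True while the parser is between <form> and </form>
        self.fields = {}       # maps input name -> value ("" if no value attribute)

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.fields[name] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def collect_form_fields(html_text):
    """Return a dict of all named inputs found inside forms in html_text."""
    parser = FormInputCollector()
    parser.feed(html_text)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    # Made-up sample markup; the field values are placeholders, not real page data.
    sample = (
        '<form><input type="hidden" name="ct" value="1">'
        '<input type="hidden" name="sz" value="2"/></form>'
    )
    print(collect_form_fields(sample))
```

A dictionary built this way could be passed directly as form data to a POST request, but whether the real page exposes exactly these fields is not something this sketch verifies.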