  • Size: 0M
    File type: .py
    Coins: 1
    Downloads: 0
    Release date: 2021-05-27
  • Language: Python
  • Tags: Other

Resource description

wenku_test.py

Code snippet and file information

# Python 3.5
# 2018/2/14
# Reference tutorial: http://blog.csdn.net/c406495762/article/details/72331737#31-selenium
# To improve: Chinese font support; generality of the code; images cannot be scraped

from selenium import webdriver  # webdriver opens the web page
from bs4 import BeautifulSoup  # parses the page source and extracts content
import time  # used to wait until the page has fully loaded
from docx import Document  # creates a new document
from docx.enum.text import WD_ALIGN_PARAGRAPH  # used to center the title

def find_doc(driver, i):
    # The content list is shared across calls, so declare it before any use.
    global doc_content_list

    time.sleep(3)
    html = driver.page_source
    soup1 = BeautifulSoup(html, 'html.parser')

    result = soup1.find('div', attrs={'class': 'doc-title'})
    doc_title = result.get_text()  # document title

    try:
        # Click "continue reading" if present, then start a fresh content list.
        elem = driver.find_element_by_xpath("//div[@data-flod-fun='continue-read']")
        elem.click()
        doc_content_list = []
    except Exception:
        pass

    result2 = soup1.find_all('p', attrs={'class': 'txt'})
    for each in result2:
        text2 = each.get_text()

        # Strip the 12-space padding run that appears in some paragraphs.
        if '            ' in text2:
            text3 = text2.replace('            ', '')
        else:
            text3 = text2

        doc_content_list.append(text3)  # body text

    try:
        elem = driv
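
The paragraph clean-up step in the snippet above can be tried on its own, without a browser. The following is a minimal sketch, not the author's code: the name clean_paragraphs and the PADDING constant are hypothetical, and the padding is assumed to be a run of 12 ordinary spaces, as the damaged listing suggests. Because str.replace leaves a string unchanged when the substring is absent, the if/else check in the original is not strictly needed.

```python
# Assumption: the padding removed in the original snippet is 12 plain spaces.
PADDING = " " * 12


def clean_paragraphs(raw_texts, padding=PADDING):
    """Return a new list with every occurrence of `padding` removed.

    str.replace is a no-op when `padding` does not occur, so no
    membership test is required before calling it.
    """
    return [text.replace(padding, "") for text in raw_texts]


if __name__ == "__main__":
    # Hypothetical sample input standing in for the scraped <p class="txt"> text.
    sample = [PADDING + "First paragraph", "Second paragraph"]
    for line in clean_paragraphs(sample):
        print(line)
```

Keeping this step as a pure function over a list of strings means it can be checked separately from the Selenium and BeautifulSoup parts of the script.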
