
  • Size: 0.12M
    File type: .rar
    Coins: 1
    Downloads: 0
    Published: 2024-05-07
  • Language: Python
  • Tags: software, crawler

Resource description

CNKICrawler
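# CNKI (China National Knowledge Infrastructure) crawler
## Crawls the full information for each article: title, authors, author affiliations, citation count, download count, source, keywords, abstract, references, and the URL of the article detail page
## Required packages: BeautifulSoup, xlwt

### Requires Python 3.x
### Run spider_main.py. The results are written to data_out.xls, and the configuration file is Config.conf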
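
As a rough illustration of the configuration file, the sketch below parses a Config.conf-style text with the standard library. The section name `base` and the four keys are the ones spider_main.py reads; the values shown, the `SAMPLE_CONFIG` string, and the `load_settings` helper are hypothetical and only there for demonstration.

```python
from configparser import ConfigParser

# Hypothetical example values. Only the section name and the key names
# are taken from spider_main.py.
SAMPLE_CONFIG = """
[base]
keyword = crawler
searchlocation = 主題
maxpage = 0
currentpage = 0
"""


def load_settings(text):
    """Parse Config.conf-style text into a plain dict (illustrative helper)."""
    cf = ConfigParser()
    cf.read_string(text)
    return {
        "keyword": cf.get("base", "keyword"),
        "searchlocation": cf.get("base", "searchlocation"),
        "maxpage": cf.getint("base", "maxpage"),
        "currentpage": cf.getint("base", "currentpage"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings)
```

In spider_main.py, `currentpage` is written back after every result page, so a value of 0 marks a fresh run and any other value lets an interrupted run resume.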


Code snippet and file information

# -*- coding: utf-8 -*-
from configparser import ConfigParser
from urllib.parse import quote
import socket
import os
import math
import time
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

import spider_search_page
import spider_paper

if __name__ == '__main__':
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    cf = ConfigParser()
    cf.read('Config.conf', encoding='utf-8')
    keyword = cf.get('base', 'keyword')  # search keyword
    maxpage = cf.getint('base', 'maxpage')  # maximum page number
    searchlocation = cf.get('base', 'searchlocation')  # field to search in
    currentpage = cf.getint('base', 'currentpage')  # page to resume from
    if os.path.exists('data-detail.txt') and currentpage == 0:
        print('Output file exists; deleting it')
        os.remove('data-detail.txt')

    # Map each search-location label to its query prefix.
    # The keys must match the searchlocation value in Config.conf.
    values = {
        '全文': 'qw',          # full text
        '主題': 'theme',       # topic
        '篇名': 'title',       # article title
        '作者': 'author',      # author
        '摘要': 'abstract',    # abstract
    }
    keywordval = str(values[searchlocation]) + ':' + str(keyword)
    # quote() percent-encodes the Chinese characters for use in the URL
    index_url = ('http://search.cnki.com.cn/Search.aspx?q=' + quote(keywordval)
                 + '&rank=&cluster=&val=&p=')
    print(index_url)

    # Work out the number of result pages (15 results per page)
    html = urllib.request.urlopen(index_url).read()
    soup = BeautifulSoup(html, 'html.parser')
    pagesum_text = soup.find('span', class_='page-sum').get_text()
    maxpage = math.ceil(int(pagesum_text[7:-1]) / 15)
    # print(maxpage)
    cf = ConfigParser()
    cf.read('Config.conf', encoding='utf-8')
    cf.set('base', 'maxpage', str(maxpage))
    with open('Config.conf', 'w', encoding='utf-8') as conf_file:
        cf.write(conf_file)

    # Time out connections after 10 seconds; set before the first page request
    socket.setdefaulttimeout(10)
    for i in range(currentpage, maxpage):
        page_num = 15
        page_str_num = i * page_num
        page_url = index_url + str(page_str_num)
        print(page_url)
        attempts = 0
        success = False
        # Retry each page up to 50 times before moving on
        while attempts < 50 and not success:
            try:
                spider_search_page.get_paper_url(page_url)
                success = True
            except (socket.error, urllib.error.URLError):
                attempts += 1
                print('Retry #' + str(attempts))
        # Record progress so an interrupted run can resume
        cf.set('base', 'currentpage', str(i))
        with open('Config.conf', 'w', encoding='utf-8') as conf_file:
            cf.write(conf_file)
    spider_paper.spider_paper()  # fill in the details of each article
    end = time.perf_counter()
    print('Running time: %s Seconds' % (end - start))
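
The paging and retry logic in the script above can be isolated into two small functions, which makes it testable without touching the network. This is a minimal sketch, not the project's own code: `build_page_urls` and `call_with_retries` are hypothetical names, and the URL template, the page size of 15, and the limit of 50 attempts are copied from the script.

```python
import math
from urllib.parse import quote

BASE_URL = "http://search.cnki.com.cn/Search.aspx?q="
PAGE_SIZE = 15  # results per page, as used in spider_main.py


def build_page_urls(field_code, keyword, total_results, start_page=0):
    """Return one search URL per result page (hypothetical helper).

    The p= parameter is a result offset, so page i maps to i * PAGE_SIZE.
    """
    query = quote(f"{field_code}:{keyword}")
    index_url = f"{BASE_URL}{query}&rank=&cluster=&val=&p="
    max_page = math.ceil(total_results / PAGE_SIZE)
    return [index_url + str(page * PAGE_SIZE)
            for page in range(start_page, max_page)]


def call_with_retries(func, arg, max_attempts=50, errors=(OSError,)):
    """Call func(arg) until it succeeds or max_attempts is reached.

    Returns True on success and False if every attempt raised one of
    `errors`. socket.error and urllib.error.URLError are both OSError
    subclasses, so the default covers the cases the script handles.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            func(arg)
            return True
        except errors:
            print(f"Attempt {attempt} of {max_attempts} failed")
    return False


if __name__ == "__main__":
    for url in build_page_urls("theme", "crawler", total_results=31):
        print(url)
```

With these in place, the main loop would reduce to calling `call_with_retries(spider_search_page.get_paper_url, page_url)` for each URL, and passing `start_page=currentpage` gives the same resume behaviour as the original `range(currentpage, maxpage)`.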

Attributes        Size  Date        Time   Name
----------  ----------  ----------  -----  ----
.......           1045  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\.gitignore
.......            167  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\Config.conf
.......         174592  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\data_out.xls
.......          11357  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\LICENSE
.......            613  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\README.md
.......          66113  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\result.png
.......           2664  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\spider_main.py
.......           5919  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\spider_paper.py
.......           1317  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master\spider_search_page.py
Directory            0  2017-06-15  15:40  CNKICrawler-master\CNKICrawler-master
Directory            0  2020-11-19  08:34  CNKICrawler-master
----------  ----------  ----------  -----  ----
Total           263787                     11 entries

