91av视频/亚洲h视频/操亚洲美女/外国一级黄色毛片 - 国产三级三级三级三级

  • Size: 3KB
    File type: .py
    Published: 2021-05-10
  • Language: Python

Resource description

Same approach as the C++ version: the script crawls Baidu search result pages, for a specified number of pages, and collects the links on them.
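
A minimal sketch of how "a specified number of pages" can be mapped onto result-page URLs. The helper name build_result_page_urls is hypothetical, and the sketch assumes that Baidu's pn query parameter is a zero-based result offset with 10 results per page. The other query parameters are copied from the crawler's URL template in the listing below.

```python
from urllib.parse import quote


def build_result_page_urls(keyword, pages, per_page=10):
    """Return one Baidu result-page URL per requested page.

    Hypothetical helper. Assumption: pn is a zero-based result offset,
    so page N starts at offset (N - 1) * per_page.
    """
    wd = quote(keyword)  # percent-encode the keyword as UTF-8
    return [
        "http://www.baidu.com/s?wd=%s&pn=%d&tn=baiduhome_pg&ie=utf-8&usm=4"
        % (wd, (page - 1) * per_page)
        for page in range(1, pages + 1)
    ]


for u in build_result_page_urls("python 爬虫", 3):
    print(u)
```

Computing the offset from the page number keeps the loop counter meaning "page", which matches how the crawler's iPages argument is described.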


Code snippet and file information

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Ported to Python 3: urllib2 and urllib.quote become urllib.request and urllib.parse.
import re
import urllib.error
import urllib.parse
import urllib.request


class FetchUrl:
    """A Baidu crawler that collects the sub-URLs found on result pages."""

    def __init__(self, strKeyword, iPages=1):
        """Store the search keyword and the number of result pages to crawl."""
        self.m_strKeyword = strKeyword
        self.m_iPages = iPages

    def GetSubPageUrlList(self, url, comreg):
        """Fetch one result page and return the links that match comreg."""
        try:
            response = urllib.request.urlopen(url)
        except urllib.error.HTTPError:
            # HTTPError is a subclass of URLError, so it must be caught first
            print("****** Got an HTTPError, trying again ******")
            response = urllib.request.urlopen(url)
        except urllib.error.URLError:
            print("****** Got a URLError, trying again ******")
            response = urllib.request.urlopen(url)
        htmlpage = response.read().decode("utf-8", errors="replace")
        infoList1 = comreg.findall(htmlpage)
        # Remove duplicates before returning
        return list(set(infoList1))

    def GetUrlList(self):
        """Collect the sub-links from the requested number of result pages."""
        mainList = []
        comreg = re.compile(r'http://www\.baidu\.com/link\?url=[^"]+')
        print("Search keyword: %s" % self.m_strKeyword)
        # URL-encode the keyword as UTF-8; Python 3 strings are already
        # Unicode, so the original .decode('gbk') step is no longer needed
        encodeKeyword = urllib.parse.quote(self.m_strKeyword)
        i = 1
        while i <= self.m_iPages:
            # Assumption: pn is a result offset (10 results per page),
            # so page i starts at offset (i - 1) * 10
            url = ("http://www.baidu.com/s?wd=%s&pn=%d&tn=baiduhome_pg&ie=utf-8&usm=4"
                   % (encodeKeyword, (i - 1) * 10))
            # The original listing is truncated here; the rest of the
            # loop is reconstructed from context
            subList = self.GetSubPageUrlList(url, comreg)
            mainList.extend(subList)
            i += 1
        return mainList
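
The extraction and de-duplication step can be tried offline, without fetching anything from Baidu. This sketch applies the same kind of link pattern (with the dots escaped) to a made-up HTML string; extract_links and the sample markup are illustrative, not part of the original script.

```python
import re

# Link pattern of the same shape as in the crawler, with literal dots escaped
LINK_RE = re.compile(r'http://www\.baidu\.com/link\?url=[^"]+')


def extract_links(html):
    """Return the unique Baidu redirect links found in an HTML string, sorted."""
    return sorted(set(LINK_RE.findall(html)))


# Made-up sample markup: one duplicate link and one non-matching link
sample = (
    '<a href="http://www.baidu.com/link?url=abc123">first</a>'
    '<a href="http://www.baidu.com/link?url=abc123">duplicate</a>'
    '<a href="http://www.baidu.com/link?url=xyz789">second</a>'
    '<a href="http://example.com/other">ignored</a>'
)
print(extract_links(sample))
```

Sorting the set, rather than returning list(set(...)) directly, makes the output order deterministic, which is convenient when comparing results between runs.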
