Resource overview
A Python IP proxy pool crawler tool.

Code snippet and file information
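# coding:utf-8
'''
Rule definition:
    urls:    list of URLs to crawl
    type:    parsing method; one of regular (regular expression),
             xpath (XPath parsing), module (custom third-party module)
    pattern: a regular expression or an XPath expression; it must match
             the parsing method chosen in `type`
'''
import os
import random
'''
Fields: ip, port, type (0 = high anonymity, 1 = transparent),
protocol (0 = http, 1 = https), country, area (province/city),
updatetime (time of last update), speed (connection speed)
'''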
parserList = [
    {
        'urls': ['http://www.66ip.cn/%s.html' % n for n in ['index'] + list(range(2, 12))],
        'type': 'xpath',
        'pattern': ".//*[@id='main']/div/div[1]/table/tr[position()>1]",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': './td[4]', 'protocol': ''}
    },
    {
        'urls': ['http://www.66ip.cn/areaindex_%s/%s.html' % (m, n) for m in range(1, 35) for n in range(1, 10)],
        'type': 'xpath',
        'pattern': ".//*[@id='footer']/div/table/tr[position()>1]",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': './td[4]', 'protocol': ''}
    },
    {
        'urls': ['http://cn-proxy.com/', 'http://cn-proxy.com/archives/218'],
        'type': 'xpath',
        'pattern': ".//table[@class='sortable']/tbody/tr",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': '', 'protocol': ''}
    },
    {
        'urls': ['http://www.mimiip.com/gngao/%s' % n for n in range(1, 10)],
        'type': 'xpath',
        'pattern': ".//table[@class='list']/tr",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': '', 'protocol': ''}
    },
    {
        'urls': ['https://proxy-list.org/english/index.php?p=%s' % n for n in range(1, 10)],
        'type': 'module',
        'moduleName': 'proxy_listPraser',
        # Raw string so the regex escapes are not treated as string escapes.
        'pattern': r'Proxy\(.+\)',
        'position': {'ip': 0, 'port': -1, 'type': -1, 'protocol': 2}
    },
    {
        'urls': ['http://incloak.com/proxy-list/%s#list' % n for n in
                 ([''] + ['?start=%s' % (64 * m) for m in range(1, 10)])],
        'type': 'xpath',
        'pattern': ".//table[@class='proxy__t']/tbody/tr",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': '', 'protocol': ''}
    },
    {
        'urls': ['http://www.kuaidaili.com/proxylist/%s/' % n for n in range(1, 11)],
        'type': 'xpath',
        'pattern': ".//*[@id='index_free_list']/table/tbody/tr[position()>0]",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': './td[3]', 'protocol': './td[4]'}
    },
    {
        'urls': ['http://www.kuaidaili.com/free/%s/%s/' % (m, n) for m in ['inha', 'intr', 'outha', 'outtr'] for n in
                 range(1, 11)],
        'type': 'xpath',
        'pattern': ".//*[@id='list']/table/tbody/tr[position()>0]",
        'position': {'ip': './td[1]', 'port': './td[2]', 'type': './td[3]', 'protocol': './td[4]'}
    },
    {
        'urls': ['http://www.cz88.net/proxy/%s' % m for m in
                 ['index.shtml'] + ['http_%s.shtml' % n for n in range(2, 11)]],
        'type': 'xpath',
        'pattern': ".//*[@id='boxright']/div/ul/li[position()>1]",
        'position': {'ip': './div[1]', 'port': './div[2]', 'type': './div[3]',
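The excerpt above breaks off partway through the last rule. To illustrate how a rule of type 'xpath' might be applied, here is a minimal sketch. It is not the project's own parser: it uses the standard-library xml.etree.ElementTree, which supports only a subset of XPath (so patterns using position()>1 would need a full XPath engine), and it assumes well-formed markup. The function name apply_xpath_rule, the sample page, and the addresses in it are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule with the same keys as the parserList entries.
rule = {
    'type': 'xpath',
    'pattern': ".//table[@class='list']/tr",
    'position': {'ip': './td[1]', 'port': './td[2]', 'type': '', 'protocol': ''},
}

# Made-up, well-formed sample page (documentation-range addresses).
SAMPLE_PAGE = """
<html><body>
  <table class="list">
    <tr><td>192.0.2.1</td><td>8080</td></tr>
    <tr><td>192.0.2.2</td><td>3128</td></tr>
  </table>
</body></html>
"""


def apply_xpath_rule(page, rule):
    """Return one dict per row matched by rule['pattern']."""
    root = ET.fromstring(page)
    proxies = []
    for row in root.findall(rule['pattern']):
        record = {}
        for field, path in rule['position'].items():
            # An empty path means the page does not provide this field.
            node = row.find(path) if path else None
            has_text = node is not None and node.text
            record[field] = node.text.strip() if has_text else None
        proxies.append(record)
    return proxies


proxies = apply_xpath_rule(SAMPLE_PAGE, rule)
for proxy in proxies:
    print(proxy['ip'], proxy['port'])
```

'pattern' selects the table rows, and each entry in 'position' is then evaluated relative to a row, which is why those paths start with './'.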
  Attribute      Size  Date       Time   Name
----------- ---------  ---------- -----  ----
  Directory         0  2017-06-16 07:30  IPProxyPool-master\
       File      1056  2017-06-16 07:30  IPProxyPool-master\.gitignore
       File       764  2017-06-16 07:30  IPProxyPool-master\IPProxy.py
       File     13034  2017-06-16 07:30  IPProxyPool-master\README.md
  Directory         0  2017-06-16 07:30  IPProxyPool-master\api\
       File        22  2017-06-16 07:30  IPProxyPool-master\api\__init__.py
       File       867  2017-06-16 07:30  IPProxyPool-master\api\apiServer.py
       File     11286  2017-06-16 07:30  IPProxyPool-master\config.py
  Directory         0  2017-06-16 07:30  IPProxyPool-master\data\
       File   9290764  2017-06-16 07:30  IPProxyPool-master\data\qqwry.dat
  Directory         0  2017-06-16 07:30  IPProxyPool-master\db\
       File      1456  2017-06-16 07:30  IPProxyPool-master\db\DataStore.py
       File       546  2017-06-16 07:30  IPProxyPool-master\db\ISqlHelper.py
       File      2431  2017-06-16 07:30  IPProxyPool-master\db\MongoHelper.py
       File      5377  2017-06-16 07:30  IPProxyPool-master\db\RedisHelper.py
       File      5427  2017-06-16 07:30  IPProxyPool-master\db\SqlHelper.py
       File        22  2017-06-16 07:30  IPProxyPool-master\db\__init__.py
       File     28228  2017-06-16 07:30  IPProxyPool-master\qiye2.jpg
       File       127  2017-06-16 07:30  IPProxyPool-master\requirements.txt
  Directory         0  2017-06-16 07:30  IPProxyPool-master\spider\
       File      1429  2017-06-16 07:30  IPProxyPool-master\spider\HtmlDownloader.py
       File      6101  2017-06-16 07:30  IPProxyPool-master\spider\HtmlPraser.py
       File      3669  2017-06-16 07:30  IPProxyPool-master\spider\ProxyCrawl.py
       File        22  2017-06-16 07:30  IPProxyPool-master\spider\__init__.py
       File        16  2017-06-16 07:30  IPProxyPool-master\start.bat
  Directory         0  2017-06-16 07:30  IPProxyPool-master\test\
       File        22  2017-06-16 07:30  IPProxyPool-master\test\__init__.py
       File       395  2017-06-16 07:30  IPProxyPool-master\test\test.py
       File      4810  2017-06-16 07:30  IPProxyPool-master\test\testIPAddress.py
       File      1528  2017-06-16 07:30  IPProxyPool-master\test\testIPType.py
       File       288  2017-06-16 07:30  IPProxyPool-master\test\testba
............ (14 more file entries omitted here)