
  • Size: 566KB
    File type: .rar
    Coins: 2
    Downloads: 2
    Published: 2021-06-09
  • Language: Python
  • Tags: python3, web crawler

Resource description

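I spent a few days writing this. I needed to build a scanner, and crawling URLs is one of its features, so I'm sharing that part. It crawls 100,000 URLs in half a day and runs fully automatically, crawling without end. The archive includes an SQL file; just import it into the database.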
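A crawl like this can keep going because every fetched page yields new hosts, which join the list of URLs still to visit. Below is a minimal in-memory sketch of that breadth-first loop, assuming a stub `fetch` function in place of the network and a `set` plus a `deque` in place of the MySQL `url` table. `crawl_hosts` and its `limit` cap are hypothetical names, not part of the download. Like the script, it keeps only `scheme://host` for each absolute link and skips relative ones.

```python
import re
from collections import deque
from urllib.parse import urlparse

# Same href pattern as reptile.py: the value of href="..." or href='...'.
HREF_RE = re.compile(r"(?<=href=\").+?(?=\")|(?<=href=\').+?(?=\')", re.I | re.S)


def crawl_hosts(seeds, fetch, limit=100):
    """Breadth-first crawl that records each new scheme://host once.

    `fetch(url)` (hypothetical) returns the page HTML as a string, or None
    on failure; injecting it lets the loop run without network access.
    Returns the URLs in the order they were crawled, at most `limit` of them.
    """
    seen = set(seeds)        # every host ever stored (the `url` table)
    pending = deque(seeds)   # hosts not crawled yet (status=0)
    crawled = []             # hosts already crawled (status=1)
    while pending and len(crawled) < limit:
        url = pending.popleft()
        crawled.append(url)
        html = fetch(url)
        if html is None:
            continue  # fetch failed: skip this page
        for href in HREF_RE.findall(html):
            parts = urlparse(href)
            if not parts.scheme or not parts.netloc:
                continue  # relative link or missing host
            host = parts.scheme + "://" + parts.netloc
            if host not in seen:
                seen.add(host)
                pending.append(host)
    return crawled


if __name__ == "__main__":
    pages = {
        "http://a.example": '<a href="http://b.example/x">b</a> <a href="/local">l</a>',
        "http://b.example": "<a href='https://c.example/'>c</a>",
    }
    print(crawl_hosts(["http://a.example"], pages.get))
    # ['http://a.example', 'http://b.example', 'https://c.example']
```

The `limit` cap exists only to keep the demo bounded; without it the loop stops only when no unvisited hosts remain.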


Code snippet and file information

# -*- coding: UTF-8 -*-
import re
import threading
from urllib import parse, request

import pymysql

# Database settings below: configure them for your own server.
# autocommit=True so every UPDATE/INSERT is saved immediately.
DB_CONFIG = dict(host="localhost", port=3306, user="root", password="root",
                 database="scan", autocommit=True)

# Captures the value of href="..." or href='...'.
RES_URL = r"(?<=href=\").+?(?=\")|(?<=href=\').+?(?=\')"


# ====== Fetch the page source ======
def getHtml(url):
    try:
        page = request.urlopen(url, timeout=3)
        return page.read()
    except Exception:
        print("Error")
        return None


def kaishi():
    # A pymysql connection must not be shared between threads,
    # so each worker opens its own.
    conn = pymysql.connect(**DB_CONFIG)
    cursor = conn.cursor()
    # URLs that have not been crawled yet have status=0.
    cursor.execute("SELECT url FROM url WHERE status=0 ORDER BY id DESC")
    counts = 0
    for (url,) in cursor.fetchall():
        print(str(counts) + ":" + str(url))
        # Claim the URL in a single statement; 0 affected rows means
        # another thread has already crawled it (status=1).
        if not cursor.execute(
                "UPDATE url SET status=1 WHERE url=%s AND status=0", (url,)):
            print("Already crawled")
            continue
        print("Start crawling")
        html = getHtml(url)
        counts += 1
        if html is None:
            print("Skipped")
            continue
        # Extract the URLs contained in the page.
        link = re.findall(RES_URL, html.decode("utf-8", errors="ignore"),
                          re.I | re.S | re.M)
        for urls in link:
            f = parse.urlparse(urls)
            if f.scheme == "" or f.netloc == "":  # skip if scheme or host is empty
                continue
            n_url = f.scheme + "://" + f.netloc
            if cursor.execute("SELECT 1 FROM url WHERE url=%s", (n_url,)):
                continue  # already stored
            print("Writing " + n_url + " to the database")
            cursor.execute(
                "INSERT INTO url (url, status, svn_status) VALUES (%s, 0, 0)",
                (n_url,))
    conn.close()


# Start five crawler threads.
threads = [threading.Thread(target=kaishi) for _ in range(5)]
for t in threads:
    t.start()
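In the script, five threads all run `kaishi` against the same `url` table and rely on the `status` column to avoid crawling a URL twice. A common alternative, sketched here under the assumption that the pending URLs have already been loaded into memory, is to put them into one thread-safe `queue.Queue` so each URL is handed to exactly one worker. `run_workers` and the `handle` callback are hypothetical names; `handle` stands in for the fetch, parse and store steps and must be thread-safe itself.

```python
import queue
import threading


def run_workers(urls, handle, num_threads=5):
    """Hand each URL to exactly one of `num_threads` worker threads.

    `handle(url)` is a hypothetical per-URL callback (fetch, parse, store);
    it is called concurrently, so it must be thread-safe.
    """
    work = queue.Queue()
    for url in urls:
        work.put(url)

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            handle(url)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every URL has been handled


if __name__ == "__main__":
    done = []
    lock = threading.Lock()

    def record(url):
        with lock:
            done.append(url)

    run_workers(["http://site%d.example" % i for i in range(20)], record)
    print(len(done), len(set(done)))  # 20 20
```

This version stops once the queue is empty; to keep crawling, `handle` would have to put newly discovered hosts back onto the same queue.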

Attributes  Size (bytes)  Date        Time   Name
----------  ------------  ----------  -----  ----------
.CA....             2433  2018-02-10  17:16  reptile.py
.CA....          3520820  2018-02-10  17:17  scan.sql
----------  ------------  ----------  -----  ----------
                 3523253                     2 files
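`scan.sql` itself is not shown. Judging only from the queries in `reptile.py`, the crawler needs a table named `url` with at least `id`, `url`, `status` (0 = not crawled yet, 1 = crawled) and `svn_status` columns, the last one undocumented. The sketch below models that table with Python's built-in `sqlite3` so the bookkeeping can be tried without MySQL. The column types, the `UNIQUE` constraint and the helpers `make_db`, `add_url`, `claim` and `pending` are assumptions, and the real dump may differ. Claiming a row with `UPDATE ... WHERE status=0` and checking the affected row count lets only one caller win for each URL.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the `url` table used by reptile.py.

    Column types and the UNIQUE constraint are assumptions; the real
    scan.sql may define the table differently.
    """
    conn = sqlite3.connect(":memory:")
    # status: 0 = pending, 1 = crawled; svn_status is kept but unused here.
    conn.execute(
        "CREATE TABLE url ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT NOT NULL UNIQUE,"
        " status INTEGER NOT NULL DEFAULT 0,"
        " svn_status INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_url(conn, url):
    """Store a URL as pending; a URL that already exists is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO url (url, status, svn_status) VALUES (?, 0, 0)",
        (url,),
    )


def claim(conn, url):
    """Mark a pending URL as crawled; True only for the first caller."""
    cur = conn.execute("UPDATE url SET status=1 WHERE url=? AND status=0", (url,))
    return cur.rowcount == 1


def pending(conn):
    """Return pending URLs, newest first, like the script's SELECT."""
    rows = conn.execute("SELECT url FROM url WHERE status=0 ORDER BY id DESC")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    add_url(conn, "http://a.example")
    add_url(conn, "http://b.example")
    add_url(conn, "http://a.example")       # duplicate, ignored
    print(pending(conn))                     # ['http://b.example', 'http://a.example']
    print(claim(conn, "http://a.example"))  # True
    print(claim(conn, "http://a.example"))  # False: already crawled
    print(pending(conn))                     # ['http://b.example']
```

In MySQL, `INSERT IGNORE` plays the role of SQLite's `INSERT OR IGNORE` once the `url` column has a unique index.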

