
  • Size: 5 KB
    File type: .py
    Coins: 1
    Downloads: 0
    Published: 2021-05-25
  • Language: Python
  • Tags: Python, web crawler

Resource description

Python web-crawler code examples, including form submission (login), crawling linked subpages, and more.

Code snippet and file information

# -*- coding: utf-8 -*-
"""
@author: Administrator
"""

import urllib.parse
import requests
import re
import pandas as pd
import numpy as np
import ssl
import warnings
warnings.filterwarnings("ignore")

base_url = 'https://umbraco.tv'


if __name__ == '__main__':
    # Step 1: log in
    url = 'https://umbraco.tv/login/'
    resp = requests.get(url, verify=False, allow_redirects=False)
    headers = {
        'Content-type': 'application/x-www-form-urlencoded',
        'Cookie': resp.headers['set-cookie'],
    }
    formdata = resp.text  # decoded page source, so str regex patterns can match
    # Extract the values of the two hidden fields, __RequestVerificationToken
    # and ufprt, from formdata and put them into the POST data.
    # (Both regex patterns are missing from this listing.)
    reqToken = re.findall(r'', formdata, re.S | re.M)[0]
    ufprt = re.findall(r'', formdata, re.S | re.M)[0]

    data = {
        '__RequestVerificationToken': reqToken,
        'Username': 'haierol@qq.com',
        'Password': 'EEjnMYL3',
        'ReturnUrl': '',
        'ufprt': ufprt,
    }

    # This is the actual login request. data holds the account, password and
    # other fields observed by capturing the browser's login traffic.
    resp = requests.post(url, urllib.parse.urlencode(data),
                         headers=headers,
                         verify=False,
                         allow_redirects=False)

    # Reuse the cookie set by the login response for the following requests.
    headers = {'Cookie': resp.headers['set-cookie']}

    url1 = 'https://umbraco.tv/videos/umbraco-v7/developer/fundamentals/api-controllers/introduction/'

    resp1 = requests.get(url1, headers=headers, verify=False, allow_redirects=False)
    t = resp1.text  # the mp4 info on this login-protected page should now be available

    # Step 2: crawl the subpages behind every link
    mp4_list = []
    access_list = []

    # Crawl 5 levels
    index_url = 'https://umbraco.tv/videos/'
    index_html = requests.get(index_url, verify=False).text
    i_linklist = [re.findall(r' href="(.{1,100}?)"', index_html, re.S | re.M)]
    i = 0

    all_links = []

    while i < 10:
        l = i_linklist[i]    # all links found at level i
        new_l = []
        for link in l:
            if link.startswith('href='):
                link = link[6:-1]
            # Skip absolute URLs and static assets (.png, .css, ...)
            skip_markers = ('http://', 'https://', '.png', '.css', '.js', '.ico')
            if not any(marker in link for marker in skip_markers):
                print(link)
                if base_url + link
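
The login step above depends on reading two hidden form fields out of the login page. As a rough sketch of one way to do that without regular expressions, the example below uses the standard-library html.parser module. It assumes the fields are rendered as ordinary hidden input elements; the names HiddenFieldParser and hidden_fields and the sample markup are hypothetical and do not come from the original script.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up sample markup, only to illustrate the shape of the input.
sample = """
<form method="post">
  <input name="__RequestVerificationToken" type="hidden" value="abc123" />
  <input name="Username" type="text" />
  <input name='ufprt' type='hidden' value='XYZ789' />
</form>
"""
fields = hidden_fields(sample)
print(fields["__RequestVerificationToken"], fields["ufprt"])
```

With a real page, the decoded response body would be passed in place of sample, and the two values would go into the POST data.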

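The link filter in step 2 can also be written around urllib.parse, which resolves relative links against the page they were found on. The sketch below is a variation rather than the script's own logic: it keeps same-site links (including absolute ones) and drops static assets by file suffix. The names candidate_links, SKIP_SUFFIXES and HREF_RE and the sample HTML are hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse

SKIP_SUFFIXES = (".png", ".css", ".js", ".ico")
HREF_RE = re.compile(r'href="([^"]{1,100})"')


def candidate_links(html, page_url):
    """Return same-site page URLs found in html, in order, without duplicates."""
    site = urlparse(page_url).netloc
    seen = []
    for href in HREF_RE.findall(html):
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != site:
            continue  # skip other sites and non-HTTP schemes
        if parts.path.lower().endswith(SKIP_SUFFIXES):
            continue  # skip static assets
        if absolute not in seen:
            seen.append(absolute)
    return seen


# Made-up sample markup, only to illustrate the filtering.
sample = (
    '<a href="/videos/a/">A</a> '
    '<link href="/css/site.css"> '
    '<a href="https://example.org/x">X</a> '
    '<a href="/videos/a/">A again</a> '
    '<a href="b/">B</a>'
)
links = candidate_links(sample, "https://umbraco.tv/videos/")
print(links)
```

Each returned URL can then be fetched and passed through the same function to obtain the next level of links.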