Resource overview
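An introductory Python 3 web-scraping example that scrapes the Douban Movie Top 250 list and extracts each film's rank, Chinese title, Douban rating, release year, region, and other details. It requires requests and bs4; the snippet also uses the lxml parser.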
Code snippet and file information
import requests
import re
import os
from bs4 import BeautifulSoup


def download(url, page):
    # .text gives the HTML as a string; without it, requests.get() returns a Response object
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'lxml')
    lis = soup.select("ol li")
    for li in lis:
        index = li.select_one("em").text
        title = li.select_one(".hd .title").text
        rating = li.select_one(".bd .star .rating_num").text
        # The lookbehind pattern was lost in extraction; "<br/>" is assumed here
        strInfo = re.search("(?<=<br/>).*?(?=<)", str(li.select_one(".bd p")), re.S | re.M).group().strip()
        infos = strInfo.split('/')
        year = infos[0].strip()
        area = infos[1].strip()
        type  # (snippet truncated here)
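The info-line step in the snippet (take the text after the line break inside the ".bd p" element, then split it on "/") can be tried offline without any network access. The following is only a minimal sketch under stated assumptions: the sample HTML is hypothetical and merely imitates the structure the snippet implies, the "<br/>" lookbehind is assumed, the helper names extract_info_line and parse_info are invented for illustration, and treating the third field as a list of genres is a guess, since the original code is cut off at "type".

```python
import re

# Hypothetical sample of one ".bd p" element; real Douban markup may differ.
SAMPLE_P_HTML = (
    '<p class="">\n'
    '    Director: Example Person<br/>\n'
    '    1994 / USA / Crime Drama\n'
    '</p>'
)


def extract_info_line(p_html: str) -> str:
    """Return the text between the first <br/> and the next tag, stripped."""
    # re.S lets "." match newlines, so the match can span line breaks.
    match = re.search(r"(?<=<br/>).*?(?=<)", p_html, re.S)
    if match is None:
        raise ValueError("no text found after <br/>")
    return match.group().strip()


def parse_info(info_line: str) -> dict:
    """Split a 'year / area / genres' line into named fields."""
    fields = [field.strip() for field in info_line.split("/")]
    if len(fields) < 3:
        raise ValueError("expected at least three '/'-separated fields")
    return {
        "year": fields[0],
        "area": fields[1],
        # Assumption: the third field holds space-separated genres.
        "genres": fields[2].split(),
    }


if __name__ == "__main__":
    print(parse_info(extract_info_line(SAMPLE_P_HTML)))
```

Checking the regex result for None before calling group() avoids the AttributeError that the original one-liner would raise on a list item without a matching line.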