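Resource overview
A Python scraper for 智聯招聘 (Zhaopin) that collects the salary distribution and job requirements for a given position. It runs as-is once the dependencies are installed; you need to install the required packages yourself, such as scrapy, pandas and matplotlib. If an import fails, install the missing package by following the error message (a Baidu search on the message usually helps) and repeat until the script starts.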
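
Because the dependencies are installed by hand, it can help to list the missing ones up front instead of discovering them one failed run at a time. The sketch below is not part of the original script: the helper name `missing_packages` is hypothetical, and the package list is simply read off the imports in the code snippet.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages imported by the snippet
# (bs4 is installed under the name beautifulsoup4).
REQUIRED = ["jieba", "numpy", "requests", "tqdm", "pandas",
            "scipy", "wordcloud", "bs4", "matplotlib"]

if __name__ == "__main__":
    for name in missing_packages(REQUIRED):
        print("missing:", name)
```

Each name it prints can then be installed before running the scraper.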

Code snippet and file information
# -*- coding: utf-8 -*-
import re
import csv
import jieba
import numpy
import requests
from tqdm import tqdm
import pandas as pd
from scipy.misc import imread  # scipy.misc.imread was removed in SciPy 1.2
from wordcloud import WordCloud, ImageColorGenerator
from collections import Counter
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
from requests.exceptions import RequestException


def get_one_page(city, keyword, region, page):
    '''
    Fetch the HTML of one search-results page and return it.
    '''
    paras = {
        'jl': city,         # city to search
        'kw': keyword,      # search keyword
        'isadv': 0,         # whether to open the advanced search options
        'isfilter': 1,      # whether to filter the results
        'sg': 'd5259c62115f44e3bbb380dc88411919',
        'p': page,          # page number
        're': region        # region (district) code; 2005 is Haidian
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Host': 'sou.zhaopin.com',
        'Referer': 'https://www.zhaopin.com/',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9'
    }
    url = 'https://sou.zhaopin.com/jobs/searchresult.ashx?'
    try:
        # Request the page; the response body is HTML
        response = requests.get(url, params=paras, headers=headers)
        print(response.url)
        # Use the status code to decide whether the request succeeded
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    '''
    Parse the HTML and yield the useful fields.
    '''
    # Parse with a regular expression
    # (the HTML tag literals of this pattern are missing in the source)
    pattern = re.compile('(.*?).*?'            # job detail URL and job title
        ' .*? target="_blank">(.*?).*?'        # company name
        ' (.*?) ', re.S)                       # monthly salary
    # Find every match
    items = re.findall(pattern, html)
    for item in items:
        job_name = item[1]
        job_name = job_name.replace('', '')
        job_name = job_name.replace('', '')
        salary_average = 0
        temp = item[3]
        if temp != '面議':  # '面議' means the salary is negotiable
            idx = temp.find('-')
            # Average of the salary range
            salary_average = (int(temp[0:idx]) + int(temp[idx+1:])) // 2
        # html = get_detail_page(job_url)
        # print(html)
        yield {
            'job': job_name,
            'job_url': item[0],
            'company': item[2],
            'salary': salary_average
        }


def get_detail_page(url):
    '''
    Fetch the HTML of a job detail page and return it.
    '''
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Host': 'jobs.zhaopin.com',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=
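
The salary averaging in `parse_one_page` assumes that every value other than the negotiable marker has the form `low-high`; a value without a hyphen, or with non-digit text, would make `int()` raise. A more defensive version might look like the sketch below. The helper name `average_salary` and the choice to return 0 for anything unparseable are assumptions, not part of the original script.

```python
def average_salary(text, negotiable="面議"):
    """Return the integer midpoint of a 'low-high' salary string, or 0."""
    text = text.strip()
    if text == negotiable:
        return 0
    low, sep, high = text.partition("-")
    try:
        if not sep:
            # A single figure rather than a range.
            return int(low)
        return (int(low) + int(high)) // 2
    except ValueError:
        # Anything that is not plain digits is treated as unknown.
        return 0
```

Inside the loop it would be called as `average_salary(item[3])` in place of the inline arithmetic.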
Attribute        Size  Date        Time   Name
-----------  --------  ----------  -----  ----
File           134783  2018-07-10  15:19  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)\2.png
File            93404  2018-07-10  17:10  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)\output.png
File             8295  2018-07-10  15:40  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)\stopwords.txt
File            10665  2018-07-10  17:38  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)\zhilian.py
File            72208  2018-07-10  17:08  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)\zl_上海_java工程師.csv
File           289072  2018-07-10  17:08  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)\zl_上海_java工程師.txt
Directory           0  2018-07-10  19:47  智聯招聘爬取工作崗位薪資分布以及崗位要求(python)
-----------  --------  ----------  -----  ----
               608427                     7
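
One way to turn the saved CSV rows into a salary distribution is to count them per salary band. The sketch below covers only that bucketing step and is not taken from `zhilian.py`: the column name `salary` is assumed from the dictionary keys yielded by `parse_one_page`, while the 5000 band width and the helper name `salary_histogram` are illustrative choices.

```python
import csv
import io
from collections import Counter


def salary_histogram(rows, width=5000):
    """Count rows per salary band; rows with salary 0 (negotiable) are skipped."""
    counts = Counter()
    for row in rows:
        salary = int(row["salary"])
        if salary <= 0:
            continue
        low = (salary // width) * width
        counts[(low, low + width)] += 1
    return dict(sorted(counts.items()))


# Usage with an in-memory CSV; a real file would be opened with open()
# and passed to csv.DictReader in the same way.
sample = io.StringIO("job,salary\nA,9000\nB,12500\nC,0\nD,7000\n")
print(salary_histogram(csv.DictReader(sample)))
```

The resulting dictionary maps each `(low, high)` band to a count and can be handed to a bar chart.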