Resource Overview
A Sina Weibo crawler written in Python.

Code Snippet and File Information
# coding=utf-8
"""
Created on 2016-04-24 @author: Eastmount
Purpose: crawl Sina Weibo user profiles and weibo comments
Site: http://weibo.cn/ (carries less data than http://weibo.com/)
"""
import time
import re
import os
import sys
import codecs
import shutil
import urllib
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.action_chains import ActionChains
# Start a browser first: headless PhantomJS or Firefox
#driver = webdriver.PhantomJS(executable_path="G:\phantomjs-1.9.1-windows\phantomjs.exe")
driver = webdriver.Firefox()
wait = ui.WebDriverWait(driver, 10)
# Global variables: files for reading the input list and writing the results
inforead = codecs.open("SinaWeibo_List_best_1.txt", 'r', 'utf-8')
infofile = codecs.open("SinaWeibo_Info_best_1.txt", 'a', 'utf-8')
#********************************************************************************
#                            Step 1: log in to weibo.cn
#        This method works for weibo.cn (data is sent in plain text); for weibo.com,
#        see my junior colleague's method that sets the POST data and headers
#                LoginWeibo(username, password): takes the user name and password
#********************************************************************************
def LoginWeibo(username, password):
    try:
        # Enter the user name and password to log in
        print u'Preparing to log in to Weibo.cn...'
        driver.get("http://login.sina.com.cn/")
        elem_user = driver.find_element_by_name("username")
        elem_user.send_keys(username)  # user name
        elem_pwd = driver.find_element_by_name("password")
        elem_pwd.send_keys(password)   # password
        #elem_rem = driver.find_element_by_name("safe_login")
        #elem_rem.click()              # secure login
        # Key point: pause so the CAPTCHA can be typed in by hand
        # (http://login.weibo.cn/login/ requires one on the mobile site)
        time.sleep(20)

        #elem_sub = driver.find_element_by_xpath("//input[@class='smb_btn']")
        #elem_sub.click()              # click "log in" (located by class: it has no name attribute)
        elem_pwd.send_keys(Keys.RETURN)
        time.sleep(2)

        # Get the cookies. Recommended reading: http://www.cnblogs.com/fnng/p/3269450.html
        print driver.current_url
        print driver.get_cookies()  # cookie information, stored as dicts
        print u'Cookie key-value pairs:'
        for cookie in driver.get_cookies():
            #print cookie
            for key in cookie:
                print key, cookie[key]

        # driver.get_cookies() is a list; it holds only one element, a cookie of type dict
        print u'Login succeeded...'

    except Exception as e:
        print "Error: ", e
    finally:
        print u'End LoginWeibo!\n\n'
#********************************************************************************
#        Step 2: visit the personal page http://weibo.cn/5824697471 and fetch its information
#                                VisitPersonPage()
#        Common encoding error: UnicodeEncodeError: 'ascii' codec can't encode characters
#********************************************************************************
def VisitPersonPage(user_id):
    try:
        global infofile       # global file variable
        url = "http:/
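
The nested loop in LoginWeibo prints every field of every cookie. When the cookies are meant to be reused outside the browser, usually only each cookie's name and value matter. The following is a minimal sketch of that step, assuming the list-of-dicts shape that Selenium's get_cookies() returns. The helper names cookies_to_dict and cookie_header, and the sample cookie names and values, are made up for illustration and are not part of the original script.

```python
def cookies_to_dict(cookies):
    """Collapse a Selenium-style cookie list into a {name: value} dict."""
    return dict((c["name"], c["value"]) for c in cookies)


def cookie_header(cookies):
    """Build the value of an HTTP Cookie header from the same list."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Hypothetical sample shaped like the output of driver.get_cookies()
sample = [
    {"name": "sid", "value": "abc123", "domain": ".weibo.cn", "path": "/"},
    {"name": "lang", "value": "zh", "domain": ".weibo.cn", "path": "/"},
]

print(cookies_to_dict(sample))
print(cookie_header(sample))
```

In the script itself, the call would presumably be cookies_to_dict(driver.get_cookies()) after the login has completed.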
  Attribute      Size  Date       Time   Name
----------- ---------  ---------- -----  ----
       File     13386  2016-04-24 20:55  [源碼] 爬取移動端微博信息 (強推)\2016-04-23\20160423_SinaWeibo_Num_Best.txt
       File      1595  2016-04-24 20:55  [源碼] 爬取移動端微博信息 (強推)\2016-04-23\Megry_Result_Best.py
       File    237289  2016-04-24 20:52  [源碼] 爬取移動端微博信息 (強推)\2016-04-23\SinaWeibo_Info_best_1.txt
       File       189  2016-04-24 20:46  [源碼] 爬取移動端微博信息 (強推)\2016-04-23\SinaWeibo_List_best_1.txt
       File     12115  2016-04-24 20:54  [源碼] 爬取移動端微博信息 (強推)\2016-04-23\spider_selenium_sina_content.py
       File       840  2016-04-24 21:02  運行配置過程.txt
       File      5628  2016-04-24 20:31  [源碼] 爬取客戶端微博信息\SinaWeibo_Info_best_1.txt
       File        27  2016-04-24 03:45  [源碼] 爬取客戶端微博信息\SinaWeibo_List_best_1.txt
       File      8119  2016-04-24 20:31  [源碼] 爬取客戶端微博信息\weibo_spider2.py
       File     17680  2016-04-24 21:18  [源碼] 爬取移動端個人信息 關注id和粉絲id (速度慢)\SinaWeibo_Info_1.txt
       File        50  2016-04-24 21:17  [源碼] 爬取移動端個人信息 關注id和粉絲id (速度慢)\SinaWeibo_List_1.txt
       File     14884  2016-04-24 21:19  [源碼] 爬取移動端個人信息 關注id和粉絲id (速度慢)\spider_selenium_sina_info_other_userid_all.py
  Directory         0  2016-04-24 20:55  [源碼] 爬取移動端微博信息 (強推)\2016-04-23
  Directory         0  2016-04-24 20:46  [源碼] 爬取移動端微博信息 (強推)
  Directory         0  2016-04-24 20:42  [源碼] 爬取客戶端微博信息
  Directory         0  2016-04-24 21:18  [源碼] 爬取移動端個人信息 關注id和粉絲id (速度慢)
----------- ---------  ---------- -----  ----
               311802                    16
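
Each script in the listing sits next to a SinaWeibo_List_*.txt input file and a SinaWeibo_Info_*.txt output file, which the snippet opens as module-level globals with codecs.open. The sketch below shows the same read-then-append pattern with io.open context managers, so the files are closed even if an error occurs, and with the encoding stated explicitly, which is one common way to avoid the UnicodeEncodeError mentioned in the snippet. It assumes the list file holds one user ID per line, which the page does not confirm. The helper names, the temporary file names, the IDs, and the record format are all hypothetical.

```python
import io
import os
import tempfile


def read_user_ids(path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def append_record(path, text):
    """Append one line of text to a UTF-8 file."""
    with io.open(path, "a", encoding="utf-8") as handle:
        handle.write(text + u"\n")


# Demo with temporary files and made-up IDs (assumed format: one ID per line)
workdir = tempfile.mkdtemp()
list_path = os.path.join(workdir, "list.txt")
info_path = os.path.join(workdir, "info.txt")
with io.open(list_path, "w", encoding="utf-8") as handle:
    handle.write(u"1234567890\n\n0987654321\n")

user_ids = read_user_ids(list_path)
for user_id in user_ids:
    append_record(info_path, u"user_id: " + user_id)

with io.open(info_path, "r", encoding="utf-8") as handle:
    written = handle.read()
print(written)
```

Opening the files inside the functions that use them, rather than as globals, also removes the need for the global infofile declaration seen in VisitPersonPage.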