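Resource Overview
Python machine-learning code for Chinese sentiment analysis (corpus, feature library, stop words, and source code), trained on a hotel review corpus.
Code Snippet and File Information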
#!/usr/bin/env python
# coding: utf-8
import pandas as pd
import numpy as np
import sklearn
from pandas import DataFrame, Series
# data.csv is read as GB18030, so the encoding is passed explicitly
df = pd.read_csv('data.csv', encoding='gb18030')
# In[8]:
df.head()
# print(df.head())
# In[ ]:
df.shape
print(df.shape)
# In[ ]:
def make_label(df):
    # Ratings above 3 stars are positive (1); all others are negative (0)
    df["sentiment"] = df["star"].apply(lambda x: 1 if x > 3 else 0)
# In[ ]:
make_label(df)
# In[ ]:
df.head()
print(df.head())
# In[ ]:
# Copy so that adding a column to X later does not write into a slice of df
X = df[['comment']].copy()
y = df.sentiment
# In[ ]:
X.shape
# In[ ]:
y.shape
# In[ ]:
X.head()
# In[ ]:
import jieba
# In[ ]:
def chinese_word_cut(mytext):
    # Segment Chinese text with jieba and join the tokens with spaces
    return " ".join(jieba.cut(mytext))
# In[ ]:
X['cutted_comment'] = X.comment.apply(chinese_word_cut)
# In[ ]:
X.cutted_comment[:5]
# In[ ]:
from sklearn.model_selection import train_
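The preview breaks off at the `sklearn.model_selection` import, so the rest of the pipeline is not visible. As a minimal sketch, assuming the truncated import is `train_test_split` (an assumption; the source does not show it), the labeling rule and a train/test split could look like the following. The `toy` DataFrame and its pre-segmented comments are hypothetical and only stand in for `data.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for data.csv (not taken from the real corpus)
toy = pd.DataFrame({
    "comment": ["房间 很 干净", "服务 太 差", "位置 不错", "早餐 一般",
                "非常 满意", "不会 再 来", "性价比 高", "隔音 很 差"],
    "star": [5, 1, 4, 3, 5, 2, 4, 1],
})

# Same rule as make_label: more than 3 stars -> 1 (positive), otherwise 0
toy["sentiment"] = toy["star"].apply(lambda x: 1 if x > 3 else 0)

X = toy[["comment"]]
y = toy["sentiment"]

# Assumed split: scikit-learn's default 25% test share, fixed seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

print(toy["sentiment"].tolist())
print(len(X_train), len(X_test))
```

Note that a 3-star review is labeled negative under this rule, since only ratings strictly above 3 count as positive.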
Attribute         Size  Date        Time   Name
-----------  ---------  ----------  -----  ----
File              9893  2020-06-02  23:15  demo\.ipynb_checkpoints\demo-checkpoint.ipynb
File              3607  2020-06-02  23:55  demo\.ipynb_checkpoints\demo-checkpoint.py
File            364905  2020-06-02  23:15  demo\data.csv
File             48129  2020-06-04  22:21  demo\demo.ipynb
File              3607  2020-06-02  23:55  demo\demo.py
File              1411  2020-06-02  23:15  demo\environment.yaml
File              1119  2020-06-04  23:11  demo\sentiment.marshal.3
File               157  2020-06-02  23:15  demo\stopwordsHIT.txt
Directory            0  2020-06-03  21:21  demo\.ipynb_checkpoints
Directory            0  2020-06-04  23:10  demo
-----------  ---------  ----------  -----  ----
                432828                     10
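The archive includes `stopwordsHIT.txt`, but the code preview does not show how it is used. The sketch below is one plausible way to apply a stop-word list to space-separated jieba output. It assumes one stop word per line in UTF-8, which the listing does not confirm; `load_stopwords` and `remove_stopwords` are hypothetical helper names, and a temporary file stands in for the real list.

```python
import os
import tempfile


def load_stopwords(path, encoding="utf-8"):
    """Read one stop word per line, ignoring blank lines (assumed file layout)."""
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def remove_stopwords(cut_text, stopwords):
    """Drop stop words from a space-separated token string."""
    return " ".join(tok for tok in cut_text.split() if tok not in stopwords)


# Hypothetical stand-in for stopwordsHIT.txt so the sketch runs on its own
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("的\n了\n是\n")
    stop_path = tmp.name

stopwords = load_stopwords(stop_path)
os.remove(stop_path)

# Input is already segmented, as produced by chinese_word_cut
cleaned = remove_stopwords("酒店 的 位置 是 很 好 的", stopwords)
print(cleaned)
```

If the real file uses a different encoding, such as the GB18030 used for `data.csv`, pass it through the `encoding` argument.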