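Resource overview
Analyzes Romance of the Three Kingdoms (三國演義) and Dream of the Red Chamber (紅樓夢): segments the Chinese text into words, counts how often each character appears, and generates word-cloud images. Shared with Python enthusiasts for study and discussion.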

Code snippet and file information
# # e10.3CalThreeKingdoms.py
# import jieba
# excludes = {}  # {"將軍", "卻說", "丞相"}
# txt = open("三國演義.txt", "r", encoding="utf-8").read()
# words = jieba.lcut(txt)
# counts = {}
# for word in words:
#     if len(word) == 1:  # skip single-character tokens
#         continue
#     else:
#         counts[word] = counts.get(word, 0) + 1
# for word in excludes:
#     del counts[word]
# items = list(counts.items())
# items.sort(key=lambda x: x[1], reverse=True)
# for i in range(15):
#     word, count = items[i]
#     print("{0:<10}{1:>5}".format(word, count))
# e10.4CalThreeKingdoms.py
import jieba
excludes = {"將軍", "卻說", "荊州", "二人", "不可", "不能", "如此"}
excludes = {}  # overrides the set above, so no words are excluded
with open("三國演義.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the full text into a list of words
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    # merge aliases and "X said" tokens into one name per character
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "云長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # no KeyError if an excluded word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))
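The elif chain in the script merges several aliases of one character into a single name. A lookup table can express the same merging more compactly. The sketch below is only an illustration, not the author's code: `ALIASES` and `count_names` are hypothetical names, and the function takes an already-segmented token list so that it runs without jieba or the novel's text.

```python
from collections import Counter

# Hypothetical alias table; the pairs are copied from the elif chain in the script.
ALIASES = {
    "諸葛亮": "孔明", "孔明曰": "孔明",
    "關公": "關羽", "云長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(tokens, excludes=()):
    """Count tokens, skipping single-character ones and merging aliases."""
    counts = Counter(
        ALIASES.get(tok, tok) for tok in tokens if len(tok) != 1
    )
    for word in excludes:
        counts.pop(word, None)  # excluded words that never occurred are ignored
    return counts


if __name__ == "__main__":
    # A tiny hand-made token list standing in for jieba.lcut(txt).
    sample = ["玄德", "曰", "孔明", "諸葛亮", "玄德曰", "將軍", "曹操", "丞相"]
    for word, count in count_names(sample, excludes={"將軍"}).most_common(3):
        print("{0:<10}{1:>5}".format(word, count))
```

With the real text, `count_names(jieba.lcut(txt), excludes)` would take the place of the loop, and `most_common(20)` would take the place of the sort and slice.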
Attribute         Size  Date        Time   Name
-----------  ---------  ----------  -----  ----
Directory            0  2020-10-22  23:00  中文分詞\
File          15043584  2006-09-25  16:46  中文分詞\msyh.ttf
File           1792627  2020-10-22  21:06  中文分詞\三國演義.txt
File              1503  2020-10-22  22:33  中文分詞\三國演義分詞.py
File            193632  2020-10-22  22:53  中文分詞\三國演義詞云.png
File              1074  2020-10-22  22:05  中文分詞\三國演義詞云.py
File           2463064  2020-10-22  21:09  中文分詞\紅樓夢.txt
File               567  2020-10-22  22:33  中文分詞\紅樓夢分詞.py
File            298031  2020-10-22  22:53  中文分詞\紅樓夢詞云.png
File              1070  2020-10-22  22:03  中文分詞\紅樓夢詞云.py
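The archive lists two word-cloud scripts (三國演義詞云.py and 紅樓夢詞云.py) along with the font msyh.ttf, but their contents are not shown on this page. The following is a rough sketch of how such a script might be written, assuming the third-party `jieba` and `wordcloud` packages are installed. The function names, image size, and `max_words` value are assumptions for illustration and are not taken from the author's files.

```python
from collections import Counter


def word_frequencies(tokens):
    """Return a Counter of tokens, skipping single-character ones."""
    return Counter(tok for tok in tokens if len(tok) != 1)


def render_cloud(text_path, font_path, out_path, max_words=100):
    """Segment a text file and save a word-cloud image (hypothetical helper)."""
    # Imported here so word_frequencies works without these packages installed.
    import jieba
    from wordcloud import WordCloud

    with open(text_path, "r", encoding="utf-8") as f:
        freqs = word_frequencies(jieba.lcut(f.read()))

    cloud = WordCloud(
        font_path=font_path,  # a CJK-capable font is needed to draw Chinese glyphs
        width=800,            # assumed image size
        height=600,
        background_color="white",
        max_words=max_words,
    )
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(out_path)


# Example call; the input and font paths are guesses based on the file listing:
# render_cloud("三國演義.txt", "msyh.ttf", "cloud.png")
```

Passing a font such as msyh.ttf matters because the default font used by `wordcloud` does not cover Chinese characters, so words would otherwise be drawn as empty boxes.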