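Resource overview
Text corpora covering more than 20 industries, usable for text analysis tasks such as text similarity computation, text mining, sentiment analysis, and building keyword word clouds.
Code snippet and file information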
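# -*- coding: UTF-8 -*-
# Read 30wChinsesSeqDic.txt and append the word field of each line to 30wdict.txt.
f = open('30wChinsesSeqDic.txt')
fout = open('30wdict.txt', 'a')
count = 0
for line in f:
    temp = line.strip()
    # Take the second space-separated field, then split that field on tabs.
    temp_list = temp.split(' ')
    temp_sublist = temp_list[1].split('\t')
    # Keep only entries whose word field is longer than 2.
    if len(temp_sublist[1]) > 2:
        count = count + 1
        print(temp_sublist[1])
        fout.write(temp_sublist[1] + '\n')
f.close()
fout.close()
# print(count)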
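The script above assumes every line contains both a space and a tab, and it leaves the file encoding implicit. A minimal Python 3 sketch of the same extraction is given below, under two assumptions that the page does not confirm: the input is UTF-8, and malformed lines should be skipped rather than raise an error. The names `extract_words`, `convert`, and `longer_than` are hypothetical and introduced only for illustration. Note that `len` here counts characters, whereas under Python 2 the original script counts bytes.

```python
def extract_words(lines, longer_than=2):
    """Yield the word field of each line if it has more than `longer_than` characters.

    The word field is the second tab-separated part of the second
    space-separated part of the line, mirroring the original script.
    """
    for line in lines:
        parts = line.strip().split(' ')
        if len(parts) < 2:
            continue  # assumption: skip lines without a space instead of failing
        sub = parts[1].split('\t')
        if len(sub) < 2:
            continue  # assumption: skip lines without a tab in the second field
        word = sub[1]
        if len(word) > longer_than:
            yield word


def convert(src='30wChinsesSeqDic.txt', dst='30wdict.txt', encoding='utf-8'):
    """Append the extracted words to `dst`, one per line, and return how many were written."""
    count = 0
    # The encoding is an assumption; change it if the files use a different one.
    with open(src, encoding=encoding) as f, open(dst, 'a', encoding=encoding) as fout:
        for word in extract_words(f):
            fout.write(word + '\n')
            count += 1
    return count
```

Calling `convert()` with no arguments would read and append to the same file names as the original script. `extract_words` can also be used on any iterable of strings, which makes the filtering rule easy to check without touching the files.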
Attribute       Size  Date        Time   Name
---------  ---------  ----------  -----  ----
Directory          0  2020-06-13  06:34  funNLP-master\
Directory          0  2020-06-13  06:34  funNLP-master\.github\
File             801  2020-06-13  06:34  funNLP-master\.github\FUNDING.yml
File           81327  2020-06-13  06:34  funNLP-master\README.md
Directory          0  2020-06-13  06:34  funNLP-master\data\
Directory          0  2020-06-13  06:34  funNLP-master\data\.logo圖片\
File           52918  2020-06-13  06:34  funNLP-master\data\.logo圖片\.img.jpg
Directory          0  2020-06-13  06:34  funNLP-master\data\.logo圖片\.捐贈圖片\
File          134177  2020-06-13  06:34  funNLP-master\data\.logo圖片\.捐贈圖片\.alipay.jpg
File          103106  2020-06-13  06:34  funNLP-master\data\.logo圖片\.捐贈圖片\.wechat.jpg
File             419  2020-06-13  06:34  funNLP-master\data\.logo圖片\.捐贈圖片\donation.md
Directory          0  2020-06-13  06:34  funNLP-master\data\IT詞庫\
File          308187  2020-06-13  06:34  funNLP-master\data\IT詞庫\THUOCL_it.txt
Directory          0  2020-06-13  06:34  funNLP-master\data\NLP_BOOK\
File         3359237  2020-06-13  06:34  funNLP-master\data\NLP_BOOK\eisenstein-nlp-notes.pdf
Directory          0  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\
File            6148  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\.DS_Store
File         7527940  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\30wChinsesSeqDic.txt
File         3989784  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\30wChinsesSeqDic_clean.txt
File         3186208  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\30wdict.txt
File         3186211  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\30wdict_utf8.txt
File          848536  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\42537條偽原創詞庫.txt
Directory          0  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\QQ拼音詞庫\
File            7056  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\QQ拼音詞庫\QQpinyin.jpg
File             178  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\QQ拼音詞庫\QQ拼音詞庫導出.txt
File         2355763  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\dict.txt
File          565268  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\fingerDic.txt
File         2326382  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\httpcws_dict.txt
File         1656360  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\out.txt
File             365  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\thirtyw.py
File             513  2020-06-13  06:34  funNLP-master\data\中文分詞詞庫整理\thirtyw.pyc
............ 116 more file entries omitted here