  • Size: 7.15MB
    File type: .zip
    Coins: 2
    Downloads: 0
    Published: 2023-09-23
  • Language: Python
  • Tags: python, jieba

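Resource description

A competition required jieba word segmentation, so I wrote a Python program that uses jieba for word segmentation, part-of-speech tagging, and stop-word filtering.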
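
To make the stop-word filtering step concrete, here is a minimal sketch that assumes the sentence has already been segmented into tokens. The function name `filter_stopwords` and the sample tokens are illustrative assumptions and do not come from the archive.

```python
def filter_stopwords(tokens, stopwords):
    """Return the tokens that do not appear in the stop-word collection."""
    stopword_set = set(stopwords)  # set membership is fast for large lists
    return [t for t in tokens if t not in stopword_set]


# Illustrative data: a pre-segmented sentence and a tiny stop-word list.
tokens = [u'我', u'的', u'程序', u'是', u'分词', u'程序']
stopwords = [u'的', u'是']

filtered = filter_stopwords(tokens, stopwords)
line = u' '.join(filtered)  # one output line, tokens separated by spaces
print(line)
```

Each output line keeps the surviving tokens in their original order, separated by single spaces.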

Code snippet and file information

# -*- coding: utf-8 -*-
"""
Created on Mon Oct 31 15:39:16 2016

@author: lcy
"""
import io

import jieba
import jieba.posseg as pseg  # used for part-of-speech tagging


# Word segmentation
def part_word(fid1, fid3):
    for line in fid1:
        data_line = line.strip()
        word_list = jieba.cut(data_line)  # jieba.cut returns a generator
        fid3.write(u' '.join(word_list).strip() + u'\n')


# Part-of-speech tagging
def ci_xing(fid1, fid3):
    for line in fid1:
        data_line = line.strip()
        words = pseg.cut(data_line)  # yields pairs with .word and .flag
        out_str = u' '.join(w.word + u'/' + w.flag for w in words)
        fid3.write(out_str.strip() + u'\n')


# Stop-word filtering
def stop_word(fid1, fid2, fid3):
    # Load the stop-word list, one word per line
    stopword = set(j.strip() for j in fid2)
    for line in fid1:
        data_line = line.strip()
        word_list = jieba.cut(data_line)  # jieba.cut returns a generator
        out_str = u' '.join(w for w in word_list if w not in stopword)
        fid3.write(out_str.strip() + u'\n')


# Main entry point
def main():
    # The files are GBK-encoded; io.open decodes on read and encodes on write
    fid1 = io.open('pos.txt', 'r', encoding='gbk')  # input text
    fid2 = io.open('stopword.txt', 'r', encoding='gbk')  # stop-word list
    fid3 = io.open('poss.txt', 'w', encoding='gbk')  # output file
    # stop_word(fid1, fid2, fid3)  # stop-word filtering
    part_word(fid1, fid3)  # word segmentation
    # ci_xing(fid1, fid3)  # part-of-speech tagging
    fid1.close()
    fid2.close()
    fid3.close()


if __name__ == '__main__':
    main()
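
The functions above only need file-like objects, so the line-by-line segmentation loop can be tried out on in-memory streams. The following is a sketch under stated assumptions: `segment_stream` and `whitespace_tokenize` are hypothetical names, and `whitespace_tokenize` is a stand-in for `jieba.cut` so that the example runs without jieba installed.

```python
import io


def segment_stream(src, dst, tokenize):
    """Write each line of src to dst as space-separated tokens."""
    for line in src:
        tokens = tokenize(line.strip())
        dst.write(u' '.join(tokens).strip() + u'\n')


def whitespace_tokenize(text):
    # Hypothetical stand-in for jieba.cut: splits on runs of whitespace.
    return text.split()


# In-memory streams replace the input and output files.
src = io.StringIO(u'hello   world\nfoo bar\n')
dst = io.StringIO()
segment_stream(src, dst, whitespace_tokenize)
result = dst.getvalue()
print(result)
```

With jieba installed, passing `jieba.cut` as the `tokenize` argument should give the same kind of output as `part_word`, since `jieba.cut` also takes a sentence and yields its tokens.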



Attributes       Size  Date       Time   Name
----------- ---------  ---------- -----  ----
Directory           0  2016-10-31 22:18  test\
Directory           0  2016-10-31 18:50  test\jieba\
Directory           0  2016-10-31 18:50  test\jieba\analyse\
File             1423  2015-02-11 16:28  test\jieba\analyse\analyzer.py
File             2183  2015-02-11 16:43  test\jieba\analyse\analyzer.pyc
File          6471088  2013-12-05 13:24  test\jieba\analyse\idf.txt
File             3490  2015-02-17 18:48  test\jieba\analyse\textrank.py
File             3943  2015-03-20 11:02  test\jieba\analyse\textrank.pyc
File             3492  2015-02-17 18:48  test\jieba\analyse\__init__.py
File             4425  2015-03-20 11:02  test\jieba\analyse\__init__.pyc
File          5420898  2015-02-11 16:27  test\jieba\dict.txt
Directory           0  2016-10-31 18:50  test\jieba\finalseg\
File           598842  2014-11-15 13:36  test\jieba\finalseg\prob_emit.p
File          1356958  2015-02-11 16:28  test\jieba\finalseg\prob_emit.py
File           513079  2015-02-14 21:06  test\jieba\finalseg\prob_emit.pyc
File               62  2014-11-15 13:36  test\jieba\finalseg\prob_start.p
File               97  2014-11-15 13:36  test\jieba\finalseg\prob_start.py
File              215  2015-02-14 21:06  test\jieba\finalseg\prob_start.pyc
File              146  2014-11-15 13:36  test\jieba\finalseg\prob_trans.p
File              245  2014-11-15 13:36  test\jieba\finalseg\prob_trans.py
File              316  2015-02-14 21:06  test\jieba\finalseg\prob_trans.pyc
File             2816  2015-02-11 16:28  test\jieba\finalseg\__init__.py
File             3319  2015-02-14 21:06  test\jieba\finalseg\__init__.pyc
Directory           0  2016-10-31 18:50  test\jieba\posseg\
File          1078947  2014-11-15 13:36  test\jieba\posseg\char_state_tab.p
File          1679102  2015-02-11 16:28  test\jieba\posseg\char_state_tab.py
File           817983  2015-02-11 16:40  test\jieba\posseg\char_state_tab.pyc
File          1522393  2014-11-15 13:36  test\jieba\posseg\prob_emit.p
File          4076462  2015-02-11 16:28  test\jieba\posseg\prob_emit.py
File          1074415  2015-02-11 16:40  test\jieba\posseg\prob_emit.pyc
File             6321  2014-11-15 13:36  test\jieba\posseg\prob_start.p
............ 19 more file entries omitted here
