
  • Size: 2.25MB
    File type: .rar
    Coins: 2
    Downloads: 1
    Release date: 2023-09-09
  • Language: Python
  • Tags: HMM

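Resource description

A small assignment for a natural language processing course: Chinese word segmentation implemented with an HMM (hidden Markov model) trained on a news corpus. Each character is labeled as B, E, S, or M (begin, end, single-character word, middle).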
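To make the B/E/S/M labeling concrete, here is a minimal sketch of how such a segmenter could work. It is not the code from the archive: the function names (`to_bmes`, `from_bmes`, `train`, `viterbi`), the add-one smoothing, and the two-sentence toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter

STATES = "BMES"  # begin, middle, end, single-character word


def to_bmes(words):
    """Label each character of a pre-segmented sentence with B, M, E or S."""
    tags = []
    for word in words:
        if not word:
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend("B" + "M" * (len(word) - 2) + "E")
    return tags


def from_bmes(text, tags):
    """Cut text into words: a word ends after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:  # the tags ended on B or M; keep the leftover characters
        words.append(current)
    return words


def train(corpus):
    """Count start, transition and emission events in segmented sentences."""
    start, trans, emit = Counter(), Counter(), Counter()
    for words in corpus:
        tags = to_bmes(words)
        if not tags:
            continue
        start[tags[0]] += 1
        trans.update(zip(tags, tags[1:]))
        emit.update(zip(tags, "".join(words)))
    return start, trans, emit


def viterbi(text, start, trans, emit):
    """Return the most probable tag sequence under add-one smoothed counts."""
    if not text:
        return []
    vocab = len({ch for _, ch in emit}) + 1  # +1 slot for unseen characters
    n_start = sum(start.values())
    out_of, emitted = Counter(), Counter()
    for (prev, _), count in trans.items():
        out_of[prev] += count
    for (tag, _), count in emit.items():
        emitted[tag] += count

    def log_start(s):
        return math.log((start[s] + 1) / (n_start + len(STATES)))

    def log_trans(p, s):
        return math.log((trans[p, s] + 1) / (out_of[p] + len(STATES)))

    def log_emit(s, ch):
        return math.log((emit[s, ch] + 1) / (emitted[s] + vocab))

    score = {s: log_start(s) + log_emit(s, text[0]) for s in STATES}
    back = []
    for ch in text[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev_score[p] + log_trans(p, s))
            score[s] = prev_score[best] + log_trans(best, s) + log_emit(s, ch)
            pointers[s] = best
        back.append(pointers)
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):  # follow the back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy corpus, invented for this example only.
corpus = [["我", "喜欢", "北京"], ["北京", "欢迎", "你"]]
print(to_bmes(["我", "喜欢", "自然语言"]))  # ['S', 'B', 'E', 'B', 'M', 'M', 'E']
model = train(corpus)
print(from_bmes("我喜欢北京", viterbi("我喜欢北京", *model)))
```

Because the smoothing gives every transition a nonzero probability, the decoder may occasionally produce an invalid tag order such as B followed by B; `from_bmes` tolerates that by cutting only after E or S. A stricter version could forbid the impossible transitions outright.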

Code snippet and file information



import numpy as np
import codecs
import re
import os



def write_file(all_wordAndf, file_path):
    """Write n-gram counts to file_path, most frequent first."""
    all_word = list(all_wordAndf)
    fluency = [all_wordAndf[word] for word in all_word]
    # argsort is ascending, so the indices are walked backwards below.
    arg = np.argsort(fluency)
    # "w" rather than the original "a", so a rerun does not append duplicates.
    with codecs.open(file_path, "w", encoding="utf-8") as file:
        for i in arg[::-1]:
            # Keys already end with "|", so a line reads e.g. "w1|w2|count".
            file.write(all_word[i] + str(fluency[i]) + "\n")



def getN_gram(filepath, n):
    """Count word n-grams in a corpus whose tokens are separated by "/"."""
    result = {}
    c = "|"  # c is the separator between the words of an n-gram
    with codecs.open(filepath, "r", encoding="utf-8") as file:
        for line in file.readlines():
            p = False  # becomes True once the first token of the line is seen
            sentence = []
            word = ""
            for char in line:
                if char == "/":  # "==" instead of "is": compare by value
                    if not p:
                        # The first token of each line is replaced by "#".
                        p = True
                        word = "#"
                    sentence.append(word)
                    word = ""
                elif not ("a" <= char <= "z") and char != " ":
                    # Skip lowercase letters and spaces; keep everything else.
                    word += char
            # "+ 1" so that the last n-gram of the line is counted as well.
            for i in range(len(sentence) - n + 1):
                n_word = "".join(w + c for w in sentence[i:i + n]).strip()
                result[n_word] = result.get(n_word, 0) + 1
    write_file(result, str(n) + "gram.txt")
    return result
????????



all_wordAndf = getN_gram("1998-01-2003版-帶音.txt", 2)

all_wordAndf_s = getN_gram("1998-01-2003版-帶音.txt", 1)




def loadWordFluency_single(filepath):
    """Load unigram counts from lines of the form "word|count"."""
    result = {}
    with codecs.open(filepath, "r", encoding="utf-8") as file:
        for line in file.readlines():
            line = line.strip().split("|")
            result[line[0]] = int(line[-1])
    return result



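def loadWordFluency_double(filepath):
    """Load bigram counts from lines of the form "w1|w2|count"."""
    result = {}
    with codecs.open(filepath, "r", encoding="utf-8") as file:
        for line in file.readlines():
            line = line.strip().split("|")
            if len(line) != 3:
                # Report and skip malformed lines instead of failing below.
                print(line)
                continue
            # As in the original, the first word alone is stored too; it ends
            # up holding the count of the last bigram read that starts with it.
            result[line[0]] = int(line[-1])
            result[line[0] + "|" + line[1]] = int(line[-1])
    return result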



gram_1 = loadWordFluency_single("1gram.txt")

gram_2 = loadWordFluency_double("2gram.txt")
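The snippet stops once the counts are loaded. One common use for such tables, offered here only as an assumption about where the assignment goes next, is a smoothed bigram probability P(word | prev). The sketch below uses the same key layout as the loaders above ("word" for unigrams, "w1|w2" for bigrams); `bigram_prob`, its `vocab_size` parameter, and the toy dictionaries are hypothetical and not part of the archive.

```python
def bigram_prob(prev, word, gram_1, gram_2, vocab_size=None):
    """Estimate P(word | prev) from raw counts with add-one smoothing."""
    if vocab_size is None:
        vocab_size = len(gram_1)  # assumption: vocabulary = known unigrams
    pair_count = gram_2.get(prev + "|" + word, 0)
    prev_count = gram_1.get(prev, 0)
    return (pair_count + 1) / (prev_count + vocab_size)


# Toy counts in the same shape as gram_1 and gram_2 ("#" marks a line start).
toy_gram_1 = {"#": 3, "我": 2, "喜欢": 1}
toy_gram_2 = {"#|我": 2, "我|喜欢": 1}

print(bigram_prob("#", "我", toy_gram_1, toy_gram_2))   # (2 + 1) / (3 + 3) = 0.5
print(bigram_prob("我", "你", toy_gram_1, toy_gram_2))  # (0 + 1) / (2 + 3) = 0.2
```

Add-one smoothing keeps unseen pairs from getting probability zero; it is the simplest choice, not necessarily the one the assignment used.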


 Attribute        Size  Date        Time   Name
 ---------  ----------  ----------  -----  ----
 File             2417  2017-10-27  13:09  Untitled4.py
 File             6100  2017-10-27  13:08  Untitled5.py
 File         11276940  2017-09-28  18:55  1998-01-2003版-帶音.txt
 ---------  ----------  ----------  -----  ----
 Total        11285457                     3

