Resource overview
Blog post link: https://luchi007.iteye.com/blog/2260128
Code snippet and file information
# -*- coding: cp936 -*-
"""
Email classification system based on a support vector machine.
Uses 0/1 flags to mark whether each word occurs.
"""
import re
from math import *
from SVMKernel import *

# Split the text into tokens and compute term frequencies
def splitText(bigString):
    wordlist = {}       # token -> raw count
    rtnList = []        # distinct tokens in order of first appearance
    wordFreqList = {}   # token -> count / total number of tokens
    # Tokenize on runs of non-word characters
    listofTokens = re.split(r'\W+', bigString)
    length = len(listofTokens)
    for token in listofTokens:
        if token not in wordlist:
            wordlist[token] = 1
            rtnList.append(token)
        else:
            wordlist[token] += 1
        wordFreqList[token] = float(wordlist[token]) / length
    return rtnList, wordFreqList

# Document frequency of a word: the fraction of the 25 spam files that contain it
def docFre(word):
    fre = 0
    for i in range(1, 26):
        with open('spam/%d.txt' % i) as f:
            if word in re.split(r'\W+', f.read()):
                fre += 1
    return float(fre) / 25

# Feature word extraction; this ...
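
The module docstring says the classifier marks each word's presence with 0 or 1, but the snippet is cut off before the feature extraction step. The following is a minimal sketch of such a binary encoding, not the author's implementation. The helper names (`tokenize`, `build_vocabulary`, `presence_vector`) and the three sample messages are hypothetical and chosen only for illustration.

```python
import re

def tokenize(text):
    """Split on runs of non-word characters, lowercase, and drop empty tokens."""
    return [tok.lower() for tok in re.split(r'\W+', text) if tok]

def build_vocabulary(documents):
    """Collect the distinct tokens across all documents, in sorted order."""
    vocab = set()
    for doc in documents:
        vocab.update(tokenize(doc))
    return sorted(vocab)

def presence_vector(text, vocabulary):
    """Return a 0/1 list: 1 if the vocabulary word occurs in text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical sample messages, not taken from the email/ham or email/spam files
docs = ["Win money now", "Meeting at noon", "win a free prize now"]
vocab = build_vocabulary(docs)
vectors = [presence_vector(d, vocab) for d in docs]
```

Each vector has one entry per vocabulary word, so every message maps to a fixed-length input that a classifier such as an SVM could consume. Whether the original code builds its vocabulary from all tokens or only from selected feature words is not visible in the snippet.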
  Attribute      Size        Date  Time  Name
----------- ---------  ---------- -----  ----
       File       148  2010-10-23 17:11  email\ham\1.txt
       File        86  2010-10-23 17:13  email\ham\10.txt
       File       130  2010-10-23 17:13  email\ham\11.txt
       File       182  2010-10-23 09:16  email\ham\12.txt
       File       174  2010-10-23 17:13  email\ham\13.txt
       File       172  2010-10-23 17:13  email\ham\14.txt
       File       531  2010-10-23 09:21  email\ham\15.txt
       File        90  2010-10-23 09:21  email\ham\16.txt
       File       464  2010-10-23 09:22  email\ham\17.txt
       File       175  2010-10-23 09:23  email\ham\18.txt
       File       161  2010-10-23 17:14  email\ham\19.txt
       File       234  2010-10-23 08:48  email\ham\2.txt
       File       208  2010-10-23 09:26  email\ham\20.txt
       File       234  2010-10-23 09:27  email\ham\21.txt
       File       330  2010-10-23 09:28  email\ham\22.txt
       File       608  2010-10-23 17:15  email\ham\23.txt
       File        42  2010-10-23 09:33  email\ham\24.txt
       File        89  2010-10-23 09:34  email\ham\25.txt
       File       371  2010-10-23 08:49  email\ham\3.txt
       File       207  2010-10-23 08:50  email\ham\4.txt
       File       114  2010-10-23 17:11  email\ham\5.txt
       File      1464  2010-10-23 17:12  email\ham\6.txt
       File       109  2010-10-23 17:12  email\ham\7.txt
       File       638  2010-10-23 08:58  email\ham\8.txt
       File       146  2010-10-23 09:01  email\ham\9.txt
       File       238  2010-10-23 08:28  email\spam\1.txt
       File       217  2010-10-23 08:36  email\spam\10.txt
       File       414  2010-10-23 08:37  email\spam\11.txt
       File       188  2010-10-23 08:37  email\spam\12.txt
       File       252  2010-10-23 08:38  email\spam\13.txt
............ 30 more file entries omitted here