
  • Size: 25KB
    File type: .zip
    Coins: 2
    Downloads: 0
    Published: 2021-05-13
  • Language: Python
  • Tags:

Resource Description

A naive Bayes text classifier implemented in Python 3.5; the code is commented in detail.

Code Snippet and File Information

# coding=utf-8
from numpy import *
# Convert text into word vectors
def loadDataSet():  # create a small experimental sample
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]  # 1 = abusive, 0 = not abusive
    return postingList, classVec  # tokenized documents and their class labels

# Build a list of every unique word that appears in any document
def createVocabList(dataSet):
    vocabSet = set()  # start with an empty set; a set holds no duplicates
    for document in dataSet:
        vocabSet = vocabSet | set(document)  # union of the two sets
    return list(vocabSet)  # the unique words as a list
# Set-of-words model: record whether each vocabulary word occurs in the document
def setOfWords2Vec(vocabList, inputSet):  # arguments: the vocabulary and one document
    returnVec = [0] * len(vocabList)  # a vector of zeros, one entry per vocabulary word
    for word in inputSet:  # set the entry to 1 for every document word found in the vocabulary
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1  # list.index returns the position of the first match
        else:
            print("the word: %s is not in my Vocabulary!" % word)
    return returnVec  # 1 where the vocabulary word occurs in the input, 0 elsewhere

# Bag-of-words model: count how many times each vocabulary word occurs
def bagOfWords2VecMN(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1  # a word may occur more than once
        else:
            print("the word: %s is not in my Vocabulary!" % word)
    return returnVec  # per-word occurrence counts

# Naive Bayes training function: estimate each word's conditional probability from the word vectors
def trainNB0(trainMatrix, trainCategory):
    # trainMatrix: document matrix, one word vector per document
    # trainCategory: class label of each document
    numTrainDocs = len(trainMatrix)  # number of documents
    numWords = len(trainMatrix[0])  # vocabulary size
    pAbusive = sum(trainCategory) / float(numTrainDocs)  # fraction of documents labeled 1 (abusive)
    # p0Num = zeros(numWords); p1Num = zeros(numWords)
    # p0Denom = 0.0; p1Denom = 0.0
    p0Num = ones(numWords)  # start counts at 1 so no probability is 0, which would zero the whole product
    p1Num = ones(numWords)
    p0Denom = 2.0  # denominators start at 2
    p1Denom = 2.0
    for i in range(numTrainDocs):  # loop over the documents
        # the if/else covers the two classes
        if trainCategory[i] == 1:  # abusive document
            p1Num += trainMatrix[i]  # add this document's word counts
            p1Denom += sum(trainMatrix[i])  # add this document's total word count
        else:  # the other class
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # p1Vect = p1Num / p1Denom
    # p0Vect = p0Num / p0Denom
    p1Vec = log(p1Num / p1Denom)
    p0Vec = log(p0Num / p0Denom)  # logs avoid underflow from multiplying many small numbers
    return p0Vec, p1Vec, pAbusive  # per-word log probabilities for class 0 and class 1, and the prior of class 1

# Naive Bayes classifier: given a word vector, decide its class
def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    # vec2Classify: vector of 0/1 entries marking which vocabulary words occur
    # p0Vec, p1Vec, pClass1: the three values returned by trainNB0
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0
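
The functions above can be tied together roughly as follows. This is a minimal, self-contained sketch and not code taken from bayes.py: it repeats the same smoothed counting and log-probability comparison using only the standard library, so it runs without NumPy. The names `posts`, `labels`, `vocab`, `to_vector`, `train` and `classify` are hypothetical, and the vocabulary is sorted only to make the word order reproducible.

```python
import math

# Toy posts and labels copied from loadDataSet() in the snippet (1 = abusive).
posts = [
    ['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
    ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
    ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
    ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
    ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
    ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid'],
]
labels = [0, 1, 0, 1, 0, 1]

# Assumption: sort the vocabulary so that vector positions are stable between runs.
vocab = sorted({word for post in posts for word in post})


def to_vector(words):
    """Set-of-words vector: 1 where a vocabulary word occurs, else 0."""
    present = set(words)
    return [1 if word in present else 0 for word in vocab]


def train(matrix, categories):
    """Return per-word log probabilities for class 0 and class 1, plus the class-1 prior."""
    log_probs = {}
    for c in (0, 1):
        rows = [row for row, label in zip(matrix, categories) if label == c]
        counts = [1.0 + sum(column) for column in zip(*rows)]  # numerators start at 1
        total = 2.0 + sum(sum(row) for row in rows)            # denominators start at 2
        log_probs[c] = [math.log(n / total) for n in counts]
    prior1 = sum(categories) / len(categories)
    return log_probs[0], log_probs[1], prior1


def classify(vec, p0_vec, p1_vec, p_class1):
    """Compare the two log scores and return the class with the larger one."""
    p1 = sum(x * lp for x, lp in zip(vec, p1_vec)) + math.log(p_class1)
    p0 = sum(x * lp for x, lp in zip(vec, p0_vec)) + math.log(1.0 - p_class1)
    return 1 if p1 > p0 else 0


p0_vec, p1_vec, p_abusive = train([to_vector(post) for post in posts], labels)
for test_post in (['love', 'my', 'dalmation'], ['stupid', 'garbage']):
    print(test_post, '->', classify(to_vector(test_post), p0_vec, p1_vec, p_abusive))
```

On this toy data the first test post should come out as class 0 and the second as class 1: "love", "my" and "dalmation" occur only in the non-abusive posts, while "stupid" and "garbage" occur only in the abusive ones.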

Attribute        Size  Date       Time   Name
----------- ---------  ---------- -----  ----
Directory           0  2017-08-24 08:34  bayes\
File            10531  2017-08-24 10:25  bayes\bayes.py
Directory           0  2017-08-24 08:34  bayes\email\
Directory           0  2017-08-24 08:34  bayes\email\ham\
File              141  2017-05-15 17:44  bayes\email\ham\1.txt
File               82  2017-05-15 17:44  bayes\email\ham\10.txt
File              122  2017-05-15 17:44  bayes\email\ham\11.txt
File              172  2017-05-15 17:44  bayes\email\ham\12.txt
File              164  2017-05-15 17:44  bayes\email\ham\13.txt
File              162  2017-05-15 17:44  bayes\email\ham\14.txt
File              522  2017-05-15 17:44  bayes\email\ham\15.txt
File               90  2017-05-15 17:44  bayes\email\ham\16.txt
File              454  2017-05-15 17:44  bayes\email\ham\17.txt
File              168  2017-05-15 17:44  bayes\email\ham\18.txt
File              151  2017-05-15 17:44  bayes\email\ham\19.txt
File              232  2017-05-15 17:44  bayes\email\ham\2.txt
File              204  2017-05-15 17:44  bayes\email\ham\20.txt
File              229  2017-05-15 17:44  bayes\email\ham\21.txt
File              324  2017-05-15 17:44  bayes\email\ham\22.txt
File              601  2017-05-15 17:44  bayes\email\ham\23.txt
File               42  2017-05-15 17:44  bayes\email\ham\24.txt
File               88  2017-05-15 17:44  bayes\email\ham\25.txt
File              364  2017-05-15 17:44  bayes\email\ham\3.txt
File              205  2017-05-15 17:44  bayes\email\ham\4.txt
File              113  2017-05-15 17:44  bayes\email\ham\5.txt
File             1458  2017-05-15 17:44  bayes\email\ham\6.txt
File              103  2017-05-15 17:44  bayes\email\ham\7.txt
File              634  2017-05-15 17:44  bayes\email\ham\8.txt
File              142  2017-05-15 17:44  bayes\email\ham\9.txt
Directory           0  2017-08-24 08:34  bayes\email\spam\
File              235  2017-05-15 17:44  bayes\email\spam\1.txt
............ (26 more file entries omitted here)
