20newsgroup Python classification and clustering

Size: 4KB

File type: .py

Published: 2021-05-04
Language: Python
Tags: 20newsgroup, python

Resource overview

http://blog.csdn.net/abcjennifer/article/details/23615947

Code snippet and file information

# First, extract the 20 Newsgroups dataset to /scikit_learn_data
from sklearn.datasets import fetch_20newsgroups
# All categories:
# newsgroup_train = fetch_20newsgroups(subset='train')
# A subset of the categories:
categories = ['comp.graphics',
              'comp.os.ms-windows.misc',
              'comp.sys.ibm.pc.hardware',
              'comp.sys.mac.hardware',
              'comp.windows.x']
newsgroup_train = fetch_20newsgroups(subset='train', categories=categories)

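from sklearn import metrics

def calculate_result(actual, pred):
    # The labels are multiclass, so current scikit-learn needs an explicit
    # averaging mode; 'weighted' was the default in older releases.
    m_precision = metrics.precision_score(actual, pred, average='weighted')
    m_recall = metrics.recall_score(actual, pred, average='weighted')
    m_f1 = metrics.f1_score(actual, pred, average='weighted')
    print('predict info:')
    print('precision:{0:.3f}'.format(m_precision))
    print('recall:{0:0.3f}'.format(m_recall))
    print('f1-score:{0:.3f}'.format(m_f1))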

# Print the category names
from pprint import pprint
pprint(list(newsgroup_train.target_names))



# newsgroup_train.data holds the original documents, but we need to extract
# TF-IDF vectors in order to model the text data
from sklearn.feature_extraction.text import TfidfVectorizer, HashingVectorizer
# vectorizer = TfidfVectorizer(sublinear_tf=True,
#                              max_df=0.5,
#                              stop_words='english')
# However, a TF-IDF feature extractor fitted separately on the training set
# and the test set gives them different numbers of features (because their
# documents have different vocabularies), so we use HashingVectorizer.
# Older scikit-learn releases spelled alternate_sign=False as non_negative=True.
vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,
                               n_features=100)
fea_train = vectorizer.fit_transform(newsgroup_train.data)
# Returns the feature matrix 'fea_train' of shape [n_samples, n_features]
print('Size of fea_train: ' + repr(fea_train.shape))
# 11314 documents, 130107 vectors for all categories
print('The average feature sparsity is {0:.3f}%'.format(
    fea_train.nnz / float(fea_train.shape[0] * fea_train.shape[1]) * 100))
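
The comment about TF-IDF and hashing can be checked on a small scale. The sketch below uses two tiny made-up corpora (hypothetical stand-ins for the train and test splits, not the 20 Newsgroups data) to show that separately fitted TfidfVectorizer instances produce vocabulary-sized feature counts, while HashingVectorizer always produces n_features columns. It also shows a common alternative, assumed here rather than taken from the original script: fit the TF-IDF vectorizer once on the training split and reuse it on the test split.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer

# Toy corpora standing in for the train and test splits (hypothetical data).
train_docs = ["the graphics card renders images",
              "windows drivers for the mouse"]
test_docs = ["mac hardware and a new monitor",
             "x server display settings on unix"]

# A TfidfVectorizer fitted on each split separately learns a separate
# vocabulary, so the two matrices generally have different column counts.
tfidf_train = TfidfVectorizer().fit_transform(train_docs)
tfidf_test = TfidfVectorizer().fit_transform(test_docs)
print('separate TF-IDF:', tfidf_train.shape, tfidf_test.shape)

# HashingVectorizer is stateless: the column count is fixed by n_features.
hasher = HashingVectorizer(n_features=100, alternate_sign=False)
hash_train = hasher.transform(train_docs)
hash_test = hasher.transform(test_docs)
print('hashing:', hash_train.shape, hash_test.shape)

# Alternative: fit TF-IDF once on the training split, then reuse it.
shared = TfidfVectorizer().fit(train_docs)
shared_train = shared.transform(train_docs)
shared_test = shared.transform(test_docs)
print('shared TF-IDF:', shared_train.shape, shared_test.shape)
```

Hashing trades interpretability for a fixed width: with a small n_features, different words can collide in the same column, whereas the shared TF-IDF vectorizer keeps one column per training-set word and ignores unseen test words.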


#####
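
The calculate_result helper above reports precision, recall and F1. With more than two classes, recent scikit-learn releases require an explicit average argument for these scores. The following is a minimal sketch on hypothetical three-class labels (invented for illustration, not classifier output from this script), assuming weighted averaging is the intended behaviour.

```python
from sklearn import metrics

# Hypothetical true and predicted labels for a 3-class problem.
actual = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

# 'weighted' averages the per-class scores, weighted by class support.
precision = metrics.precision_score(actual, pred, average='weighted')
recall = metrics.recall_score(actual, pred, average='weighted')
f1 = metrics.f1_score(actual, pred, average='weighted')

print('precision:{0:.3f}'.format(precision))
print('recall:{0:.3f}'.format(recall))
print('f1-score:{0:.3f}'.format(f1))
```

Weighted recall equals plain accuracy, so it can be cross-checked against metrics.accuracy_score on the same labels.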
