Resource overview

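Algorithm idea: extract TF-IDF weights from each document, then measure the similarity of two documents as the cosine of the angle between their multi-dimensional term vectors (cosine similarity). With that similarity measure, the standard k-means algorithm is enough to cluster the texts. The source code is implemented in Java.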
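The pipeline described above can be sketched in a few dozen lines. This is an illustrative sketch only, not the code from the archive: the class name `TextClusterSketch`, the method names, the tf = count / document length and idf = ln(N / df) weighting, and the choice to seed the centroids with the first k documents are all assumptions made for the example.

```java
import java.util.*;

class TextClusterSketch {
    /** Builds one TF-IDF vector per tokenised document over a shared vocabulary. */
    static double[][] tfidf(List<List<String>> docs) {
        // Give each distinct term a column index, in first-seen order.
        Map<String, Integer> index = new LinkedHashMap<>();
        for (List<String> doc : docs)
            for (String term : doc) index.putIfAbsent(term, index.size());

        // Document frequency: how many documents contain each term.
        int[] df = new int[index.size()];
        for (List<String> doc : docs)
            for (String term : new HashSet<>(doc)) df[index.get(term)]++;

        double[][] vectors = new double[docs.size()][index.size()];
        for (int d = 0; d < docs.size(); d++) {
            List<String> doc = docs.get(d);
            // Term frequency, normalised by document length (assumed weighting).
            for (String term : doc) vectors[d][index.get(term)] += 1.0 / doc.size();
            // Multiply by inverse document frequency ln(N / df).
            for (int t = 0; t < df.length; t++)
                vectors[d][t] *= Math.log((double) docs.size() / df[t]);
        }
        return vectors;
    }

    /** Cosine of the angle between two vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** k-means using cosine similarity; returns the cluster index of each document. */
    static int[] kMeans(double[][] vectors, int k, int maxIterations) {
        int dim = vectors[0].length;
        double[][] centroids = new double[k][];
        // Assumption: seed the centroids with the first k documents.
        for (int c = 0; c < k; c++) centroids[c] = vectors[c].clone();
        int[] assignment = new int[vectors.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: pick the most similar centroid for each document.
            for (int d = 0; d < vectors.length; d++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (cosine(vectors[d], centroids[c]) > cosine(vectors[d], centroids[best]))
                        best = c;
                if (assignment[d] != best) { assignment[d] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: each centroid becomes the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[dim];
                int count = 0;
                for (int d = 0; d < vectors.length; d++) {
                    if (assignment[d] != c) continue;
                    count++;
                    for (int i = 0; i < dim; i++) sum[i] += vectors[d][i];
                }
                if (count > 0) {
                    for (int i = 0; i < dim; i++) sum[i] /= count;
                    centroids[c] = sum;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("apple", "banana", "apple"),
            List.of("car", "engine", "wheel"),
            List.of("banana", "apple", "fruit"),
            List.of("engine", "car", "road"));
        System.out.println(Arrays.toString(kMeans(tfidf(docs), 2, 20)));
    }
}
```

With this toy input the two fruit documents land in one cluster and the two car documents in the other. Seeding with the first k documents keeps the sketch deterministic; random seeding is the more common choice in practice.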

Code snippet and file information

package textcluster;

import java.util.List;

/**
 * Tokeniser interface.
 */
public interface ITokeniser
{
    // Splits the input text into a list of tokens.
    List<String> partition(String input);
}
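To show how the interface might be used, here is a hypothetical implementation. It is not the `Tokeniser.java` from the archive, whose contents are not shown on this page; the class names `SimpleTokeniserSketch` and `SimpleTokeniser` and the rule of lower-casing and splitting on anything that is not a letter or digit are assumptions. The interface is repeated as a nested type only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SimpleTokeniserSketch {
    // Copy of the interface above, nested here to keep the sketch self-contained.
    interface ITokeniser {
        List<String> partition(String input);
    }

    /** Hypothetical tokeniser: lower-cases, then splits on non-letter, non-digit runs. */
    static class SimpleTokeniser implements ITokeniser {
        @Override
        public List<String> partition(String input) {
            List<String> tokens = new ArrayList<>();
            for (String piece : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                // split() can yield a leading empty string when the input starts with a delimiter.
                if (!piece.isEmpty()) tokens.add(piece);
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        // prints [text, clustering, with, k, means]
        System.out.println(new SimpleTokeniser().partition("Text clustering, with k-means!"));
    }
}
```

A rule this simple only suits languages that separate words with spaces or punctuation; Chinese text would need a proper word segmenter behind the same interface.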

Attribute    Size  Date        Time   Name
---------  ------  ----------  -----  ----
File         1510  2009-05-08  07:30  textcluster\WawaCluster.java
File         5669  2009-05-08  07:57  textcluster\WawaKMeans.java
File          204  2009-05-07  11:02  textcluster\ITokeniser.java
File         1487  2009-05-07  21:58  textcluster\Tokeniser.java
File         3474  2009-05-08  07:55  textcluster\Program.java
File         1152  2009-05-07  22:02  textcluster\StopWordsHandler.java
File         1392  2009-05-07  11:04  textcluster\TermVector.java
File         6930  2009-05-08  10:27  textcluster\TFIDFMeasure.java
File          606  2009-05-07  10:45  textcluster\input.txt
Directory       0  2009-05-08  16:55  textcluster
---------  ------  ----------  -----  ----
Total       22424                     10 entries

