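Resource overview
A Java implementation of a decision-tree induction algorithm. It may still contain bugs: it has not been carefully validated or tested, but the main functionality is complete. For a detailed explanation of decision trees, see: http://blog.csdn.net/adiaixin123456/article/details/50573849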
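The project is organized into four folders: algorithm, common, data, and test.
(1) algorithm holds the algorithms: DecisionTree (the decision-tree induction algorithm), IAttrSelector (the interface for choosing the best splitting attribute), and BaseAttrSelector (a basic implementation of that attribute selection).
(2) common holds shared classes; it contains only TreeNode, the class that represents a multi-way tree.
(3) data holds the data classes: BaseRecord (the base record, which has a single attribute, the Boolean attribute to be classified; all other data entity classes should extend it), HumanAttrRecord (describes a user's attributes: income, age, whether the user is a student, and credit rating), EmAgeLevel (age enum), EmCreditRate (credit-rating enum), and EmIncome (income enum).
(4) test holds the test class.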
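To make the layout above more concrete, here is a rough sketch of how the data classes and the multi-way tree node might look. Only the class names and the getDecisionAttr() accessor appear in this page; every field, constructor, enum constant, and the addChild helper are assumptions made for illustration, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Only the class names and getDecisionAttr() come from the project description;
// all fields, enum constants, and other methods below are assumptions.

/** Base record: carries only the Boolean attribute to be classified. */
class BaseRecord {
    private Boolean decisionAttr;

    public Boolean getDecisionAttr() { return decisionAttr; }

    public void setDecisionAttr(Boolean decisionAttr) { this.decisionAttr = decisionAttr; }
}

enum EmAgeLevel { YOUTH, MIDDLE_AGED, SENIOR }   // hypothetical constants
enum EmCreditRate { FAIR, EXCELLENT }            // hypothetical constants
enum EmIncome { LOW, MEDIUM, HIGH }              // hypothetical constants

/** A user record: income, age, student flag, and credit rating. */
class HumanAttrRecord extends BaseRecord {
    // Private fields without getters: a selector could read them via reflection.
    private final EmIncome income;
    private final EmAgeLevel age;
    private final Boolean student;
    private final EmCreditRate creditRate;

    HumanAttrRecord(EmIncome income, EmAgeLevel age, Boolean student,
                    EmCreditRate creditRate, Boolean decisionAttr) {
        this.income = income;
        this.age = age;
        this.student = student;
        this.creditRate = creditRate;
        setDecisionAttr(decisionAttr);
    }
}

/** Multi-way tree node: a value plus any number of children. */
class TreeNode<T> {
    private final T value;
    private final List<TreeNode<T>> children = new ArrayList<>();

    TreeNode(T value) { this.value = value; }

    T getValue() { return value; }

    List<TreeNode<T>> getChildren() { return children; }

    /** Attaches a new child holding childValue and returns that child. */
    TreeNode<T> addChild(T childValue) {
        TreeNode<T> child = new TreeNode<>(childValue);
        children.add(child);
        return child;
    }
}
```

In a design like this, a decision tree would be a TreeNode whose inner nodes name the splitting attribute and whose children correspond to that attribute's values.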

Code snippet and file information
package com.adi.datamining.algorithm;

import com.adi.datamining.data.BaseRecord;

import java.lang.reflect.Field;
import java.util.*;

/**
 * Created by wudi10 on 2016/1/23.
 */
public class BaseAttrSelector implements IAttrSelector {
    /** Given the records and the set of their attributes, pick the attribute with the largest information gain. */
    @Override
    public Field select(List<BaseRecord> records, Set<Field> attrs) {
        Field bestField = null;
        Double highestScore = 0D;
        Double setInfo = entropy(records);
        for (Field field : attrs) {
            Double gainScore = setInfo - infoScore(records, field);
            if (gainScore > highestScore) {
                highestScore = gainScore;
                bestField = field;
            }
        }
        return bestField;
    }

    /** Entropy of the record list with respect to the target class; here the class to predict is DecisionAttr. */
    private Double entropy(List<BaseRecord> records) {
        Double positCount = 0D;
        Double negatCount = 0D;
        for (BaseRecord record : records) {
            if (record.getDecisionAttr())
                ++positCount;
            else
                ++negatCount;
        }
        // A pure (or empty) set has zero entropy; this guard also avoids 0 * log2(0) = NaN.
        if (positCount == 0 || negatCount == 0)
            return 0D;
        return -positCount / records.size() * log2N(positCount / records.size())
                - negatCount / records.size() * log2N(negatCount / records.size());
    }

    /** log2(N): the base-2 logarithm of N. */
    private Double log2N(Double d) {
        return Math.log(d) / Math.log(2.0);
    }

    /** Expected information of one attribute with respect to the class DecisionAttr; for the formula, see the decision-tree section of "Data Mining: Concepts and Techniques". */
    private Double infoScore(List<BaseRecord> records, Field field) {
        Double infoScore = 0D;
        try {
            // 1. For each value of the attribute, count the positive and negative examples,
            //    i.e. how many records are true and how many are false.
            // key: a distinct value of the attribute;
            // value: a list of length 2 holding the positive and negative counts for that value.
            Map<Object, List<Integer>> count4Values = new HashMap<Object, List<Integer>>();
            Integer size = records.size();
            field.setAccessible(true);
            for (BaseRecord record : records) {
                Object attrValue = field.get(record);
                List<Integer> countList = count4Values.get(attrValue);
                if (countList == null) {
                    countList = new ArrayList<Integer>(2);
                    countList.add(0, 0);
                    countList.add(1, 0);
                }
                if (record.getDecisionAttr()) {
                    countList.set(0, countList.get(0) + 1);
                } else {
                    countList.set(1, countList.get(1) + 1);
                }
                count4Values.put(attrValue, countList);
            }
            // 2. Walk the map and accumulate the expected information.
            for (Object key : count4Values.keySet()) {
                List<Integer> countList = count4Values.get(key);
                double positCount = countList.get(0);
                double negatCount = countList.get(1);
                // A value with no positive or no negative examples is skipped:
                // it is the most decisive for classification and contributes a score of 0.
                if (positCount == 0 || negatCount == 0)
                    continue;
                double valueCount = positCount + negatCount;
                infoScore +=
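The snippet breaks off in the middle of the expected-information sum. As a hedged illustration of what such a sum usually looks like, the sketch below computes Info_A(D) = Σ_j (|D_j| / |D|) · Info(D_j) and the resulting information gain directly from per-value counts. The class InfoGainSketch, its method names, and the counts in main are all hypothetical; this is not the truncated remainder of the original file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, self-contained sketch; not the project's code.
class InfoGainSketch {
    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Entropy of a two-class set; a class with zero examples contributes 0. */
    static double entropy(double positive, double negative) {
        double total = positive + negative;
        double result = 0.0;
        for (double count : new double[] {positive, negative}) {
            if (count > 0) {
                double p = count / total;
                result -= p * log2(p);
            }
        }
        return result;
    }

    /**
     * Expected information after splitting on one attribute.
     * Each map entry holds {positiveCount, negativeCount} for one attribute value.
     */
    static double expectedInfo(Map<String, int[]> countsPerValue) {
        double total = 0;
        for (int[] c : countsPerValue.values()) {
            total += c[0] + c[1];
        }
        double info = 0;
        for (int[] c : countsPerValue.values()) {
            double valueCount = c[0] + c[1];
            // Weight each partition's entropy by its share of the records.
            info += valueCount / total * entropy(c[0], c[1]);
        }
        return info;
    }

    /** Information gain: entropy of the whole set minus the expected information. */
    static double gain(Map<String, int[]> countsPerValue) {
        int positive = 0;
        int negative = 0;
        for (int[] c : countsPerValue.values()) {
            positive += c[0];
            negative += c[1];
        }
        return entropy(positive, negative) - expectedInfo(countsPerValue);
    }

    public static void main(String[] args) {
        // Illustrative counts for an "age" attribute: {positives, negatives} per value.
        Map<String, int[]> age = new LinkedHashMap<>();
        age.put("youth", new int[] {2, 3});
        age.put("middle_aged", new int[] {4, 0});
        age.put("senior", new int[] {3, 2});
        System.out.println("expected info = " + expectedInfo(age));
        System.out.println("gain = " + gain(age));
    }
}
```

With the illustrative counts in main (9 positive and 5 negative records overall), the entropy of the whole set is about 0.940 and the gain for the split is about 0.247. Treating a zero count as a zero contribution matches the original snippet's choice to skip attribute values that have only positive or only negative examples.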

Type            Size  Date        Time   Name
---------  ---------  ----------  -----  ----
Directory          0  2016-01-24  18:05  DecisionTree\
Directory          0  2016-01-24  18:05  DecisionTree\src\
Directory          0  2016-01-24  18:04  DecisionTree\src\com\
Directory          0  2016-01-24  18:04  DecisionTree\src\com\adi\
Directory          0  2016-01-24  18:04  DecisionTree\src\com\adi\datamining\
Directory          0  2016-01-24  18:04  DecisionTree\src\com\adi\datamining\algorithm\
File            3509  2016-01-24  00:06  DecisionTree\src\com\adi\datamining\algorithm\ba
File            3476  2016-01-24  15:29  DecisionTree\src\com\adi\datamining\algorithm\DecisionTree.java
File             320  2016-01-24  17:47  DecisionTree\src\com\adi\datamining\algorithm\IAttrSelector.java
Directory          0  2016-01-24  18:04  DecisionTree\src\com\adi\datamining\common\
File            1148  2016-01-24  17:48  DecisionTree\src\com\adi\datamining\common\TreeNode.java
Directory          0  2016-01-24  18:04  DecisionTree\src\com\adi\datamining\data\
File             442  2016-01-23  17:16  DecisionTree\src\com\adi\datamining\data\ba
File             405  2016-01-23  15:46  DecisionTree\src\com\adi\datamining\data\EmAgeLevel.java
File             380  2016-01-23  15:46  DecisionTree\src\com\adi\datamining\data\EmCreditRate.java
File             395  2016-01-23  15:46  DecisionTree\src\com\adi\datamining\data\EmIncome.java
File            1218  2016-01-24  16:27  DecisionTree\src\com\adi\datamining\data\HumanAttrRecord.java
Directory          0  2016-01-24  18:04  DecisionTree\src\test\
File            4240  2016-01-24  17:36  DecisionTree\src\test\Test.java