- Size: 12.38 MB
- File type: .rar
- Published: 2024-02-03
- Language: Python
- Tags: machine learning, classification, python
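Resource description
The archive applies the random forest algorithm to the classification problem on the Adult dataset. It consists of four parts: (1) a Python script that preprocesses the Adult dataset; (2) a self-written random forest implementation applied to the Adult dataset; (3) the same classification task solved with Python's sklearn module; and (4) a MATLAB part that runs five machine-learning classification algorithms on the Adult problem and compares which one achieves the best classification performance.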
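Part two of the archive is a hand-written random forest (`Random_Forest.py`), whose source is not shown on this page. As a rough illustration of the general technique only, not the author's implementation, the sketch below grows CART-style trees (Gini impurity) on bootstrap samples, lets each split consider a random subset of about sqrt(d) features, and combines the trees by majority vote. It assumes the features are already numeric (as after the preprocessing script) and the labels are 0/1 like the mapped `salary` column; `RandomForest`, `build_tree`, `best_split`, `predict_tree` and the toy data are hypothetical names chosen for this example.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini impurity,
    or None if no candidate split lowers the impurity of the node."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, n_sub, rng):
    """Grow a CART-style tree. Internal nodes are (feature, threshold, left, right)
    tuples; leaves are class labels. Each split considers n_sub random features."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    children = []
    for side in (True, False):
        sub_X = [row for row, g in zip(X, go_left) if g == side]
        sub_y = [lab for lab, g in zip(y, go_left) if g == side]
        children.append(build_tree(sub_X, sub_y, max_depth - 1, n_sub, rng))
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Bagged trees with random feature subsets, combined by majority vote."""

    def __init__(self, n_trees=25, max_depth=5, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        n_sub = max(1, int(math.sqrt(len(X[0]))))  # features tried per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, n_sub, self.rng))
        return self

    def predict(self, X):
        # Majority vote over all trees for each row.
        return [Counter(predict_tree(tree, row) for tree in self.trees).most_common(1)[0][0]
                for row in X]


if __name__ == "__main__":
    # Hypothetical toy data: class 0 has every feature in [0, 4], class 1 in [6, 10].
    data_rng = random.Random(42)
    X = ([[data_rng.uniform(0, 4) for _ in range(3)] for _ in range(30)]
         + [[data_rng.uniform(6, 10) for _ in range(3)] for _ in range(30)])
    y = [0] * 30 + [1] * 30
    forest = RandomForest(n_trees=15, max_depth=4, seed=1).fit(X, y)
    print(forest.predict([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]))  # [0, 1]
```

Every distinct feature value is tried as a threshold, which keeps the sketch short but makes each node quadratic in the number of rows; on the full Adult data a library implementation such as sklearn's `RandomForestClassifier`, the route taken in the archive's third part, is far more practical.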
Code snippet and file information
# -*- coding: utf-8 -*-
"""
Created on Tue Nov  6 13:29:41 2018
@author: 28770
"""
import pandas as pd

excelFile = r'ML_data2.xlsx'
# Read sheet 0 (training set) and sheet 1 (test set) of the workbook.
# pd.read_excel already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed.
train_df = pd.read_excel(excelFile, sheet_name=0)
test_df = pd.read_excel(excelFile, sheet_name=1)

# (Disabled) workClass_loss flags the missing entries of the 'workClass' column in
# train_df: missing positions are True; .notnull() gives the opposite.
# workClass_loss = train_df['workClass'].isnull()

# Missing-value filling: each missing value is replaced by the value in the row above
# (forward fill). Alternatives: .mode() fills with the column's most frequent value and
# .mean() with the column mean. Because a column can have several equally frequent
# values, .mode() returns a Series rather than a single number as .mean() does, so
# .mode()[0] is needed to fill with the first mode.
# fillna(method="ffill") is deprecated in recent pandas; .ffill() does the same thing.
train_df_fill = train_df.ffill()
test_df_fill = test_df.ffill()

# Drop the duplicated column: in the Adult data, 'education' carries the same
# information as the numeric education-num field.
train_df_fill = train_df_fill.drop(['education'], axis=1)
test_df_fill = test_df_fill.drop(['education'], axis=1)

# Map discrete (categorical) features to integer codes.
salary_mapping = {'<=50K': 0, '>50K': 1}
train_df_fill['salary'] = train_df_fill['salary'].map(salary_mapping)
test_df_fill['salary'] = test_df_fill['salary'].map(salary_mapping)

# 'education' is not listed here because it was dropped above;
# mapping it would raise a KeyError.
Discrete_attribute = ['workClass', 'marital_status', 'occupation',
                      'relationship', 'race', 'sex', 'native_country']
for attribute in Discrete_attribute:
    # Build the code table from the training set only; sorting makes the codes
    # reproducible (the iteration order of a set of strings can change between runs).
    attribute_mapping = {lab: idx for idx, lab in
                         enumerate(sorted(train_df_fill[attribute].dropna().unique()))}
    train_df_fill[attribute] = train_df_fill[attribute].map(attribute_mapping)
    # Test categories that never appear in the training set become NaN.
    test_df_fill[attribute] = test_df_fill[attribute].map(attribute_mapping)

# The original file continues with a commented-out, column-by-column version of the
# loop above (workClass, education, marital_status, occupation, relationship, race, ...);
# the preview of the file is truncated at this point.
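The script's comments describe forward fill as the method actually used, with `.mode()[0]` and `.mean()` as alternatives, and the encoding loop builds each category-to-integer table from the training sheet only. A small hypothetical DataFrame (invented values, not rows from ML_data2.xlsx) makes the differences concrete; only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical toy data with missing values (not taken from ML_data2.xlsx).
df = pd.DataFrame({
    "workClass": ["State-gov", None, "Private", "Private", None],
    "age": [39.0, None, 38.0, 53.0, 28.0],
})

# Forward fill: each gap takes the value from the row above.
ffilled = df.ffill()

# Mode fill: .mode() returns a Series (ties are possible), so [0] picks the first mode.
mode_filled = df["workClass"].fillna(df["workClass"].mode()[0])

# Mean fill for a numeric column: .mean() returns a single number.
mean_filled = df["age"].fillna(df["age"].mean())

# Train-fitted label encoding: codes come from the training column only, so a
# category that appears only in the test column maps to NaN.
train_col = pd.Series(["Private", "State-gov", "Private"])
test_col = pd.Series(["State-gov", "Self-emp"])
mapping = {lab: idx for idx, lab in enumerate(sorted(train_col.unique()))}
encoded_test = test_col.map(mapping)

print(ffilled["workClass"].tolist())  # ['State-gov', 'State-gov', 'Private', 'Private', 'Private']
print(mode_filled.tolist())           # ['State-gov', 'Private', 'Private', 'Private', 'Private']
print(mean_filled.tolist())           # [39.0, 39.5, 38.0, 53.0, 28.0]
print(mapping)                        # {'Private': 0, 'State-gov': 1}
print(encoded_test.tolist())          # [1.0, nan]
```

Forward fill depends on row order, so a gap inherits whatever record happens to precede it, while mode and mean filling are order-independent. The `NaN` in the last result is what the encoding loop produces for any test-sheet category that never occurs in the training sheet, so such rows may need handling before training.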
Type      Size  Date        Time   Name
----  --------  ----------  -----  ----
File      4575  2018-11-13  23:33  Random_Forest\excel_change.py
File      1589  2018-11-13  20:55  Random_Forest\Matlab_xlr\excel_run.m
File   2677491  2018-11-06  20:50  Random_Forest\Matlab_xlr\ML_data2_trans.xlsx
File   2918697  2018-11-01  21:57  Random_Forest\ML_data2.xlsx
File    642592  2018-11-08  10:55  Random_Forest\ML_data2_test.csv
File   1285749  2018-11-08  10:55  Random_Forest\ML_data2_train.csv
File   2677491  2018-11-06  20:50  Random_Forest\ML_data2_trans.xlsx
File    642435  2018-11-08  10:59  Random_Forest\Random Forest\ML_data2_test.csv
File   1285592  2018-11-08  10:59  Random_Forest\Random Forest\ML_data2_train.csv
File   2677491  2018-11-06  20:50  Random_Forest\Random Forest\ML_data2_trans.xlsx
File     10260  2018-11-14  13:26  Random_Forest\Random Forest\Random_Forest.py
File    642435  2018-11-08  10:59  Random_Forest\RF_sklearn\ML_data2_test.csv
File   1285592  2018-11-08  10:59  Random_Forest\RF_sklearn\ML_data2_train.csv
File   2677491  2018-11-06  20:50  Random_Forest\RF_sklearn\ML_data2_trans.xlsx
File      1259  2018-11-14  14:15  Random_Forest\RF_sklearn\RF_sklearn.py
File       214  2018-11-14  13:51  Random_Forest\文本描述(首先閱讀).txt  (text description; read first)
Dir          0  2018-12-14  10:51  Random_Forest\Matlab_xlr
Dir          0  2018-12-14  10:51  Random_Forest\Random Forest
Dir          0  2018-12-14  10:51  Random_Forest\RF_sklearn
Dir          0  2018-12-14  10:51  Random_Forest
----  --------  ----------  -----  ----
Total 19430953 bytes, 20 entries