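Resource Overview
A small Python program written for a competition; it scored 0.93. The approach: 1. read the training set; 2. preprocess the data; 3. fit the model; 4. apply the model to the prediction set; 5. generate the prediction results.
Code Snippet and File Information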
import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn import metrics
import types
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer

# 1. Read the training set
purchase1 = pd.read_csv('aa.csv', sep=',')
purchase1.shape
# 2. Preprocess: treat the literal \N marker as missing, then forward-fill the gaps
purchase1 = purchase1.replace('\\N', np.nan)
purchase1 = purchase1.ffill()  # equivalent to the older fillna(method='pad')
purchase = purchase1.drop(['cust_id'], axis=1)
# Drop every column whose dtype is object (non-numeric)
col_type = purchase.dtypes
a1 = col_type.values == object
leibie = col_type[a1].index
purchase = purchase.drop(leibie, axis=1)
# Separate the features from the label column
X = purchase.drop('false_flag', axis=1)
y = purchase['false_flag']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, random_state=0)
##X_train.corr()
##model = RandomForestClassifier(n_estimators=10, max_features=0.6, min_samples_leaf=20)
model = RandomFore
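The snippet is cut off at the model definition, so steps 3 to 5 of the outline (fit the model, apply it to the prediction set, generate the results) are not shown. The following is only a rough sketch of how those steps might look, not the author's actual code. It makes several assumptions: the data is synthetic, the feature columns `f1` and `f2` are hypothetical, the hyperparameters are borrowed from the commented-out line in the snippet, and the output layout (`cust_id` plus predicted `false_flag`) is a guess at what a submission would contain.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training table (assumption: the real aa.csv is not available).
rng = np.random.default_rng(0)
n = 400
train = pd.DataFrame({
    "cust_id": np.arange(n),
    "f1": rng.normal(size=n),   # hypothetical numeric feature
    "f2": rng.normal(size=n),   # hypothetical numeric feature
})
train["false_flag"] = (train["f1"] + 0.5 * train["f2"] > 0).astype(int)

X = train.drop(["cust_id", "false_flag"], axis=1)
y = train["false_flag"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=0
)

# Step 3: fit the model. Hyperparameters come from the commented-out line in the
# snippet; random_state is added here only to make the sketch reproducible.
model = RandomForestClassifier(
    n_estimators=10, max_features=0.6, min_samples_leaf=20, random_state=0
)
model.fit(X_train, y_train)

# Check accuracy on the held-out half of the training data.
holdout_accuracy = accuracy_score(y_test, model.predict(X_test))

# Step 4: apply the model to a prediction set with the same feature columns
# (assumption: it also carries cust_id but no label).
m = 50
to_predict = pd.DataFrame({
    "cust_id": np.arange(n, n + m),
    "f1": rng.normal(size=m),
    "f2": rng.normal(size=m),
})
predicted = model.predict(to_predict[X.columns])

# Step 5: assemble the prediction results. Pass a path to to_csv to write a file
# instead of returning the CSV text.
result = pd.DataFrame({"cust_id": to_predict["cust_id"], "false_flag": predicted})
csv_text = result.to_csv(index=False)
```

Selecting `to_predict[X.columns]` keeps the feature order identical to what the model saw during fitting, which matters if the prediction file lists its columns in a different order.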