Resource overview
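This code generates plots and accuracy figures for a dataset. Part one produces a correlation matrix plot of the data. Part two produces a missing-value plot of the dataset. Part three reduces the data from many dimensions to two with PCA, applies clustering, and shows the result as a 2D scatter plot. Part four runs classification algorithms on the dataset and outputs the classification accuracy; the classifiers are random forest, naive Bayes, decision tree, KNN, support vector machine, and neural network. These are all the functions the code provides. If you need plots for analyzing a dataset, this code is a reasonable fit.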
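The classification part described above is not included in the snippet shown on this page. As a minimal sketch of how such accuracies could be computed, the block below scores the six classifier families with stratified 5-fold cross-validation. Everything in it is an assumption made for illustration: the iris dataset stands in for the user's CSV, and the `classifiers` dictionary, the `mean_accuracies` helper, and the reduced hyperparameters are hypothetical rather than the author's actual code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for the user's CSV. Its features are
# non-negative, which MultinomialNB requires.
X, y = load_iris(return_X_y=True)

# Hypothetical model table; hyperparameters are kept small for speed.
classifiers = {
    "random forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "naive Bayes": MultinomialNB(),
    "decision tree": DecisionTreeClassifier(max_depth=30, random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(kernel="linear"),
    "neural network": MLPClassifier(solver="lbfgs", alpha=1e-5,
                                    hidden_layer_sizes=(5, 5),
                                    random_state=1, max_iter=1000),
}

# Stratified folds keep the class proportions the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)


def mean_accuracies(models, X, y, cv):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


accuracies = mean_accuracies(classifiers, X, y, cv)
for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

With a real dataset, `X` and `y` would come from the CSV instead, with the label taken from the last column as the snippet does.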
Code snippet and file information
# Note: if the dataset has missing values, fill them with NA first; the code only recognizes NA as a missing value.
# The input dataset must be in CSV format.
import pandas as pd
import numpy as np
import os
import os.path
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
import seaborn as sns
import missingno as msno
from sklearn.decomposition import PCA
from sklearn import preprocessing
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
# max_features='sqrt' is what 'auto' meant for classifiers; 'auto' and
# min_impurity_split were removed in later scikit-learn releases.
rf = RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=30, max_features='sqrt', max_leaf_nodes=None,
            min_impurity_decrease=0.0,
            min_samples_leaf=1, min_samples_split=6,
            min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=None,
            oob_score=False, random_state=42, verbose=0,
            warm_start=False)
NB = MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
tree = DecisionTreeClassifier(max_depth=30)
Knn = KNeighborsClassifier()
svc = SVC(gamma='auto', kernel='linear')
mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 5), random_state=1)
def plot_make(dataset, name):
    # Pearson correlation between every pair of feature columns
    ds_corr = dataset.corr(method='pearson', min_periods=1)
    f, ax = plt.subplots(figsize=(14, 10))
    sns.heatmap(ds_corr, cmap='RdBu', linewidths=0.05, ax=ax)
    ax.set_title('Correlation between features in ' + name)
    # Save the heatmap as <name>.png
    f.savefig(name + '.png', dpi=100, bbox_inches='tight')
def Noise_found(dataset, name):
    # Imported here because preprocessing.Imputer was removed in scikit-learn 0.22
    from sklearn.impute import SimpleImputer
    dataset = np.array(dataset)
    X = np.delete(dataset, -1, axis=1)  # drop the last (label) column
    # y = dataset[:, -1]
    for i in range(X.shape[0]):  # rows
        for j in range(X.shape[1]):  # columns
            if X[i][j] == 'NA':
                X[i][j] = np.nan  # assignment; the original used == by mistake
    X = X.astype(float)
    imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')  # start with a simple imputation
    imp.fit(X)
    X = imp.transform(X)
    pca = PCA(n_components=2)
    reduced_X = pca.fit_transform(X)
    k1 = KMeans(n_clusters=2)  # cluster into 2 groups
    k1.fit(reduced_X)
    kc1 = k1.clus
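The listing is cut off at this point, so the rest of `Noise_found` is unknown. The sketch below only illustrates the sequence the visible lines set up: impute missing values, project to two dimensions with PCA, then cluster with KMeans. The toy data, the variable names, and the use of `labels_` and `cluster_centers_` are assumptions for illustration, not a reconstruction of the missing code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two well-separated groups of 20 rows, 4 features each.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 4)),
    rng.normal(8.0, 1.0, size=(20, 4)),
])
# Knock out two values to play the role of the NA entries.
features[3, 1] = np.nan
features[25, 2] = np.nan

# Fill each missing value with the most frequent value of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
filled = imputer.fit_transform(features)

# Reduce from 4 dimensions to 2 for plotting.
reduced = PCA(n_components=2).fit_transform(filled)

# Cluster the 2D points into two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(reduced)
labels = kmeans.labels_            # cluster index for every row
centers = kmeans.cluster_centers_  # one 2D center per cluster

print(reduced.shape, centers.shape)
```

A scatter plot of `reduced` colored by `labels` would then give the 2D cluster view mentioned in the resource overview.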