Xidian University Data Mining Course Project: Shopping Mall Data Analysis

Size: 977KB

File type: .zip

Coins: 2

Downloads: 1

Published: 2021-09-14
Language: Other

Resource description

Xidian University data mining course project: shopping mall data analysis.

Code snippet and file information

# -*- coding: utf-8 -*-
"""
Created on Sat Aug 25 13:45:40 2018

@author: Pratik
"""

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

sns.set()
knn = KNeighborsClassifier(n_neighbors=5)

train = pd.read_csv('Train.csv')
test = pd.read_csv('Test.csv')

# Combine the train and test data so feature engineering is applied to both.
# The 'source' column lets us split them apart again later.
train['source'] = 'train'
test['source'] = 'test'

data = pd.concat([train, test], ignore_index=True)
print('--------------------------------------------------------------')
print(train.shape, test.shape, data.shape)
print('--------------------------------------------------------------\n')
# The problem is already defined: we need to predict sales by the store.

data.info()
print(data.describe())

# Some observations
# 1. Item_Visibility has a min value of 0, which is unlikely
# 2. Outlet_Establishment_Year would be more useful if converted to the outlet's age

# Check how many unique values each column has
print(data.apply(lambda x: len(x.unique())))

# Look at the object-dtype (categorical) columns
for i in train.columns:
    if train[i].dtype == 'object':
        print(train[i].value_counts())
        print('--------------------------------------------\n')
        print('--------------------------------------------')

# The output gives us the following observations:

# Item_Fat_Content: Some 'Low Fat' values are mis-coded as 'low fat' and 'LF'. Some 'Regular' values also appear as 'regular'.
# Item_Type: Not all categories have substantial counts. Combining them may give better results.
# Outlet_Type: Supermarket Type2 and Type3 could be combined, but we should check whether that is a good idea before doing it.

# Missing value percentage
# Item_Weight and Outlet_Size have some missing values
print('--------------------------------------------')
print('missing value percentage:')
print(data['Item_Weight'].isnull().mean() * 100)
print(data['Outlet_Size'].isnull().mean() * 100)
print('--------------------------------------------\n')

# Impute missing values: mean for the numeric column, mode for the categorical one.
data['Item_Weight'] = data['Item_Weight'].fillna(data['Item_Weight'].mean())
# mode() returns a Series, so take its first element. Assign the result back
# instead of calling fillna(..., inplace=True) on a column selection, which
# may not update the DataFrame in recent pandas versions.
data['Outlet_Size'] = data['Outlet_Size'].fillna(data['Outlet_Size'].mode()[0])


# Replace Item_Visibility values of 0 with the column mean, since zero visibility is implausible
data['Item_Visibility'] = data['Item_Visibility'].replace(
    0, data['Item_Visibility'].mean())

# we will calculate meanRatio for each object's visibility
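
The observations in the snippet note that Item_Fat_Content contains inconsistent labels ('low fat', 'LF', 'regular'). The preview is cut off before any cleanup appears, so the following is only a minimal sketch of how such a cleanup might look. The mapping `FAT_CONTENT_FIXES`, the helper `normalize_fat_content`, and the small demo frame are hypothetical and are not taken from BigMart.py.

```python
import pandas as pd

# Hypothetical mapping, built only from the variants named in the observations.
FAT_CONTENT_FIXES = {"low fat": "Low Fat", "LF": "Low Fat", "regular": "Regular"}


def normalize_fat_content(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Item_Fat_Content variants mapped to canonical labels."""
    out = df.copy()
    out["Item_Fat_Content"] = out["Item_Fat_Content"].replace(FAT_CONTENT_FIXES)
    return out


# Tiny made-up frame standing in for the combined train/test data.
demo = pd.DataFrame(
    {"Item_Fat_Content": ["Low Fat", "LF", "low fat", "Regular", "regular"]}
)
cleaned = normalize_fat_content(demo)
print(cleaned["Item_Fat_Content"].value_counts())
```

Working on a copy keeps the raw labels available for comparison, which helps when checking that no unexpected variant was left unmapped.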

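The second observation in the snippet suggests that Outlet_Establishment_Year is more useful once converted to an age. A rough sketch of that conversion follows. The preview does not say which reference year the author used, so `reference_year` is left as a required parameter; the helper name `add_outlet_age`, the output column `Outlet_Years`, and the demo values are assumptions made for illustration.

```python
import pandas as pd


def add_outlet_age(df: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Return a copy of df with a hypothetical Outlet_Years column (age in years)."""
    out = df.copy()
    out["Outlet_Years"] = reference_year - out["Outlet_Establishment_Year"]
    return out


# Made-up establishment years; the reference year here is only an example value.
demo_outlets = pd.DataFrame({"Outlet_Establishment_Year": [1985, 1999, 2009]})
aged = add_outlet_age(demo_outlets, reference_year=2018)
print(aged)
```

Passing the reference year explicitly avoids tying the feature to the date the script happens to be run.
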
  Attribute       Size  Date       Time   Name
-----------  ---------  ---------- -----  ----
  Directory          0  2018-10-17 20:59  bigmart-master\
       File       1203  2018-09-07 02:49  bigmart-master\.gitignore
       File     181844  2018-10-18 22:29  bigmart-master\alg0.csv
       File     112910  2018-10-17 21:32  bigmart-master\alg1.csv
       File     179127  2018-10-17 21:32  bigmart-master\alg2.csv
       File     177867  2018-10-17 21:32  bigmart-master\alg3.csv
       File     178794  2018-10-17 21:32  bigmart-master\alg6.csv
       File       9337  2018-10-18 22:29  bigmart-master\BigMart.py
       File         37  2018-09-07 02:49  bigmart-master\README.md
       File     527709  2018-09-07 02:49  bigmart-master\Test.csv
       File     965049  2018-10-18 22:29  bigmart-master\test_modified.csv
       File     869537  2018-09-07 02:49  bigmart-master\Train.csv
       File    1534109  2018-10-18 22:29  bigmart-master\train_modified.csv
