- Size: 210KB
- File type: .rar
- Coins: 2
- Downloads: 0
- Published: 2023-08-04
- Language: Python
- Tags: machine learning, letter classification, python, Data
Resource description
Open http://archive.ics.uci.edu/ml/ and click the "view all data sets" link to open the page that lists every dataset. Click "Instances" to sort the datasets by number of instances, from most to fewest, and keep only those whose task is Classification. Our group finally chose the "Letter Recognition Data Set".
2. Data Analysis
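The Letter Recognition dataset contains 20,000 data objects, each described by 16 features, and every feature value is an integer (the attributes are scaled to the range 0 to 15). It was donated on 1991-01-01 and is mainly used for classification experiments. The goal is to identify which of the 26 English letters is shown in a rectangular image made of black-and-white pixels. The images are based on 20 different fonts, and the 20,000 simulated instances were generated by randomly distorting them. Each instance was converted into 16 primitive numerical features; in this project 10,000 instances are used for training and the other 10,000 for letter prediction. Because every sample carries an explicit class label, this is a supervised learning problem.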
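A minimal sketch of reading the data and making the 10,000/10,000 split described above, assuming each line follows the UCI layout of a letter label followed by 16 comma-separated integers. The helper names `parse_rows` and `split_train_test` are hypothetical, and the two sample rows only illustrate the format.

```python
from typing import List, Tuple

def parse_rows(lines: List[str]) -> Tuple[List[List[int]], List[str]]:
    """Parse 'letter,f1,...,f16' rows into feature lists and letter labels."""
    features, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        labels.append(fields[0])                       # first field: the letter
        features.append([int(v) for v in fields[1:]])  # remaining 16 integer features
    return features, labels

def split_train_test(features, labels, n_train=10000):
    """First n_train rows for training, the rest for testing."""
    return (features[:n_train], labels[:n_train]), (features[n_train:], labels[n_train:])

if __name__ == "__main__":
    sample = [
        "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8",
        "I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10",
    ]
    X, y = parse_rows(sample)
    print(y, len(X[0]))  # ['T', 'I'] 16
```

On the full 20,000-line file, `split_train_test(X, y)` with the default `n_train=10000` yields two halves of 10,000 rows each.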

Code snippet and file information
from numpy import *

# Parse a data file: each line is "letter,f1,...,f16" and every feature is an integer
def loadDataSet(filename):
    with open(filename) as fr:
        numFeat = len(fr.readline().split(','))  # label + 16 features per line
    dataMat = []
    labelMat = []
    with open(filename) as fr:
        for line in fr:
            lineArr = []
            curLine = line.strip('\n').split(',')
            for i in range(1, numFeat):  # field 0 is the letter label
                lineArr.append(int(curLine[i]))
            dataMat.append(lineArr)
            labelMat.append(curLine[0])
    return dataMat, labelMat
'''
purpose: classify data by comparing one feature against a threshold
'''
def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray
'''
purpose: build a single-level decision tree (decision stump), used as the weak classifier
input:   dataArr: data set; classLabels: class labels (+1/-1); D: sample weights
output:  bestStump: the single-level tree with the lowest weighted error; minError: that minimum error
         bestClasEst: class labels estimated by the best stump
'''
def buildStump(dataArr, classLabels, D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0
    # empty dictionary that stores the best single-level tree found so far
    bestStump = {}
    bestClasEst = mat(zeros((m, 1)))
    minError = inf  # init error sum to +infinity
    for i in range(n):  # loop over all dimensions
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):  # loop over all thresholds in the current dimension
            for inequal in ['lt', 'gt']:  # try both less-than and greater-than
                threshVal = (rangeMin + float(j) * stepSize)
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)  # stump prediction for dimension i, threshold j
                errArr = mat(ones((m, 1)))  # 1 marks a misclassified sample
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T * errArr  # total error weighted by D
                # print("split: dim %d, thresh %.2f, thresh inequal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst
'''
purpose: the complete AdaBoost algorithm
input parameters:
dataArr: data set
classLabels: class labels
numIt: number of iterations (the only parameter the user needs to specify)
output parameters:
weakClassArr:seve
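The AdaBoost routine in the listing is cut off inside its docstring. For reference, here is a compact sketch of the standard AdaBoost training loop built on decision stumps. It is not the original code: it uses plain NumPy arrays instead of `mat`, assumes binary labels in {+1, -1} (the stump only predicts ±1), and the names `stump_predict`, `best_stump`, `ada_boost_train` and `ada_boost_predict` are hypothetical. Each round picks the stump with the lowest weighted error ε, gives it the vote weight α = ½·ln((1 − ε)/ε), and multiplies each sample weight by exp(−α·y·h(x)) so misclassified samples gain weight.

```python
import numpy as np

def stump_predict(X, dim, thresh, ineq):
    """Predict +1/-1 by comparing one feature column with a threshold."""
    pred = np.ones(X.shape[0])
    if ineq == "lt":
        pred[X[:, dim] <= thresh] = -1.0
    else:
        pred[X[:, dim] > thresh] = -1.0
    return pred

def best_stump(X, y, w, num_steps=10):
    """Search all features, thresholds and both inequalities for the lowest weighted error."""
    best, best_err, best_pred = None, np.inf, None
    for dim in range(X.shape[1]):
        lo, hi = X[:, dim].min(), X[:, dim].max()
        step = (hi - lo) / num_steps
        for j in range(-1, num_steps + 1):
            thresh = lo + j * step
            for ineq in ("lt", "gt"):
                pred = stump_predict(X, dim, thresh, ineq)
                err = w[pred != y].sum()  # sum of weights of misclassified samples
                if err < best_err:
                    best, best_err, best_pred = (dim, thresh, ineq), err, pred
    return best, best_err, best_pred

def ada_boost_train(X, y, num_it=40):
    """Train up to num_it stumps; returns a list of (alpha, dim, thresh, ineq)."""
    m = X.shape[0]
    w = np.full(m, 1.0 / m)  # start with uniform sample weights
    ensemble = []
    agg = np.zeros(m)  # running weighted vote on the training set
    for _ in range(num_it):
        (dim, thresh, ineq), err, pred = best_stump(X, y, w)
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-16))  # guard against division by zero
        ensemble.append((alpha, dim, thresh, ineq))
        w = w * np.exp(-alpha * y * pred)  # raise weights of misclassified samples
        w /= w.sum()
        agg += alpha * pred
        if np.all(np.sign(agg) == y):  # stop early once training error is zero
            break
    return ensemble

def ada_boost_predict(X, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    agg = np.zeros(X.shape[0])
    for alpha, dim, thresh, ineq in ensemble:
        agg += alpha * stump_predict(X, dim, thresh, ineq)
    return np.sign(agg)
```

Calling `ada_boost_predict(X, ada_boost_train(X, y))` returns ±1 predictions. Since the letter data has 26 classes, the labels would first have to be mapped to ±1, for example one letter versus all others; the listing does not show how the original project handled this.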
Attribute   Size (bytes)  Date        Time   Name
----------  ------------  ----------  -----  ---------------
File              356180  2016-11-24  20:38  traindata.txt
File                7150  2016-11-26  22:02  TreeAdaBoost.py
File               36042  2017-03-18  09:31  文檔.docx
File              356383  2016-11-24  20:39  testdata.txt
----------  ------------  ----------  -----  ---------------
                  755755                     4 files