Machine Learning 10-701
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
January 11, 2011

Readings: "The Discipline of ML"; Mitchell, Chapter 3; Bishop, Chapter 14.4

Today:
- What is machine learning?
- Decision tree learning
- Course logistics

Machine Learning: the study of algorithms that improve their performance P at some task T with experience E.
Well-defined learning task: <P, T, E>

Learning to Predict Emergency C-Sections [Sims et al., 2000]
Data: 9714 patient records, each with 215 features.
[Figure: example patient records]
One of 18 learned rules:
  If   no previous vaginal delivery, and
       abnormal 2nd trimester ultrasound, and
       malpresentation at admission
  Then probability of emergency C-section is 0.6
Over training data: 26/41 = .63; over test data: 12/20 = .60

Learning to detect objects in images
[Figure: example images with detected objects]

Reading a noun (vs. verb) [Rustandi et al., 2005]
[Figure: brain image data]

Learning to classify text documents
Company home page vs. personal home page vs. university home page vs. ...
[Figure: example document]

Machine Learning - Practice
Speech recognition, mining databases, control learning, object recognition, text analysis.
Techniques: supervised learning, Bayesian networks, hidden Markov models, unsupervised clustering, reinforcement learning.

Machine Learning Theory
PAC Learning Theory (supervised concept learning) relates:
- # of examples (m)
- representational complexity (|H|)
- error rate (epsilon)
- failure probability (delta)

  m \ge \frac{1}{\epsilon} \left( \ln|H| + \ln\frac{1}{\delta} \right)

Other theories for: reinforcement skill learning, semi-supervised learning, active student querying.
Also relating: # of mistakes during learning, learner's query strategy, convergence rate, asymptotic performance, bias, variance.
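To make the sample-complexity bound concrete, here is a small sketch (mine, not from the lecture) that plugs illustrative numbers into m >= (1/epsilon)(ln|H| + ln(1/delta)); the hypothesis-class size, error rate, and failure probability below are assumptions chosen only for the example.

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Illustrative numbers (assumptions, not from the slides):
# |H| = 2**16 hypotheses, 5% target error, 5% failure probability.
print(pac_sample_bound(h_size=2**16, epsilon=0.05, delta=0.05))  # -> 282
```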

Machine Learning in Computer Science
- Machine learning is already the preferred approach to:
  - speech recognition, natural language processing
  - computer vision
  - medical outcomes analysis
  - robot control
- This ML niche is growing:
  - improved machine learning algorithms
  - increased data capture, networking, new sensors
  - software too complex to write by hand
  - demand for self-customization to user, environment

Function Approximation and Decision Tree Learning

Function approximation
Problem Setting:
- Set of possible instances X
- Unknown target function f : X -> Y
- Set of function hypotheses H = { h | h : X -> Y }
Input: training examples {<x^{(i)}, y^{(i)}>} of unknown target function f (superscript: i-th training example)
Output: hypothesis h \in H that best approximates target function f

A Decision Tree for F: <Outlook, Humidity, Wind, Temp> -> PlayTennis?
[Figure: decision tree over Outlook, Humidity, and Wind, with leaves labeled No/Yes]
- Each internal node: tests one attribute X_i
- Each branch from a node: selects one value for X_i
- Each leaf node: predicts Y (or P(Y | x \in leaf))
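As a concrete rendering of a tree like the one pictured above, the sketch below writes the standard PlayTennis tree from Mitchell's textbook as nested conditionals; the exact tree in the slide figure is not recoverable from this transcript, so treat these particular splits as an assumed example.

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    """A decision tree as nested attribute tests: each 'if' is an internal
    node, each return is a leaf prediction."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "No" if wind == "Strong" else "Yes"
    raise ValueError(f"unknown Outlook value: {outlook}")

print(play_tennis("Sunny", "Normal", "Weak"))   # -> Yes
print(play_tennis("Rain", "High", "Strong"))    # -> No
```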

Decision Tree Learning
Problem Setting:
- Set of possible instances X
  - each instance x in X is a feature vector
  - e.g., <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
- Unknown target function f : X -> Y
  - Y is discrete valued
- Set of function hypotheses H = { h | h : X -> Y }
  - each hypothesis h is a decision tree
  - a tree sorts x to a leaf, which assigns y
Input: training examples {<x^{(i)}, y^{(i)}>} of unknown target function f
Output: hypothesis h \in H that best approximates target function f

Decision Trees
Suppose X = <X_1, ..., X_n>, where the X_i are boolean variables.
How would you represent Y = X_2 X_5?  Y = X_2 \vee X_5?
How would you represent X_2 X_5 \vee X_3 X_4 (\neg X_1)?
(See the sketch after the C-section tree below.)

A Tree to Predict C-Section Risk
Learned from medical records of 1000 women; negative examples are C-sections.

  [833+, 167-]  .83+ .17-
  Fetal.Presentation = 1: [822+, 116-]  .88+ .12-
  | Previous.Csection = 0: [767+, 81-]  .90+ .10-
  | | Primiparous = 0: [399+, 13-]  .97+ .03-
  | | Primiparous = 1: [368+, 68-]  .84+ .16-
  | | | Fetal.Distress = 0: [334+, 47-]  .88+ .12-
  | | | | Birth.Weight < 3349:  [201+, 10.6-]  .95+
  | | | | Birth.Weight >= 3349: [133+, 36.4-]  .78+
  | | | Fetal.Distress = 1: [34+, 21-]  .62+ .38-
  | Previous.Csection = 1: [55+, 35-]  .61+ .39-
  Fetal.Presentation = 2: [3+, 29-]  .11+ .89-
  Fetal.Presentation = 3: [8+, 22-]  .27+ .73-
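To make the boolean representation question above concrete, here is a small sketch (my illustration, not from the slides) of decision trees for Y = X_2 X_5 and Y = X_2 \vee X_5: each internal node tests one variable, and each leaf outputs the label.

```python
# Each tree tests one boolean variable per internal node; leaves hold the label.
#
#   Y = X2 AND X5             Y = X2 OR X5
#        X2                        X2
#       /  \                      /  \
#      0    1                    0    1
#      |    |                    |    |
#      No   X5                   X5   Yes
#          /  \                 /  \
#         0    1               0    1
#         No   Yes             No   Yes

def tree_and(x2: bool, x5: bool) -> bool:
    if not x2:          # internal node: test X2
        return False    # leaf
    return x5           # internal node: test X5; both children are leaves

def tree_or(x2: bool, x5: bool) -> bool:
    if x2:              # internal node: test X2
        return True     # leaf
    return x5           # internal node: test X5; both children are leaves

assert all(tree_and(a, b) == (a and b) for a in (False, True) for b in (False, True))
assert all(tree_or(a, b) == (a or b) for a in (False, True) for b in (False, True))
```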

Top-Down Induction of Decision Trees [ID3, C4.5; Quinlan]
node = Root
Main loop:
1. A <- the "best" decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Which attribute is best? (A code sketch of the loop follows below.)
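The following sketch (mine, not the course's code) mirrors the main loop above: grow the tree top-down, at each node choosing the attribute that scores best. The `score` function is left as a parameter; the information-gain criterion that would fill it in is defined on the next slides.

```python
from collections import Counter

def id3(examples, attributes, target, score):
    """Top-down decision tree induction, following the main loop above.
    examples: list of dicts mapping attribute name -> value (plus the target).
    score(examples, attr): how good it is to split on attr (e.g., information gain).
    Returns a nested dict tree {attr: {value: subtree}}, or a label at a leaf."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:               # perfectly classified -> STOP (step 5)
        return labels[0]
    if not attributes:                      # no attributes left -> majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: score(examples, a))       # step 1
    tree = {best: {}}                       # step 2: assign best as decision attribute
    for value in {ex[best] for ex in examples}:                    # step 3: one branch per value
        subset = [ex for ex in examples if ex[best] == value]      # step 4: sort examples
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target, score)  # iterate over new leaves
    return tree
```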

Entropy
H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).
Why? Information theory: the most efficient code assigns -\log_2 P(X = i) bits to encode the message X = i.
So the expected number of bits to code one random X is:

  \sum_{i=1}^{n} P(X = i) (-\log_2 P(X = i))

Sample Entropy
- S is a sample of training examples
- p_+ is the proportion of positive examples in S
- p_- is the proportion of negative examples in S
- Entropy measures the impurity of S:

  H(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-

Entropy
Entropy H(X) of a random variable X:

  H(X) = -\sum_{i=1}^{n} P(X = i) \log_2 P(X = i)

Specific conditional entropy H(X|Y = v) of X given Y = v:

  H(X|Y = v) = -\sum_{i=1}^{n} P(X = i | Y = v) \log_2 P(X = i | Y = v)

Conditional entropy H(X|Y) of X given Y:

  H(X|Y) = \sum_{v \in values(Y)} P(Y = v) \, H(X|Y = v)

Mutual information (aka Information Gain) of X and Y:

  I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
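A minimal sketch of these definitions in code (mine; the function and variable names are not from the course). It computes sample entropy, conditional entropy, and the information gain between a discrete attribute and the labels.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i log2 p_i over the label proportions in the sample."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(values, labels):
    """H(Y|A) = sum_v P(A=v) * H(labels restricted to A=v)."""
    n = len(values)
    total = 0.0
    for v in set(values):
        subset = [y for a, y in zip(values, labels) if a == v]
        total += (len(subset) / n) * entropy(subset)
    return total

def information_gain(values, labels):
    """Gain = H(Y) - H(Y|A), the mutual information between attribute A and label Y."""
    return entropy(labels) - conditional_entropy(values, labels)

# Tiny example: a sample with 2 positives and 3 negatives has entropy ~0.971 bits.
print(round(entropy(["+", "+", "-", "-", "-"]), 3))
```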

Information Gain
Information Gain is the mutual information between input attribute A and target variable Y.
Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A:

  Gain(S, A) = I_S(A, Y) = H_S(Y) - H_S(Y|A)

Training Examples

  Day  Outlook   Temperature  Humidity  Wind    PlayTennis
  D1   Sunny     Hot          High      Weak    No
  D2   Sunny     Hot          High      Strong  No
  D3   Overcast  Hot          High      Weak    Yes
  D4   Rain      Mild         High      Weak    Yes
  D5   Rain      Cool         Normal    Weak    Yes
  D6   Rain      Cool         Normal    Strong  No
  D7   Overcast  Cool         Normal    Strong  Yes
  D8   Sunny     Mild         High      Weak    No
  D9   Sunny     Cool         Normal    Weak    Yes
  D10  Rain      Mild         Normal    Weak    Yes
  D11  Sunny     Mild         Normal    Strong  Yes
  D12  Overcast  Mild         High      Strong  Yes
  D13  Overcast  Hot          Normal    Weak    Yes
  D14  Rain      Mild         High      Strong  No

[Figure: partially learned tree — the root {D1, ..., D14} is split on Outlook; the Sunny branch holds {D1, D2, D8, D9, D11}, the Overcast branch {D3, D7, D12, D13}, and the Rain branch {D4, D5, D6, D10, D14}]
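To tie the table to the Gain(S, A) definition, here is a small sketch (mine) that encodes the fourteen examples and scores each attribute at the root; Outlook should come out with the largest gain, which is why it is chosen as the root test.

```python
import math
from collections import Counter

# (Outlook, Temperature, Humidity, Wind, PlayTennis), one tuple per row D1..D14.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),         ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),     ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),     ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),   ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),   ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def H(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(col):
    """Gain(S, A) = H_S(Y) - H_S(Y|A) for the attribute in column `col`."""
    labels = [row[-1] for row in DATA]
    cond = 0.0
    for v in {row[col] for row in DATA}:
        subset = [row[-1] for row in DATA if row[col] == v]
        cond += len(subset) / len(DATA) * H(subset)
    return H(labels) - cond

for i, name in enumerate(ATTRS):
    print(f"Gain(S, {name}) = {gain(i):.3f}")   # Outlook has the highest gain
```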

Which attribute should be tested here?

  S_sunny = {D1, D2, D8, D9, D11}
  Gain(S_sunny, Humidity)    = .970 - (3/5) 0.0 - (2/5) 0.0 = .970
  Gain(S_sunny, Temperature) = .970 - (2/5) 0.0 - (2/5) 1.0 - (1/5) 0.0 = .570
  Gain(S_sunny, Wind)        = .970 - (2/5) 1.0 - (3/5) .918 = .019

Decision Tree Learning Applet
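The .970 appearing in each expression is the entropy of the Sunny subset itself; as a quick check (my derivation, not shown on the slide):

  H(S_{sunny}) = -\tfrac{2}{5} \log_2 \tfrac{2}{5} - \tfrac{3}{5} \log_2 \tfrac{3}{5} \approx 0.971

Likewise, splitting S_sunny on Wind gives Weak = {D1, D8, D9} (1+, 2-, entropy \approx .918) and Strong = {D2, D11} (1+, 1-, entropy 1.0), so Gain(S_sunny, Wind) \approx .971 - (3/5)(.918) - (2/5)(1.0) \approx .019, matching the line above.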

Which Tree Should We Output?
- ID3 performs a heuristic search through the space of decision trees.
- It stops at the smallest acceptable tree. Why?
Occam's razor: prefer the simplest hypothesis that fits the data.

Why Prefer Short Hypotheses? (Occam's Razor)
Argument in favor:
- There are fewer short hypotheses than long ones
  - a short hypothesis that fits the data is less likely to be a statistical coincidence
  - it is highly probable that a sufficiently complex hypothesis will fit the data
Argument opposed:
- There are also fewer hypotheses with a prime number of nodes and attributes beginning with "Z"
- What's so special about "short" hypotheses?

Overfitting in Decision Trees
Consider adding noisy training example #15: <Sunny, Hot, Normal, Strong>, PlayTennis = No.
What effect on the earlier tree?
[Figure: the PlayTennis decision tree, with leaves labeled No/Yes]

Overfitting
Consider the error of hypothesis h over
- training data: error_train(h)
- the entire distribution P of data: error_P(h)
Hypothesis h \in H overfits the training data if there is an alternative hypothesis h' \in H such that

  error_train(h) < error_train(h')   and   error_P(h) > error_P(h')

Overfitting in Decision Tree Learning
[Figure: accuracy on training vs. test data as the tree grows]

Avoiding Overfitting
How can we avoid overfitting?
- stop growing when the data split is not statistically significant
- grow the full tree, then post-prune

Reduced-Error Pruning
Split data into training and validation sets.
Create a tree that classifies the training set correctly.
Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
2. Greedily remove the one that most improves validation set accuracy
This produces the smallest version of the most accurate subtree.
What if data is limited?
(A sketch of this procedure follows below.)

Effect of Reduced-Error Pruning
[Figure: accuracy vs. tree size, with curves for accuracy on training data, on test data, and on test data during pruning]
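The sketch below (mine, reusing the nested-dict tree format from the ID3 sketch earlier) illustrates reduced-error pruning: on each pass, try replacing every internal node with the majority training label of the examples reaching it, keep the single replacement that most improves validation accuracy, and stop when no replacement helps. Function names, the fallback label, and the tie-breaking details are my assumptions.

```python
import copy
from collections import Counter

def predict(tree, example, default="No"):
    """Walk a nested-dict tree {attr: {value: subtree_or_label}} to a label.
    Falls back to `default` when an attribute value has no branch."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example[attr], default)
    return tree

def accuracy(tree, examples, target):
    return sum(predict(tree, ex) == ex[target] for ex in examples) / len(examples)

def _internal_nodes(tree, path=()):
    """Yield the path (a sequence of (attr, value) steps) to every internal node."""
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, child in tree[attr].items():
            yield from _internal_nodes(child, path + ((attr, value),))

def _pruned_copy(tree, path, leaf_label):
    """Copy of `tree` with the subtree at `path` replaced by a leaf."""
    if not path:
        return leaf_label
    new_tree = copy.deepcopy(tree)
    node = new_tree
    for attr, value in path[:-1]:
        node = node[attr][value]
    attr, value = path[-1]
    node[attr][value] = leaf_label
    return new_tree

def _majority_at(train, path, target):
    """Majority training label among examples that reach the node at `path`."""
    reaching = [ex for ex in train if all(ex[a] == v for a, v in path)] or train
    return Counter(ex[target] for ex in reaching).most_common(1)[0][0]

def reduced_error_prune(tree, train, validation, target):
    """Greedily prune the node whose removal most improves validation accuracy;
    stop when further pruning is harmful (no candidate improves accuracy)."""
    while isinstance(tree, dict):
        best_tree, best_acc = None, accuracy(tree, validation, target)
        for path in _internal_nodes(tree):
            candidate = _pruned_copy(tree, path, _majority_at(train, path, target))
            acc = accuracy(candidate, validation, target)
            if acc > best_acc:
                best_tree, best_acc = candidate, acc
        if best_tree is None:
            return tree
        tree = best_tree
    return tree
```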
