Training, Validation and Test Data

Example:

(A). We have data on 16 data items: their attributes and class labels. We RANDOMLY divide them into 8 items for training, 4 for validation and 4 for testing. The d attributes are known for all data items.

Set          Item No.   Class
Training     1          0
             2          0
             3          1
             4          1
             5          1
             6          1
             7          0
             8          0
Validation   9          0
             10         0
             11         1
             12         0
Test         13         0
             14         0
             15         1
             16         1

(B). Next, suppose we develop three classification models, A, B and C, from the training data. Let the training errors of these models be as shown below (recall that the models do not necessarily give perfect results on the training data, nor are they required to). The d attributes are known for all items.

Classification results on the training set:

Item No.   True Class   Model A   Model B   Model C
1          0            0         1         1
2          0            0         0         0
3          1            0         1         0
4          1            1         0         1
5          1            0         0         0
6          1            1         1         1
7          0            0         0         0
8          0            0         0         0
Classification Error    2/8       3/8       3/8

(C). Next, use the three models A, B and C to classify each item in the validation set based on its attribute values. Recall that we also know their true labels. Suppose we get the following results:

Classification results on the validation set:

Item No.   True Class   Model A   Model B   Model C
9          0            1         0         0
10         0            0         1         0
11         1            0         1         0
12         0            0         1         0
Classification Error    2/4       2/4       1/4

If we use minimum validation error as the model selection criterion, we select model C.

(D). Now use model C to determine the class value of each data point in the test set. We do so by substituting the (known) attribute values into classification model C. Again, recall that we know the true label of each of these data items, so we can compare the values obtained from the classification model with the true labels to determine the classification error on the test set. Suppose we get the following results:

Classification results on the test set:

Item No.   True Class   Model C
13         0            0
14         0            0
15         1            0
16         1            1
Classification Error    1/4

(E). Based on the above, an estimate of the generalization error is 1/4 = 25%. This means that if we use model C to classify future items for which only the attributes, not the class labels, will be known, we are likely to make incorrect classifications about 25% of the time.

(F). A summary of the above (classification error, %):

Model   Training   Validation   Test
A       25         50           -
B       37.5       50           -
C       37.5       25           25

Cross Validation

If the available data are limited, we employ Cross Validation (CV). In this approach, the data are randomly divided into k sets of approximately equal size. Training is done on (k-1) of the sets and the remaining k-th set is used for testing. This process is repeated k times, so that each set serves as the test set exactly once (k-fold CV). The average error over the k repetitions is used as an estimate of the test error.

For the special case when k equals the number of data items, so that each set contains a single item, the procedure is called Leave-One-Out Cross-Validation (LOO-CV).

EXAMPLE: Consider the above data consisting of 16 items.

(A). Let k = 4, i.e., 4-fold Cross Validation. Divide the data into four sets of 4 items each. Suppose the following setup occurs and the errors obtained are as shown.

                              Set 1         Set 2              Set 3             Set 4
Training                      Items 1-12    Items 1-8, 13-16   Items 1-4, 9-16   Items 5-16
Test                          Items 13-16   Items 9-12         Items 5-8         Items 1-4
Error on test set (assumed)   25%           35%                28%               32%

Estimated Classification Error: $\mathrm{CE} = \dfrac{25\% + 35\% + 28\% + 32\%}{4} = 30\%$

(B). LOO-CV. For this, the data are divided into 16 sets, each consisting of 15 training items and one test item.

           Set 1        Set 2            ...   Set 15          Set 16
Training   Items 1-15   Items 1-14, 16   ...   Items 1, 3-16   Items 2-16
Test       Item 16      Item 15          ...   Item 2          Item 1
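The hold-out procedure in the first example (steps A to F) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the models themselves are not specified in the notes, so the sketch works only from the true labels and model outputs copied from the example tables, and `error_rate` and `select_model` are hypothetical helper names. It computes each model's classification error, selects the model with the minimum validation error, and reports that model's test error as the estimate of generalization error.

```python
from fractions import Fraction

# True labels and model outputs copied from the example tables.
# The models are not given in the text, so only their predictions are used.
TRAIN_TRUE = [0, 0, 1, 1, 1, 1, 0, 0]  # items 1-8
TRAIN_PRED = {
    "A": [0, 0, 0, 1, 0, 1, 0, 0],
    "B": [1, 0, 1, 0, 0, 1, 0, 0],
    "C": [1, 0, 0, 1, 0, 1, 0, 0],
}
VAL_TRUE = [0, 0, 1, 0]  # items 9-12
VAL_PRED = {
    "A": [1, 0, 0, 0],
    "B": [0, 1, 1, 1],
    "C": [0, 0, 0, 0],
}
TEST_TRUE = [0, 0, 1, 1]      # items 13-16
TEST_PRED_C = [0, 0, 0, 1]    # only the selected model is run on the test set


def error_rate(true, pred):
    """Fraction of items whose predicted class differs from the true class."""
    wrong = sum(t != p for t, p in zip(true, pred))
    return Fraction(wrong, len(true))


def select_model(val_true, val_preds):
    """Return the name of the model with the minimum validation error."""
    return min(val_preds, key=lambda name: error_rate(val_true, val_preds[name]))


if __name__ == "__main__":
    for name in sorted(TRAIN_PRED):
        train_err = error_rate(TRAIN_TRUE, TRAIN_PRED[name])
        val_err = error_rate(VAL_TRUE, VAL_PRED[name])
        print(f"Model {name}: training {float(train_err):.1%}, validation {float(val_err):.1%}")
    best = select_model(VAL_TRUE, VAL_PRED)
    test_err = error_rate(TEST_TRUE, TEST_PRED_C)
    print(f"Selected model {best}; estimated generalization error {float(test_err):.0%}")
```

Exact fractions are used so the printed rates match the tables (for example 3/8 = 37.5%). Note that the test set is touched only once, after model selection, which is what makes its error a fair estimate.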
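The k-fold and LOO-CV procedures described under Cross Validation can be sketched as below. This is a minimal sketch, assuming items are identified by 0-based indices (item i in the text is index i-1) and that the random division uses Python's `random.Random` with a fixed seed for reproducibility. The names `k_fold_splits`, `leave_one_out_splits` and `cv_error` are hypothetical, and training and evaluating an actual model is left out: only the splitting and the averaging of per-fold errors are shown.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(n_items: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Randomly divide indices 0..n_items-1 into k folds of nearly equal size.

    Returns k (train, test) pairs: each fold is the test set exactly once,
    and the other k-1 folds together form the training set.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must satisfy 2 <= k <= n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # random division of the data
    # Striding over the shuffled list keeps fold sizes within one item of each other.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


def leave_one_out_splits(n_items: int) -> List[Tuple[List[int], List[int]]]:
    """LOO-CV is k-fold CV with k equal to the number of items."""
    return k_fold_splits(n_items, n_items)


def cv_error(fold_errors: Sequence[float]) -> float:
    """Average the test error over the k repetitions."""
    return sum(fold_errors) / len(fold_errors)


if __name__ == "__main__":
    for train, test in k_fold_splits(16, 4):
        print("train:", train, "test:", test)
    # Assumed fold errors from the 4-fold example.
    print("estimated CE:", cv_error([0.25, 0.35, 0.28, 0.32]))
```

For example, `k_fold_splits(16, 4)` yields four (train, test) pairs with 12 training and 4 test indices each, `leave_one_out_splits(16)` yields 16 pairs of 15 and 1, and `cv_error([0.25, 0.35, 0.28, 0.32])` averages the assumed fold errors to 0.30 up to floating-point rounding. Unlike the worked example, which uses contiguous blocks of item numbers, the sketch assigns shuffled indices to folds, as the description of CV calls for.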