UCI Database Usage Notes
UCI Machine Learning Repository: Usage Notes (reposted) 2011-04-25 14:40

URL of the UCI Machine Learning Repository: /ml/

The repository was updated continually up to 2010. Nearly everyone studying artificial intelligence uses it, and it is a standard resource for reading the literature, writing papers, and testing algorithms. Its data sets come from everyday life, engineering, and science, and range in size from a handful of records to several hundred thousand.

UCI data can be read in MATLAB with dlmread or textread. However, non-numeric class labels must first be replaced with numbers such as 1/2/3; otherwise the values cannot be read as numbers and are treated as text.

These usage notes are reposted from: /bbs/thread-37-1-1.html

This directory contains data sets and domain theories (the latter are annotated as such in the brief listing below) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains records of individual instances described as attribute-value pairs. The corresponding *.info file contains extensive documentation. (Some files _generate_ databases; they do not have *.data files.) In addition to the data sets and domain theories, the utilities directory contains material you may find useful when working with these data sets. The repository can be viewed and remotely copied over the web at /mlearn/MLRepository.html. Alternatively, the data can be obtained by ftp (ftp:/) using an anonymous login.
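The label-replacement step mentioned above (turning text class labels into 1/2/3 so that dlmread can read the file as plain numbers) can be sketched in Python. This is a minimal illustration, not part of the original notes: it assumes comma-separated records with the class label in the last column, as in the Iris example, and `encode_labels` is a hypothetical helper name. Codes are assigned in order of first appearance.

```python
import csv
import io


def encode_labels(text):
    """Hypothetical helper (not a library function): replace the class label
    in the last column of each comma-separated record with an integer code
    1, 2, 3, ... assigned in order of first appearance.

    Returns (numeric_csv_text, label_to_code).
    """
    label_to_code = {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, e.g. at the end of the file
            continue
        *features, label = row
        # Reuse the code of a label seen before, otherwise allocate the next one.
        code = label_to_code.setdefault(label, len(label_to_code) + 1)
        writer.writerow(features + [code])
    return out.getvalue(), label_to_code


sample = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
numeric, mapping = encode_labels(sample)
print(numeric, end="")
print(mapping)  # {'Iris-setosa': 1, 'Iris-versicolor': 2, 'Iris-virginica': 3}
```

Saved to a file, the numeric text could then be loaded in MATLAB with `dlmread(filename, ',')`. Missing values written as `?` would still be non-numeric and need separate handling.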

The data sets can be found in the pub/machine-learning-databases directory.

Note: UCI is always looking for new data sets to add; contributed data is placed in the incoming subdirectory. Please contribute your data together with appropriate documentation. Thank you. The DOC-REQUIREMENTS file describes the contribution procedure. Most data sets currently use the following format: one instance per line, no spaces, attribute values separated by commas (","), and missing values denoted by a question mark ("?"). After making a contribution, please also notify the site librarian: ml-repository

The UCI Iris data set serves as an example below. The iris folder contains three files: Index, iris.data, and …s.

Index is the folder's directory listing; it lists every file in the folder. For iris, Index contains:

    Index of iris
    18 Mar 1996   105  Index
    08 Mar 1993  4551  iris.data
    30 May 1989  2604  …s

iris.data is the Iris data file. An excerpt:

    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    ...
    7.0,3.2,4.7,1.4,Iris-versicolor
    6.4,3.2,4.5,1.5,Iris-versicolor
    6.9,3.1,4.9,1.5,Iris-versicolor
    ...
    6.3,3.3,6.0,2.5,Iris-virginica
    5.8,2.7,5.1,1.9,Iris-virginica
    7.1,3.0,5.9,2.1,Iris-virginica

As shown, attribute values are separated directly by commas with no spaces (5.1,3.5,1.4,0.2,), and the last column holds the value that the row's attributes map to, i.e., the decision (class) attribute, such as Iris-setosa.

The …s file gives background on the Iris data, such as the title, sources, past usage, relevant information, number of instances, and the attributes of each instance. Part of it reads:

    7. Attribute Information:
       1. sepal length in cm
       2. sepal width in cm
       3. petal length in cm
       4. petal width in cm
       5. class:
          - Iris Setosa
          - Iris Versicolour
          - Iris Virginica
    ...
    9. Class Distribution: 33.3% for each of 3 classes.

For examples of how this data set has been used, see other papers or later material on this site. The corresponding original English text follows:

This is the UCI Repository Of Machine Learning Databases and Domain Theories
4 December 1995
: pub/machine-learning-databases
/mlearn/MLRepository.html
Librarian: Patrick M. Murphy (ml-repository)
111 databases and domain theories (36MB)

This directory contains data sets and domain theories (the latter have been annotated as such in the following brief listing) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains individual records described in terms of attribute-value pairs. The corresponding *.info file contains voluminous documentation. (Some files _generate_ databases; they do not have *.data files.)

In addition to data sets and domain theories, the utilities/ directory contains utilities that you may find useful when using datasets in this repository.

The contents of this repository can be viewed and remotely copied over the web. The address is /mlearn/MLRepository.html. Alternatively, the contents of this repository can be remotely copied via ftp to . Enter anonymous for user id, and your e-mail address (user@host) for password. These databases can be found by executing cd pub/machine-learning-databases.

Notes:
1. We're always looking for additional databases, which can be written to the sub-directory named /incoming. Please send yours, with documentation. Thanks - see DOC-REQUIREMENTS for suggested documentation procedures. Presently, most databases have the following format: 1 instance per line, no spaces, commas separate attribute values, and missing values are denoted by "?". Also, please notify the site librarian (ml-repository) after making a donation.
2. Ivan Bratko requested that the databases he donated from the Ljubljana Oncology Institute (e.g., breast-cancer, lymphography, and primary-tumor) have restricted access. We are allowed to share them with academic institutions upon request. These databases (like several others) require that proper citations be made in published articles that use them. Citation requirements are in each database's corresponding *.doc file. To access any of these databases, send email to ml-repository. To aid you in deciding if you want any of these databases, the documentation files are available.
3. An archive server may now be used to receive files in this repository via e-mail. Installed on ics, it provides email access to files in our anonymous ftp/uucp area (ftp). If people have no other access to our archives, then they can send mail to: archive-server. Commands to the server may be given in the body. Some commands are: help, send, find. The help command replies with a useful help message.

If you publish material based on databases obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this repository. Thanks - this will help others to obtain the same data sets and replicate your experiments. We suggest the following pseudo-APA reference format for referring to this repository (LaTeX'd):

Murphy, P.M., & Aha, D.W. (1994). {\it UCI Repository of machine learning databases} /mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.

Patrick M. Murphy (Repository Librarian)

Brief Overview of Databases and Domain Theories:

Quick Listing:
1. annealing (David Sterling and Wray Buntine)
2. Artificial Characters Database & DT (donated by Attilio Giordana)
3-4. audiology (Ray Bareiss and Bruce Porter, used in Protos)
   1. Original Version
   2. Standardized-Attribute Version of the Original
5. auto-mpg (from CMU StatLib library)
6. autos (Jeff Schlimmer)
7. badges (Haym Hirsh)
8. balance-scale (Tim Hume)
9. balloons (Michael Pazzani)
10. breast-cancer (Ljubljana Institute of Oncology, restricted access)
11. breast-cancer-wisconsin (Wisconsin Breast Cancer Dbase, Olvi Mangasarian)
   1. Original version
   2. Diagnostic data set
   3. Prognostic data set
12. bridges (Yoram Reich)
13-21. chess
   1. Partial generator of Quinlan's chess-end-game data (kr-vs-kn) (Schlimmer)
   2. Shapiro's endgame database (kr-vs-kp) (Rob Holte)
   3. king-rook-vs-king (Michael Bain, Arthur van Hoff)
   4-9. Six domain theories (Nick Flann)
22. Bach Chorales (time-series) database (Darrell Conklin)
23. Connect-4 Database (John Tromp)
24-25. Credit Screening Database
   1. Japanese Credit Screening Data and domain theory (Chiharu Sano)
   2. Credit Card Application Approval Database (Ross Quinlan)
26. Ein-Dor and Feldmesser's cpu-performance database (David Aha)
27. Diabetes Data (Serdar Uckun, AIM-94)
28. dgp-2 data generation program (Powell Benedict)
29. Document Understanding (Donato Malerba)
30. Nine small EBL domain theories and examples in sub-directory ebl
31. Evlin Kinney's echocardiogram database (Steven Salzberg)
32. flags (Richard Forsyth)
33. function-finding (Cullen Schaffer's 352 case studies)
34. glass (Vina Spiehler)
35. hayes-roth (from Hayes-Roth's paper)
36-39. heart-disease (Robert Detrano)
40. hepatitis (G. Gong)
41. horse colic database (Mary McLeish & Matt Cecile)
42. (Boston) Housing database (from CMU StatLib library)
43. ICU data (Serdar Uckun, AIM-94)
44. Image segmentation database (Carla Brodley)
45. ionosphere information (Vince Sigillito)
46. iris (R.A. Fisher, 1936)
47. isolet (Ron Cole and Mark Fanty's database donated by Tom Dietterich)
48. kinship (J. Ross Quinlan)
49. labor-negotiations (Stan Matwin)
50-51. led-display-creator (from the CART book)
52. lenses (Cendrowska's database donated by Benoit Julien)
53. letter-recognition database (created and donated by David Slate)
54. liver-disorders (BUPA Medical's database donated by Richard Forsyth)
55. logic-theorist (Paul O'Rorke)
56. lung cancer (Stefan Aeberhard)
57. lymphography (Ljubljana Institute of Oncology, restricted access)
58-59. mechanical-analysis (Francesco Bergadano)
   1. Original Mechanical Analysis Data Set
   2. PUMPS DATA SET
60. mobile robots (donated by Klingspor, Morik and Rieger)
61-64. molecular-biology
   1. promoter sequences (Towell, Shavlik, & Noordewier, domain theory also)
   2. splice-junction sequences (Towell, Noordewier, & Shavlik, domain theory also)
   3. protein secondary structure database (Qian and Sejnowski)
   4. protein secondary structure domain theory (Jude Shavlik & Rich Maclin)
65. MONK's Problems (donated by Sebastian Thrun)
66. Moral Reasoner Database (donated by James Wogulis)
67. mushroom (Jeff Schlimmer)
68. MUSK databases (2) (donated by Tom Dietterich)
69. othello domain theory (Tom Fawcett)
70. Page Blocks Classification (Donato Malerba)
71. Pima Indians diabetes diagnoses (Vince Sigillito)
72. Postoperative Patient data (Jerzy W. Grzymala-Busse)
73. Primary Tumor (Ljubljana Institute of Oncology, restricted access)
74. Qualitative Structure Activity Relationships (QSARs) (Ross King)
75. Quadruped Animals (John H. Gennari)
76. Servo data (Ross Quinlan)
77. shuttle-landing-control (Bojan Cestnik)
78. solar flare (Gary Bradshaw)
79-80. soybean (from Ryszard Michalski's groups)
81. space shuttle databases (David Draper)
82. spectrometer (Infra-Red Astronomy Satellite Project Database, John Stutz)
83. Sponge Database (Iosune Uriz and Marta Domingo)
84. Statlog Project databases (7) (from Ross King,.)
85. Student Loan re
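The file format described in the notes (one instance per line, comma-separated attribute values, `?` for a missing value) can be read with a few lines of Python. This is a sketch under stated assumptions: the class label is taken to be the last column and every other column is assumed numeric, which holds for iris.data but not for every data set in the listing. `parse_uci` is a hypothetical helper name, and the record containing `?` is made up for illustration (iris.data itself has no missing values).

```python
from collections import Counter


def parse_uci(lines):
    """Hypothetical helper: parse UCI-style records (one instance per line,
    comma-separated values, '?' for a missing value), assuming the class
    label is in the last column and all other columns are numeric.

    Numeric attributes become floats; missing ones become None.
    Returns (rows, labels).
    """
    rows, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines, e.g. a trailing empty line
            continue
        *values, label = line.split(",")
        rows.append([None if v == "?" else float(v) for v in values])
        labels.append(label)
    return rows, labels


sample = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "4.9,3.0,?,0.2,Iris-setosa",  # hypothetical record with a missing value
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "6.3,3.3,6.0,2.5,Iris-virginica",
]
rows, labels = parse_uci(sample)
print(rows[1])  # [4.9, 3.0, None, 0.2]
print(Counter(labels))  # class counts, i.e. the class distribution
```

For a real file, `with open(path) as f: rows, labels = parse_uci(f)` works the same way, since iterating over a file yields its lines. On the full iris.data, `Counter(labels)` should show the three classes in equal proportion, matching the stated distribution of 33.3% for each class.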
