![Artificial Intelligence and Data Mining Courseware, lect113, page 1](http://file4.renrendoc.com/view/331ace2956e820dd713b6c66d527b214/331ace2956e820dd713b6c66d527b2141.gif)
![Artificial Intelligence and Data Mining Courseware, lect113, page 2](http://file4.renrendoc.com/view/331ace2956e820dd713b6c66d527b214/331ace2956e820dd713b6c66d527b2142.gif)
![Artificial Intelligence and Data Mining Courseware, lect113, page 3](http://file4.renrendoc.com/view/331ace2956e820dd713b6c66d527b214/331ace2956e820dd713b6c66d527b2143.gif)
![Artificial Intelligence and Data Mining Courseware, lect113, page 4](http://file4.renrendoc.com/view/331ace2956e820dd713b6c66d527b214/331ace2956e820dd713b6c66d527b2144.gif)
![Artificial Intelligence and Data Mining Courseware, lect113, page 5](http://file4.renrendoc.com/view/331ace2956e820dd713b6c66d527b214/331ace2956e820dd713b6c66d527b2145.gif)
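Part I: Data Mining Fundamentals
Chapter 1: Data Mining: A First View
BUPT AI&DM, 2022/8/3

Content
- 1.1 What Is Data Mining? Definition
- 1.2 What Can Computers Learn?
- 1.3 Is Data Mining Appropriate for My Problem?
- 1.4 Expert Systems or Data Mining?
- 1.6 Why Not Simple Search?

1.1 What Is Data Mining: Motivation
- Data explosion problem: automated data collection tools and mature database technology have led to tremendous amounts of data stored in databases, data warehouses, and other information repositories. Such amounts of data are beyond human understanding.
- We are drowning in data, but starving for knowledge!
- Solution: data warehousing and data mining
  - Data warehousing: for data storage
  - Data mining: for extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

1.1 Data Mining Is a Result of the Natural Evolution of Information Technology
- 1960s: data collection and database creation
- 1970s to early 1980s: database management systems
- Mid-1980s to present: data warehouses; data analysis and understanding (data mining)

Data Analysis: New Trend
This is a time when one must speak with data.
- The future belongs to the number crunchers (Super Crunchers, Ian Ayres, 2009): everyday decisions will become increasingly automated, and the role of human judgment will be limited to supplying data for the computation.
- Predicting the taste and aroma of wine: Orley Ashenfelter, a Princeton University economist who knows nothing about winemaking, can predict the price of Bordeaux wines from the weather (hot, dry years produce very good wine) more accurately than wine experts can.
- The book was originally going to be called "The End of Theory". The title was later changed on the basis of a Google test rather than a discussion with the publisher's editors, because the new title was found to draw a 63% higher click-through rate.
- Loan officers were once well paid and carried great responsibility. Now they are merely call-center operators who repeat the questions prompted by a computer, for low pay.

Data Analysis: New Trend (cont.)
This is a time when one must speak with data.
- Gene sequencing and new species: Craig Venter used high-speed computers capable of analyzing data, moving from sequencing the genes of individual organisms to sequencing the ocean from 2003 and the air in 2005. In the process, thousands of previously unknown bacteria and other life forms were discovered. He advanced biology more than any of his contemporaries.
- Shanghai GM warranty analysis: in the past, Shanghai GM's warranty problem analysis relied mainly on simple, purely manual calculation and produced only a handful of problem reports each time. Although vehicle output was far lower than it is today, this slow, labor-intensive analysis cycle was a root cause of persistently high warranty costs. Without automation, it took 6 to 12 months on average to get from a warranty claim to the cause of the problem, often with support from GM's global organization, and the whole problem-solving process rested mainly on experience-based analysis. In addition, inaccurate data made it hard for Shanghai GM to forecast warranty costs and to budget sensibly for the next warranty period, which tied up large amounts of operating capital and reduced cash flow.
- After adopting the SAS warranty analysis solution, Shanghai GM shortened its warranty analysis cycle by 70% within the first 6 months, which lowered warranty costs and met the goals set for the system. These improvements also let Shanghai GM recoup its entire hardware and software investment in the warranty analysis system within half a year, saving the company RMB 18 million in total.
- Police geographic information system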

Data Mining Definitions
- (1) The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data. (in this textbook)
- (2) Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases. (generally accepted)

Induction-Based Learning
- Data mining methods use induction-based learning.
- Induction-based learning: the process of forming general concept definitions by observing specific examples of the concepts to be learned.

What Is Data Mining?
- Alternative names: data mining or knowledge mining?
  - Gold mining: a poor analogy
- Knowledge discovery in databases (KDD), business intelligence

Why Data Mining? Potential Applications (or p4)
- Database analysis and decision support
  - Market analysis and management: target marketing, cross selling, market segmentation
  - Risk analysis and management: forecasting, customer retention, quality control
  - Fraud detection and management
- Other applications
  - Text mining (newsgroups, email, documents) and Web analysis

1.2 What Can Computers Learn?
- Four Levels of Learning (omitted)
- Three Concept Views (omitted)
- Supervised Learning
- Unsupervised Learning

1.2.1 Supervised Learning
- Build a learner model using data instances of known origin.
- Use the model to determine the outcome of new instances of unknown origin.
- Attributes: input attributes, output attributes
- Process: training data, test data
- Learning outcome: tree, production rules
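
The supervised learning process described above (build a model from training instances, check it on test instances, then apply it to a new instance) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the course: the patient records, the attribute names, and the lookup-table learner (which memorizes the majority output for each combination of input values) are all hypothetical.

```python
from collections import Counter, defaultdict

INPUTS = ("swollen_glands", "fever")  # input attributes (hypothetical names)
OUTPUT = "diagnosis"                  # output attribute (hypothetical name)

# Invented instances of known origin, split into training and test data.
TRAINING_DATA = [
    {"swollen_glands": "yes", "fever": "yes", "diagnosis": "strep throat"},
    {"swollen_glands": "no", "fever": "yes", "diagnosis": "cold"},
    {"swollen_glands": "yes", "fever": "no", "diagnosis": "strep throat"},
    {"swollen_glands": "no", "fever": "no", "diagnosis": "allergy"},
    {"swollen_glands": "no", "fever": "yes", "diagnosis": "cold"},
    {"swollen_glands": "no", "fever": "no", "diagnosis": "allergy"},
]
TEST_DATA = [
    {"swollen_glands": "yes", "fever": "yes", "diagnosis": "strep throat"},
    {"swollen_glands": "no", "fever": "no", "diagnosis": "allergy"},
]


def train(instances):
    """Build a model: the majority output for each combination of inputs."""
    votes = defaultdict(Counter)
    for inst in instances:
        key = tuple(inst[a] for a in INPUTS)
        votes[key][inst[OUTPUT]] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    # Fallback for input combinations never seen in training.
    default = Counter(i[OUTPUT] for i in instances).most_common(1)[0][0]
    return table, default


def predict(model, inst):
    """Determine the outcome of an instance using the trained model."""
    table, default = model
    return table.get(tuple(inst[a] for a in INPUTS), default)


def accuracy(model, instances):
    """Fraction of instances whose known output matches the prediction."""
    correct = sum(predict(model, i) == i[OUTPUT] for i in instances)
    return correct / len(instances)


model = train(TRAINING_DATA)
test_accuracy = accuracy(model, TEST_DATA)

# A new instance of unknown origin: only the input attributes are given.
new_patient = {"swollen_glands": "no", "fever": "yes"}
prediction = predict(model, new_patient)
print(test_accuracy, prediction)
```

A table lookup is used here only because it keeps the three steps visible. The tree and production-rule outcomes named on the slide generalize beyond the combinations seen in training, which this sketch does not.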

Decision Tree
- Decision tree: a tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes (leaf nodes) reflect decision outcomes.
- The tree starts at the root node.

Production Rules
- IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
- IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
- IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
- Antecedent conditions: the preconditions (the IF part)
- Consequent conditions: the conclusion (the THEN part)

1.2.2 Unsupervised Clustering
- A data mining method that builds models from data without predefined classes.

The Acme Investors Dataset

The Acme Investors Dataset & Supervised Learning
- Can I develop a general profile of an online investor?
- Can I determine if a new customer is likely to open a margin account?
- Can I build a model to accurately predict the average number of trades per month for a new investor?
- What characteristics differentiate female and male investors?

The Acme Investors Dataset & Unsupervised Clustering
- What attribute similarities group customers of Acme Investors together?
- What differences in attribute values segment the customer database?
- IF Margin Account = Yes & Age = 20-29 & Annual Income = 40-59k THEN Cluster = 1 (accuracy = 0.80, coverage = 0.50)
- IF Account Type = Custodial & Favorite Recreation = Skiing & Annual Income = 80-90k THEN Cluster = 2 (accuracy = 0.95, coverage = 0.35)
- IF Account Type = Joint & Trades/Month 5 & Transaction Method = Online THEN Cluster = 3 (accuracy = 0.82, coverage = 0.65)
- (see example clusters on p13)

1.3 Is Data Mining Appropriate for My Problem? (Data Mining vs. Data Query)

Data Mining or Data Query?
- Shallow knowledge: shallow knowledge is factual. It can be easily stored and manipulated in a database.
- Multidimensional knowledge: multidimensional knowledge is also factual. Online analytical processing (OLAP) tools are used to manipulate multidimensional knowledge.
- Hidden knowledge: hidden knowledge represents patterns or regularities in data that cannot be easily found using database queries. However, data mining algorithms can find such patterns with ease (example p15).
- Deep knowledge: deep knowledge is knowledge stored in a database that can only be found if we are given some direction about what we are looking for.

Data Mining vs. Data Query: An Example (p16)
- Use data query if you already almost know what you are looking for.
- Use data mining to find regularities in data that are not obvious.
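
The contrast between a data query and data mining can be made concrete with a small sketch. The customer records and attribute names below are invented for illustration and are not the Acme Investors data. The sketch also assumes the usual definitions of the two rule scores that appear with the cluster rules: accuracy is the fraction of instances matching the IF part that also satisfy the THEN part, and coverage is the fraction of instances satisfying the THEN part that match the IF part. The slides do not define either term, so treat these definitions as an assumption.

```python
# Hypothetical customer records; every name and value here is invented.
CUSTOMERS = [
    {"account_type": "joint", "method": "online", "age": "20-29", "margin": "yes"},
    {"account_type": "joint", "method": "online", "age": "30-39", "margin": "yes"},
    {"account_type": "joint", "method": "broker", "age": "20-29", "margin": "no"},
    {"account_type": "individual", "method": "online", "age": "20-29", "margin": "yes"},
    {"account_type": "individual", "method": "broker", "age": "30-39", "margin": "no"},
    {"account_type": "custodial", "method": "broker", "age": "30-39", "margin": "no"},
    {"account_type": "individual", "method": "online", "age": "30-39", "margin": "no"},
    {"account_type": "joint", "method": "online", "age": "20-29", "margin": "yes"},
]


def query(records, **conditions):
    """Data query: we already know which attribute values we want."""
    return [r for r in records if all(r[k] == v for k, v in conditions.items())]


def score_rule(records, attribute, value, target_attr, target_value):
    """Score the rule IF attribute = value THEN target_attr = target_value."""
    matching = [r for r in records if r[attribute] == value]
    in_class = [r for r in records if r[target_attr] == target_value]
    hits = [r for r in matching if r[target_attr] == target_value]
    accuracy = len(hits) / len(matching) if matching else 0.0
    coverage = len(hits) / len(in_class) if in_class else 0.0
    return accuracy, coverage


def mine_rules(records, target_attr, target_value):
    """Data mining: try every single-condition rule and rank the results."""
    candidates = sorted(
        {(a, r[a]) for r in records for a in r if a != target_attr}
    )
    scored = [
        (attr, val, *score_rule(records, attr, val, target_attr, target_value))
        for attr, val in candidates
    ]
    # Highest accuracy first; coverage breaks ties.
    return sorted(scored, key=lambda s: (s[2], s[3]), reverse=True)


# Query: the question is fully specified in advance.
online_margin = query(CUSTOMERS, method="online", margin="yes")

# Mining: no condition is given; the search proposes candidate regularities.
rules = mine_rules(CUSTOMERS, "margin", "yes")
best = rules[0]
print(len(online_margin), best)
```

The query answers only the question that was asked, while the rule search surfaces a regularity nobody specified. A real data mining tool would search combinations of conditions rather than single ones.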

1.4 Expert Systems or Data Mining? (Data Mining vs. ES)
- Expert system (ES): a computer program that emulates the problem-solving skills of one or more human experts.
- Used when no (quality) data is available, or in fields where humans have good knowledge.
- Experts learn their skills through education and experience.
- Human experts often use rules to describe what they know.
- ES and DM can work together.

1.6 Why Not Simple Search? (Data Mining vs. Nearest Neighbor Approach)
- Stores instances or a generalized model of the data.
- Nearest neighbor classifier: classification is performed by searching the training data for the instance closest in distance to the unknown instance.
- Advantage: suitable for areas where humans have limited knowledge
- Problem: slow when the number of cases is large
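
The nearest neighbor classifier described above can be sketched as follows. This is a minimal illustration that assumes numeric attributes and Euclidean distance. The slides do not name a distance measure, and the training points and class labels are invented.

```python
import math

# Hypothetical training data: (attribute vector, class label) pairs.
TRAINING = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]


def classify(unknown, training=TRAINING):
    """Return the label of the training instance closest to `unknown`.

    Every call scans the whole training set, which is why the approach
    becomes slow when the number of cases is large.
    """
    _, label = min(training, key=lambda item: math.dist(item[0], unknown))
    return label


print(classify((2.0, 1.5)))  # closest stored instance is (1.5, 2.0)
print(classify((5.5, 5.0)))  # closest stored instance is (5.0, 5.0)
```

No model is built here: the stored instances are the model, so all of the work happens at classification time.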