
Lecture 8: Classification and Prediction
Xiamen University, School of Software

Classification and Prediction: Outline
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification (self-study)
- Classification by back propagation
- Support Vector Machines (SVM) (self-study)
- Associative classification (self-study)
- Lazy learners (or learning from your neighbors) (self-study)
- Other classification methods (self-study)
- Prediction
- Accuracy and error measures (self-study)
- Ensemble methods (self-study)
- Model selection (self-study)
- Summary

Classification vs. Prediction
- Classification: predicts categorical class labels (discrete or nominal values). It builds a model from training data whose tuples carry known class labels, then uses the model to classify new data.
- Prediction: models continuous-valued functions, e.g., predicts unknown or missing values.
- Typical applications: credit approval, target marketing, medical diagnosis.

Classification is a two-step process
- Model construction: describe a predetermined set of data classes or concepts.
  - Each tuple/sample is assumed to belong to a predefined class, determined by an attribute called the class label attribute.
  - The set of tuples used to build the model is the training set.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Model usage: classify future or unknown objects.
  - Estimate the accuracy of the model: the known label of each test sample is compared with the result produced by the model; accuracy is the percentage of test samples correctly classified by the model.
  - The test set must be independent of the training set, otherwise the learned model tends to over-fit the data.
  - If the accuracy is acceptable, use the model to classify data whose class labels are unknown.

Classification Process (1): Model Construction
- Training data -> classification algorithm -> classifier (model).
- Example of a learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

Classification Process (2): Use the Model in Prediction
- The classifier is evaluated on testing data and then applied to unseen data, e.g., (Jeff, Professor, 4) -> Tenured?

Decision tree induction: the basic algorithm
- A greedy algorithm: the tree is constructed in a top-down, recursive, divide-and-conquer manner.
- The tree starts as a single node representing all the training samples.
- If the samples all belong to the same class, the node becomes a leaf labeled with that class.
- Otherwise, select the attribute that best separates the samples (the test attribute; in the basic algorithm it must be discrete-valued).
- Create one branch for each known value of the test attribute and partition the samples accordingly.
- Recursively build a decision tree over the samples in each partition.

Recursive partitioning stops only when one of the following conditions holds:
- All samples at the given node belong to the same class.
- There are no remaining attributes on which the samples may be further partitioned (majority voting is then employed to label the leaf).
- There are no samples left.
[Figure: the buys_computer decision tree; root node age? with branches <=30, 31..40 and >40; the <=30 branch tests student?, the >40 branch tests credit_rating? (fair / excellent); leaves are labeled yes / no.]
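To make the top-down, divide-and-conquer procedure above concrete, here is a minimal ID3-style sketch (my own illustrative Python, not the lecture's code; it chooses the split by information gain, which the next slides define formally):

import math
from collections import Counter

def entropy(rows, target):
    # Info(D): expected information needed to classify a tuple in rows
    total = len(rows)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(r[target] for r in rows).values())

def information_gain(rows, attr, target):
    # Gain(attr) = Info(D) - Info_attr(D)
    split = Counter(r[attr] for r in rows)
    remainder = sum(n / len(rows) * entropy([r for r in rows if r[attr] == v], target)
                    for v, n in split.items())
    return entropy(rows, target) - remainder

def build_tree(rows, attributes, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:                              # stop: all samples in one class
        return classes.pop()
    if not attributes:                                 # stop: no attributes left -> majority vote
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for v in {r[best] for r in rows}:                  # one branch per known value of the test attribute
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = build_tree(subset, [a for a in attributes if a != best], target)
    return tree

toy = [{"student": "no", "buys": "no"},
       {"student": "yes", "buys": "yes"},
       {"student": "yes", "buys": "yes"}]
print(build_tree(toy, ["student"], "buys"))   # {'student': {'no': 'no', 'yes': 'yes'}}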

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain.
- Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D|.
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = - sum_{i=1..m} pi * log2(pi)
- Information still needed to classify D after using A to split D into v partitions:
  Info_A(D) = sum_{j=1..v} (|Dj| / |D|) * Info(Dj)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)

Attribute Selection: Information Gain (example)
- Class P: buys_computer = "yes"; Class N: buys_computer = "no".
- For a continuous-valued attribute, the best split-point must be determined; D is then split into the tuples with A <= split-point and those with A > split-point.
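A short numeric illustration of the formulas above (my own example code; the 9 "yes" / 5 "no" class distribution matches the buys_computer training data used later in this deck, and the per-age class counts are my assumption based on that standard data set):

import math

def info(counts):
    # Info(D) = -sum p_i * log2(p_i)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def info_after_split(partitions):
    # Info_A(D) = sum |Dj|/|D| * Info(Dj) over the partitions induced by attribute A
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

print(round(info([9, 5]), 3))                    # Info(D) ~= 0.940 for 9 yes / 5 no

# assumed split on age: <=30 -> 2 yes / 3 no, 31..40 -> 4 / 0, >40 -> 3 / 2
age_partitions = [[2, 3], [4, 0], [3, 2]]
gain_age = info([9, 5]) - info_after_split(age_partitions)
print(round(gain_age, 3))                        # Gain(age) ~= 0.246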

Gain Ratio for Attribute Selection (C4.5)
- The information gain measure is biased towards tests with many outcomes.
- C4.5 (a successor of ID3) uses the gain ratio to overcome this problem (a normalization of information gain):
  SplitInfo_A(D) = - sum_{j=1..v} (|Dj| / |D|) * log2(|Dj| / |D|)
  GainRatio(A) = Gain(A) / SplitInfo_A(D)
- Ex.: gain_ratio(income) = 0.029 / 0.926 = 0.031
- The attribute with the maximum gain ratio is selected as the splitting attribute.
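A tiny illustration of SplitInfo, the normalizing term above (my own code; the partition sizes are made up to show why many-valued splits are penalized):

import math

def split_info(sizes):
    # SplitInfo_A(D) = -sum (|Dj|/|D|) * log2(|Dj|/|D|); GainRatio(A) = Gain(A) / SplitInfo_A(D)
    total = sum(sizes)
    return -sum(s / total * math.log2(s / total) for s in sizes if s)

print(split_info([7, 7]))               # balanced two-way split of 14 tuples -> 1.0
print(round(split_info([1] * 14), 3))   # 14 singleton partitions -> log2(14) ~= 3.807, heavily penalized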

Gini Index (CART, IBM IntelligentMiner)
- If a data set D contains examples from n classes, the gini index gini(D) is defined as
  gini(D) = 1 - sum_{j=1..n} pj^2
  where pj is the relative frequency of class j in D.
- If D is split on attribute A into two subsets D1 and D2, the gini index of the split is
  gini_A(D) = (|D1| / |D|) * gini(D1) + (|D2| / |D|) * gini(D2)
- Reduction in impurity:
  delta_gini(A) = gini(D) - gini_A(D)
- The attribute that provides the smallest gini_split(D) (or, equivalently, the largest reduction in impurity) is chosen to split the node; this requires enumerating all possible splitting points for each attribute.

Gini Index: example
- D has 9 tuples with buys_computer = "yes" and 5 with "no", so gini(D) = 1 - (9/14)^2 - (5/14)^2.
- Suppose the attribute income partitions D into D1 = {low, medium} with 10 tuples and D2 = {high} with 4 tuples; the gini index of this split is (10/14) * gini(D1) + (4/14) * gini(D2).
- But gini_{medium,high} is 0.30 and is thus the best split on income, since it is the lowest.
- All attributes are assumed continuous-valued in this formulation.
- Other tools, e.g., clustering, may be needed to get the possible split values.
- The measure can be modified for categorical attributes.
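A small computation of the Gini measures above (my own code; the 9/5 class counts are from the example, while the class counts inside each income partition, 7/3 and 2/2, are my assumption based on the standard AllElectronics-style data):

def gini(counts):
    # gini(D) = 1 - sum_j p_j^2, p_j = relative frequency of class j in D
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    # gini_A(D) = sum |Di|/|D| * gini(Di) over the subsets produced by the split
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * gini(p) for p in partitions)

print(round(gini([9, 5]), 3))                              # gini(D) ~= 0.459
print(round(gini_split([[7, 3], [2, 2]]), 3))              # weighted gini of a candidate income split
print(round(gini([9, 5]) - gini_split([[7, 3], [2, 2]]), 3))   # reduction in impurity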

Comparing Attribute Selection Measures
- The three measures, in general, return good results, but each has biases:
  - Information gain: biased towards multivalued attributes.
  - Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others.
  - Gini index: biased towards multivalued attributes; has difficulty when the number of classes is large; tends to favor tests that result in equal-sized partitions with purity in both partitions.

Overfitting and Tree Pruning
- Overfitting: an induced tree may overfit the training data.
  - Too many branches, some of which may reflect anomalies due to noise or outliers.
  - Poor accuracy for unseen samples.
- Two approaches to avoid overfitting (a library-level sketch of both follows this list):
  - Prepruning: halt tree construction early; stop partitioning a node if the split of its tuples would fall below a predefined threshold. It is difficult to choose an appropriate threshold.
  - Postpruning: remove subtrees from a "fully grown" tree, replacing each removed subtree with a leaf. Use a set of data different from the training data to decide which is the "best pruned tree".
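As an aside, both strategies are exposed by common libraries; a hedged scikit-learn sketch (my own example, assuming scikit-learn and its bundled iris data; the parameter names are scikit-learn's, not the lecture's):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# prepruning: stop growing early once a threshold is hit
pre = DecisionTreeClassifier(max_depth=3, min_samples_split=10).fit(X_tr, y_tr)

# postpruning: grow the tree, then cut back subtrees via cost-complexity pruning
post = DecisionTreeClassifier(ccp_alpha=0.02).fit(X_tr, y_tr)

print(pre.score(X_te, y_te), post.score(X_te, y_te))   # accuracy on held-out data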

Enhancements to the basic decision tree induction algorithm
- Allow continuous-valued attributes: dynamically define new discrete-valued attributes by partitioning the continuous values into a discrete set of intervals.
- Handle missing attribute values: assign the most common value of the attribute, or assign the most probable value of the attribute.
- Attribute construction: create new attributes based on the given ones; this changes the original representation of the attributes and is a form of data transformation. It helps mitigate the fragmentation, repetition, and replication problems that decision tree induction may face.

Classification in large databases
- Classification is an extension of a classical problem extensively studied by statisticians and machine learning researchers.
- Scalability: classify data sets with millions of samples and hundreds of attributes at acceptable speed.
- Why decision tree induction in data mining?
  - Relatively fast learning speed (compared with other classification methods).
  - Classification rules are simple and easy to understand.
  - The database can be accessed with SQL queries.
  - Comparable classification accuracy.

Scalable decision tree induction methods
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory.
- SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure.
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning; stops growing the tree earlier.
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label).

Outline (repeated): next topic is Bayesian classification.

Bayesian classification
- Bayesian classification is a statistical classification method based on Bayes' theorem.
- The naive Bayesian classifier assumes that the effect of an attribute value on a given class is independent of the values of the other attributes (class conditional independence).
- Bayesian belief networks can represent dependencies among subsets of attributes and can also be used for classification.

Bayes' theorem: basics
- X is a data sample (tuple) whose class label is unknown, e.g., a 35-year-old customer with an income of $4000.
- H is the hypothesis that X belongs to a particular class C, e.g., that the customer will buy a computer.
- P(H|X): the posterior probability of H given X, e.g., the probability that customer X buys a computer given that we know the customer's age and income.
- P(H): the prior probability of H, i.e., the initial probability before any sample is observed; it reflects background knowledge, e.g., the probability that an arbitrary customer buys a computer.
- P(X): the probability that the sample data is observed, e.g., the probability that a customer is 35 years old and earns $4000.
- P(X|H): the posterior probability of X given H, e.g., given that a customer buys a computer, the probability that the customer is 35 years old and earns $4000.
- Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)

Towards the Naive Bayesian Classifier
- Let D be a training set of tuples and their associated class labels; each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn).
- Suppose there are m classes C1, C2, ..., Cm.
- Classification derives the maximum posteriori, i.e., the maximal P(Ci|X).
- By Bayes' theorem, P(Ci|X) = P(X|Ci) * P(Ci) / P(X).
- Since P(X) is constant for all classes, only P(X|Ci) * P(Ci) needs to be maximized.

Derivation of the Naive Bayes Classifier
- A simplifying assumption: the attributes are conditionally independent given the class (i.e., there is no dependence relation between attributes):
  P(X|Ci) = prod_{k=1..n} P(xk|Ci)
- This greatly reduces the computation cost: only the class distribution needs to be counted.
- If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D).
- If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean mu and standard deviation sigma:
  g(x, mu, sigma) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mu)^2 / (2 * sigma^2))
  and P(xk|Ci) = g(xk, mu_Ci, sigma_Ci).

Naive Bayesian Classifier: training dataset
- Classes: C1: buys_computer = "yes"; C2: buys_computer = "no".
- Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair).

Naive Bayesian Classifier: an example
- Class priors P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357
- Compute P(X|Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
- For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
- P(X|Ci) * P(Ci):
  P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007
- Therefore, X belongs to the class buys_computer = "yes".
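The arithmetic above can be reproduced directly; a minimal sketch (my own code, using the priors and conditional probabilities listed on the slide):

prior = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9, "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5, "student=yes": 1 / 5, "credit=fair": 2 / 5},
}

x = ["age<=30", "income=medium", "student=yes", "credit=fair"]   # the unseen tuple X

score = {}
for c in prior:
    p_x_given_c = 1.0
    for attr in x:                       # class-conditional independence: multiply P(xk | Ci)
        p_x_given_c *= cond[c][attr]
    score[c] = p_x_given_c * prior[c]    # P(X|Ci) * P(Ci); P(X) is the same for both classes

print(score)                             # roughly {'yes': 0.028, 'no': 0.007}
print(max(score, key=score.get))         # 'yes': X is predicted to buy a computer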

Avoiding the 0-probability problem
- Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability P(X|Ci) = prod_k P(xk|Ci) collapses to zero.
- Ex.: suppose a class with 1000 tuples in which income = low occurs 0 times, income = medium 990 times, and income = high 10 times.
- Use the Laplacian correction (Laplacian estimator): add 1 to each case:
  Prob(income = low) = 1/1003
  Prob(income = medium) = 991/1003
  Prob(income = high) = 11/1003
- The "corrected" probability estimates are close to their "uncorrected" counterparts.
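A one-glance illustration of the add-one correction for the income example above (my own code):

counts = {"low": 0, "medium": 990, "high": 10}          # 1000 tuples in this class
n = sum(counts.values())

uncorrected = {k: v / n for k, v in counts.items()}                      # P(low) = 0 wipes out the product
corrected = {k: (v + 1) / (n + len(counts)) for k, v in counts.items()}  # add 1 to each count

print(uncorrected)
print(corrected)   # {'low': 1/1003, 'medium': 991/1003, 'high': 11/1003}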

Naive Bayesian Classifier: comments
- Advantages: easy to implement; good results obtained in most of the cases.
- Disadvantages: the class-conditional-independence assumption causes a loss of accuracy, because in practice dependencies do exist among variables.
- How to deal with these dependencies? Bayesian belief networks.

Bayesian networks
- A Bayesian belief network allows class-conditional independencies to be defined between subsets of variables.
- It is a graphical model of causal relationships: it represents dependencies among variables and gives a specification of the joint probability distribution.
- Nodes: random variables (discrete or continuous). Arcs: probabilistic dependence. The structure is a directed acyclic graph.
- Example structure: X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P.

Bayesian belief networks: an example
- Variables: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea.
- For each variable, the belief network has a conditional probability table (CPT) giving P(Y | Parents(Y)).
- CPT for LungCancer (LC), conditioned on its parents FamilyHistory (FH) and Smoker (S):
            (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
  LC          0.8       0.5        0.7        0.1
  ~LC         0.2       0.5        0.3        0.9
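A small sketch of how such a CPT is used (my own illustrative encoding of the table above; only the LungCancer node is modeled, and the full joint distribution would be the product of one such P(node | parents) factor per node):

# P(LungCancer | FamilyHistory, Smoker) taken from the CPT above
cpt_lc = {
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def p_lung_cancer(lc, fh, s):
    """Probability of LungCancer = lc given its parents FamilyHistory = fh and Smoker = s."""
    p = cpt_lc[(fh, s)]
    return p if lc else 1.0 - p

# e.g. a non-smoker with a family history of lung cancer:
print(p_lung_cancer(True, fh=True, s=False))   # 0.5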

Training Bayesian belief networks
- Several scenarios are possible:
  - Network structure known and all variables observable: only the CPTs (conditional probability tables) need to be computed.
  - Network structure known but some variables hidden in the samples: train the belief network by gradient descent, analogous to neural network learning.
  - Network structure unknown, all variables observable: search the model space to reconstruct the graph topology.
  - Network structure unknown, all variables hidden: no good algorithm is known.

Outline (repeated): next topic is rule-based classification (self-study).

Using IF-THEN Rules for Classification
- Knowledge is represented in the form of IF-THEN rules, e.g.
  R: IF age = youth AND student = yes THEN buys_computer = yes
  (rule antecedent, or precondition, vs. rule consequent).
- Coverage and accuracy of a rule R (see the sketch after this list):
  ncovers = number of tuples covered by R (the rule antecedent is satisfied)
  ncorrect = number of tuples correctly classified by R
  coverage(R) = ncovers / |D|      (D: training data set)
  accuracy(R) = ncorrect / ncovers
- If a tuple "triggers" more than one rule, a conflict-resolution strategy is needed:
  - Size ordering: assign the highest priority to the triggering rule with the "toughest" requirements (i.e., the most attribute tests).
  - Class-based ordering: the classes are sorted in decreasing order of importance.
  - Rule-based ordering (decision list): the rules are organized into one long priority list, according to some measure of rule quality or by experts.
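The coverage and accuracy measures can be computed directly; a minimal sketch (my own code, with a made-up five-tuple data set):

def rule_stats(rule_condition, predicted_class, data, label_key="class"):
    """coverage(R) = ncovers / |D|;  accuracy(R) = ncorrect / ncovers."""
    covered = [t for t in data if rule_condition(t)]
    correct = [t for t in covered if t[label_key] == predicted_class]
    coverage = len(covered) / len(data)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

D = [
    {"age": "youth", "student": "yes", "class": "yes"},
    {"age": "youth", "student": "yes", "class": "no"},
    {"age": "youth", "student": "no",  "class": "no"},
    {"age": "senior", "student": "yes", "class": "yes"},
    {"age": "senior", "student": "no",  "class": "no"},
]
# R: IF age = youth AND student = yes THEN buys_computer = yes
cov, acc = rule_stats(lambda t: t["age"] == "youth" and t["student"] == "yes", "yes", D)
print(cov, acc)   # coverage = 2/5 = 0.4, accuracy = 1/2 = 0.5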

Extracting rules from a decision tree
[Figure: the buys_computer decision tree again; root age?, internal tests student? and credit_rating?, leaves yes / no.]
- Knowledge is represented as IF-THEN rules: one rule is created for each path from the root to a leaf.
- Each attribute-value pair along the path forms a conjunct of the rule antecedent (the "IF" part); the leaf node holds the class prediction and forms the rule consequent (the "THEN" part).
- IF-THEN rules are easy for people to understand.
- Example rules:
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = ...

Classification as a mathematical mapping
- Classification maps the attributes of a tuple to a categorical label, e.g., customer attributes -> credit approval (Yes/No), or (Temp, Humidity) -> Rain (Yes/No).
- Mathematically, the classifier is a function from the input attribute space to the set of class labels.
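In practice, most tree libraries can print these per-path rules directly; a quick scikit-learn sketch (my own example, not part of the lecture; assumes scikit-learn and its bundled iris data):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# each root-to-leaf path printed below corresponds to one IF ... THEN ... rule
print(export_text(tree, feature_names=list(iris.feature_names)))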

Linear classification
- A binary classification problem: in the figure, the data points above the separating line belong to class 'x' and the points below it belong to class 'o'.
- Examples of classifiers that learn such decision boundaries: SVM, perceptron, probabilistic classifiers.

Discriminative classifiers
- Advantages:
  - Prediction accuracy is generally higher (compared with Bayesian methods).
  - Robust: they work when the training examples contain errors.
  - Fast evaluation of the learned target function (Bayesian networks are normally slow to evaluate).
- Limitations:
  - Long training time.
  - Many parameters must be set during network construction and training, and they are hard to determine.
  - Poor interpretability.

Classification by backpropagation
- Backpropagation is a neural network learning algorithm.
- The field was started by psychologists and neurobiologists seeking to develop and test computational analogues of neurons.
- A neural network is a set of connected input/output units in which each connection has a weight associated with it.
- During the learning phase, the network learns by adjusting the weights so as to predict the correct class label of the input tuples.
- Also referred to as "connectionist learning".

Neural network as a classifier
- Weaknesses:
  - Long training time.
  - Requires a number of parameters that are typically best determined empirically, e.g., the network topology or structure.
  - Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and the hidden units of the network.
- Strengths:
  - High tolerance to noisy data.
  - Ability to classify untrained patterns.
  - Well suited for continuous-valued inputs and outputs.
  - Successful on a wide array of real-world data.
  - The algorithms are inherently parallel.
  - Techniques have recently been developed for extracting rules from trained neural networks.

A neuron (perceptron)
- The n-dimensional input vector x is mapped into the variable y by means of a scalar product and a nonlinear function mapping.
[Figure: the input vector x = (x0, x1, ..., xn) is combined with the weight vector w = (w0, w1, ..., wn) in a weighted sum, a bias mu_k is subtracted, and the result is passed through a nonlinear activation function f to produce the output y.]

A multi-layer feed-forward neural network
[Figure: the input vector X feeds the input layer; weighted connections wij lead to a hidden layer and then to the output layer, which produces the output vector.]
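A minimal sketch of the single-neuron mapping in the figure (my own code; a sigmoid activation and made-up weights are used purely for illustration):

import math

def neuron(x, w, mu_k, f=lambda v: 1.0 / (1.0 + math.exp(-v))):
    """y = f( sum_i w_i * x_i - mu_k ): weighted sum, bias, nonlinear activation."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) - mu_k)

print(neuron(x=[1.0, 0.5, -0.2], w=[0.4, 0.3, 0.9], mu_k=0.1))   # a sigmoid output in (0, 1)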

How does a multi-layer neural network work?
- The inputs to the network correspond to the attributes measured for each training tuple; they are fed simultaneously into the units making up the input layer.
- The weighted inputs are then fed simultaneously to a hidden layer; the number of hidden layers is arbitrary, although in practice usually only one is used.
- The weighted outputs of the last hidden layer are the inputs to the units of the output layer, which emits the network's prediction for the given tuple.
- Feed-forward network: none of the weights cycles back to an input unit or to an output unit of a previous layer.
- From a statistical point of view, the network performs nonlinear regression: given enough hidden units and enough training samples, a multi-layer feed-forward network can closely approximate any function.

Defining a network topology
- First decide the network topology: the number of units in the input layer, the number of hidden layers (if more than 1), the number of units in each hidden layer, and the number of units in the output layer.
- Normalize the input values of each attribute measured in the training tuples to the range 0.0 to 1.0.
- Use one input unit per domain value, each initialized to 0.
- Output: for classification with more than two classes, one output unit per class is used.
- Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights.

Backpropagation
- Iteratively process the set of training tuples, comparing the network's prediction for each tuple with the actual known target value.
- For each training sample, the weights are modified so as to minimize the mean squared error between the network's prediction and the actual target value.
- The modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer.
- Steps (a compact code sketch of this loop follows):
  - Initialize the weights (to small random numbers) and the biases in the network.
  - Propagate the inputs forward (by applying the activation function).
  - Backpropagate the error (by updating the weights and biases).
  - Terminating condition (when the error is very small, etc.).
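A compact numpy sketch of the loop described above (illustrative only: one hidden layer, sigmoid units, squared error, a toy XOR data set, and a simple stopping condition; it follows the four steps listed, not any particular implementation from the lecture):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy training tuples
y = np.array([[0], [1], [1], [0]], dtype=float)               # known target values (XOR)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# step 1: initialize weights and biases to small random numbers
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(20000):
    # step 2: propagate the inputs forward through the hidden and output layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    err = y - out
    if np.mean(err ** 2) < 1e-3:          # step 4: terminating condition (error small enough)
        break
    # step 3: backpropagate the error, updating weights and biases layer by layer
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 += lr * h.T @ d_out;  b2 += lr * d_out.sum(axis=0)
    W1 += lr * X.T @ d_h;    b1 += lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # predictions after training (should approach 0, 1, 1, 0)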

Outline (repeated): next topic is Support Vector Machines (SVM) (self-study).

SVM: Support Vector Machines
- A relatively new classification method for both linear and nonlinear data.
- It uses a nonlinear mapping to transform the original training data into a higher dimension.
- Within the new dimension, it searches for the linear optimal separating hyperplane (the decision boundary).
- With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane.
- SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors).

SVM: history and applications
- Vapnik and colleagues (1992); groundwork from Vapnik and Chervonenkis' statistical learning theory in the 1960s.
- Features: training can be slow, but accuracy is high owing to the ability to model complex nonlinear decision boundaries (margin maximization).
- Used both for classification and prediction.
- Applications: handwritten digit recognition, object recognition, speaker identification, benchmark time-series prediction tests.

SVM: general philosophy
[Figure: two separating hyperplanes and their support vectors, one with a small margin and one with a large margin.]

SVM: margins and support vectors
[Figure: the maximum-margin hyperplane, with the support vectors lying on the margin boundaries.]

SVM: when data is linearly separable
- Let the data D be (X1, y1), ..., (X|D|, y|D|), where the Xi are the training tuples and the yi their associated class labels.
- There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data).
- SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH).

SVM: the linearly separable case
- A separating hyperplane can be written as W . X + b = 0, where W = {w1, w2, ..., wn} is a weight vector and b a scalar (bias).
- For 2-D data it can be written as w0 + w1*x1 + w2*x2 = 0.
- The hyperplanes defining the sides of the margin are
  H1: w0 + w1*x1 + w2*x2 >= 1 for yi = +1, and
  H2: w0 + w1*x1 + w2*x2 <= -1 for yi = -1.
- Any training tuples that fall on the hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors.
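A quick scikit-learn sketch of the maximum-margin hyperplane described above (my own toy example, assuming scikit-learn; the library solves the same optimization, and a large C approximates the hard-margin case):

import numpy as np
from sklearn.svm import SVC

# two linearly separable clusters in 2-D
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("support vectors:\n", clf.support_vectors_)   # the tuples lying on H1 / H2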

SVM: the linearly inseparable case
- Transform the original input data into a higher-dimensional space.
- Search for a linear separating hyperplane in the new space.

SVM: kernel functions
- Instead of computing the dot product on the transformed data tuples, it is mathematically equivalent to apply a kernel function K(Xi, Xj) to the original data, i.e., K(Xi, Xj) = phi(Xi) . phi(Xj).
- Typical kernel functions include the polynomial kernel and the Gaussian radial basis function (RBF) kernel.
- SVM can also be used for classifying multiple (> 2) classes and for regression analysis (with additional user parameters).
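The kernel trick in code form (my own illustration; these are the standard polynomial and Gaussian/RBF kernels, with made-up parameter values):

import numpy as np

def polynomial_kernel(xi, xj, degree=3, c=1.0):
    """K(Xi, Xj) = (Xi . Xj + c) ** degree"""
    return (np.dot(xi, xj) + c) ** degree

def rbf_kernel(xi, xj, gamma=0.5):
    """K(Xi, Xj) = exp(-gamma * ||Xi - Xj||^2)  (Gaussian radial basis function)"""
    diff = np.asarray(xi) - np.asarray(xj)
    return np.exp(-gamma * np.dot(diff, diff))

a, b = [1.0, 2.0], [2.0, 0.5]
print(polynomial_kernel(a, b))   # a dot product in an implicit higher-dimensional feature space
print(rbf_kernel(a, b))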

Outline (repeated): next topic is associative classification (self-study).

Associative classification
- Association rules are generated and analyzed for use in classification: search for strong association rules between frequent patterns and class labels.
- Classification is based on evaluating a set of rules of the form
  p1 ^ p2 ^ ... ^ pl -> "Aclass = C" (conf, sup)
- Why effective? Association rules explore highly confident associations among multiple attributes, which can overcome the limitation of decision tree induction of considering only one attribute at a time; associative classification has been found to be more accurate than traditional classification methods.

Typical associative classification methods
- CBA (Classification By Association: Liu, Hsu & Ma, KDD'98):
  - Mine possible association rules of the form: Cond-set (a set of attribute-value pairs) -> class label.
  - Build the classifier: organize the rules according to decreasing precedence based on confidence and then support.
