Data Mining: Concepts and Techniques
— Chapter 6 —

Jiawei Han
Department of Computer Science, University of Illinois at Urbana-Champaign (/~hanj)
©2006 Jiawei Han and Micheline Kamber. All rights reserved.

Chapter 6. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification
- Classification by backpropagation
- Support Vector Machines (SVM)
- Associative classification
- Lazy learners (or learning from your neighbors)
- Other classification methods
- Prediction
- Accuracy and error measures
- Ensemble methods
- Model selection
- Summary
Classification vs. Prediction
- Classification
  - predicts categorical class labels (discrete or nominal)
  - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
- Prediction
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications
  - Credit approval
  - Target marketing
  - Medical diagnosis
  - Fraud detection
Classification—A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model: the known label of each test sample is compared with the model's prediction; the accuracy rate is the percentage of test set samples correctly classified by the model; the test set must be independent of the training set, otherwise overfitting will occur
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Process (1): Model Construction
- [figure: training data fed to a classification algorithm, which outputs the classifier (model)]
- Example of a learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
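As a concrete illustration of the two-step process, here is a minimal Python sketch (my own, not from the slides; the tiny "tenured" dataset and the choice of scikit-learn's DecisionTreeClassifier are illustrative assumptions):

```python
# Two-step process on a tiny hypothetical "tenured" dataset.
# Step 1 (model construction): learn a classifier from labeled training tuples.
# Step 2 (model usage): estimate accuracy on an independent test set, then
# classify an unseen tuple such as (Jeff, Professor, 4 years).
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Features: [rank_is_professor (0/1), years]; label: tenured (1 = yes, 0 = no)
X_train = [[1, 7], [1, 2], [0, 7], [0, 3], [1, 8], [0, 6]]
y_train = [1, 1, 1, 0, 1, 0]
X_test, y_test = [[1, 3], [0, 8]], [1, 1]

model = DecisionTreeClassifier().fit(X_train, y_train)                   # step 1
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))  # step 2a
print("Tenured?", model.predict([[1, 4]]))           # step 2b: (Jeff, Professor, 4)
```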
Process (2): Using the Model in Prediction
- [figure: the classifier applied first to testing data, then to unseen data]
- Example: for the unseen tuple (Jeff, Professor, 4), the classifier answers the query "Tenured?"

Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Chapter 6 outline (section-divider slide, repeated).

Issues: Data Preparation
- Data cleaning: preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data

Issues: Evaluating Classification Methods
- Accuracy
  - classifier accuracy: predicting the class label
  - predictor accuracy: guessing the value of the predicted attribute
- Speed
  - time to construct the model (training time)
  - time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency in disk-resident databases
- Interpretability: understanding and insight provided by the model
- Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules

Chapter 6 outline (section-divider slide, repeated).

Decision Tree Induction: Training Dataset
- This follows an example of Quinlan's ID3 (Playing Tennis)
- The 14-tuple buys_computer training table:

  age    | income | student | credit_rating | buys_computer
  <=30   | high   | no      | fair          | no
  <=30   | high   | no      | excellent     | no
  31..40 | high   | no      | fair          | yes
  >40    | medium | no      | fair          | yes
  >40    | low    | yes     | fair          | yes
  >40    | low    | yes     | excellent     | no
  31..40 | low    | yes     | excellent     | yes
  <=30   | medium | no      | fair          | no
  <=30   | low    | yes     | fair          | yes
  >40    | medium | yes     | fair          | yes
  <=30   | medium | yes     | excellent     | yes
  31..40 | medium | no      | excellent     | yes
  31..40 | high   | yes     | fair          | yes
  >40    | medium | no      | excellent     | no

Output: A Decision Tree for "buys_computer"
- Root test: age?
  - age <= 30 → student? (no → no; yes → yes)
  - age 31..40 → yes
  - age > 40 → credit rating? (excellent → no; fair → yes)

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (continuous-valued attributes are discretized in advance)
  - Samples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
  - There are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}|/|D|
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -Σ_{i=1..m} p_i log2(p_i)
- Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = Σ_{j=1..v} (|D_j|/|D|) × Info(D_j)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)

Attribute Selection: Information Gain
- Class P: buys_computer = "yes" (9 tuples); class N: buys_computer = "no" (5 tuples)
- Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
- "age <= 30" has 5 out of 14 samples, with 2 yes'es and 3 no's, so its partition contributes (5/14) I(2,3)
- Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694, hence Gain(age) = 0.940 - 0.694 = 0.246
- Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
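To make the arithmetic above reproducible, here is a small Python sketch (my own, not from the slides) that computes Info(D), Info_age(D), and Gain(age) from the class counts in the buys_computer example:

```python
# Information gain for the buys_computer example, computed from class counts.
# Per the slide: age<=30 has (2 yes, 3 no), 31..40 has (4 yes, 0 no),
# >40 has (3 yes, 2 no); the whole set D has (9 yes, 5 no).
from math import log2

def info(counts):
    """Expected information (entropy) I(c1, ..., cm) of a class distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

D = [9, 5]
age_partitions = [[2, 3], [4, 0], [3, 2]]

info_D = info(D)                                                    # 0.940
info_age = sum(sum(p) / sum(D) * info(p) for p in age_partitions)  # 0.694
gain_age = info_D - info_age                                        # 0.246
print(f"Info(D)={info_D:.3f}  Info_age(D)={info_age:.3f}  Gain(age)={gain_age:.3f}")
```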
Computing Information-Gain for Continuous-Valued Attributes
- Let attribute A be a continuous-valued attribute
- The best split point for A must be determined
  - Sort the values of A in increasing order
  - Typically, the midpoint between each pair of adjacent values is considered as a possible split point: (a_i + a_{i+1})/2 is the midpoint between the values of a_i and a_{i+1}
  - The point with the minimum expected information requirement for A is selected as the split-point for A
- Split: D1 is the set of tuples in D satisfying A <= split-point, and D2 is the set of tuples in D satisfying A > split-point
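To illustrate the midpoint scan, a minimal sketch (mine, not from the slides; the five (value, class) pairs are hypothetical):

```python
# Find the best binary split point for a continuous attribute by scanning
# midpoints between adjacent sorted values and minimizing expected information.
from math import log2

def info(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def class_counts(labels):
    return [labels.count(c) for c in set(labels)]

data = [(21, 'no'), (25, 'no'), (32, 'yes'), (38, 'yes'), (45, 'no')]  # (A, class)
data.sort()

best = None
for (a_i, _), (a_next, _) in zip(data, data[1:]):
    split = (a_i + a_next) / 2                      # candidate midpoint
    d1 = [c for v, c in data if v <= split]         # A <= split-point
    d2 = [c for v, c in data if v > split]          # A >  split-point
    expected = (len(d1) * info(class_counts(d1)) +
                len(d2) * info(class_counts(d2))) / len(data)
    if best is None or expected < best[0]:
        best = (expected, split)

print(f"best split-point = {best[1]} with Info_A(D) = {best[0]:.3f}")
```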
Gain Ratio for Attribute Selection (C4.5)
- The information gain measure is biased towards attributes with a large number of values
- C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):
  SplitInfo_A(D) = -Σ_{j=1..v} (|D_j|/|D|) log2(|D_j|/|D|)
  GainRatio(A) = Gain(A) / SplitInfo_A(D)
- Ex.: income splits D into partitions of sizes 4, 6, and 4, so SplitInfo_income(D) = -(4/14) log2(4/14) - (6/14) log2(6/14) - (4/14) log2(4/14) = 1.557, and GainRatio(income) = 0.029/1.557 = 0.019
- The attribute with the maximum gain ratio is selected as the splitting attribute

Gini Index (CART, IBM IntelligentMiner)
- If a data set D contains examples from n classes, the gini index gini(D) is defined as
  gini(D) = 1 - Σ_{j=1..n} p_j^2
  where p_j is the relative frequency of class j in D
- If a data set D is split on A into two subsets D1 and D2, the gini index of the split is defined as
  gini_A(D) = (|D1|/|D|) gini(D1) + (|D2|/|D|) gini(D2)
- Reduction in impurity:
  Δgini(A) = gini(D) - gini_A(D)
- The attribute providing the smallest gini_split(D) (or the largest reduction in impurity) is chosen to split the node (all possible splitting points must be enumerated for each attribute)

Gini Index: Example
- D has 9 tuples with buys_computer = "yes" and 5 with "no":
  gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459
- Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 in D2: {high}:
  gini_{income ∈ {low,medium}}(D) = (10/14) gini(D1) + (4/14) gini(D2) = 0.443
  This is the lowest of the three binary income splits, and thus the best
- All attributes are assumed continuous-valued
- May need other tools, e.g., clustering, to get the possible split values
- Can be modified for categorical attributes
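A quick check of these numbers in Python (my own sketch; the per-value class counts are read off the training table above):

```python
# Gini index for the binary income splits of the buys_computer data.
# Per-value class counts (yes, no) from the training table:
counts = {'low': (3, 1), 'medium': (4, 2), 'high': (2, 2)}
N = 14  # total number of tuples in D

def gini(yes, no):
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

def gini_split(subset):
    """Weighted gini of splitting D into income-in-subset vs. the rest."""
    result = 0.0
    for side in (subset, [v for v in counts if v not in subset]):
        yes = sum(counts[v][0] for v in side)
        no = sum(counts[v][1] for v in side)
        result += (yes + no) / N * gini(yes, no)
    return result

print(f"gini(D) = {gini(9, 5):.3f}")                          # 0.459
for subset in (['low', 'medium'], ['low', 'high'], ['medium', 'high']):
    print(subset, f"{gini_split(subset):.3f}")                # 0.443, 0.458, 0.450
```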
Comparing Attribute Selection Measures
- The three measures, in general, return good results, but:
  - Information gain: biased towards multivalued attributes
  - Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others
  - Gini index: biased towards multivalued attributes; has difficulty when the number of classes is large; tends to favor tests that result in equal-sized partitions and purity in both partitions

Other Attribute Selection Measures
- CHAID: a popular decision tree algorithm; measure based on the χ2 test for independence
- C-SEP: performs better than information gain and gini index in certain cases
- G-statistic: has a close approximation to the χ2 distribution
- MDL (Minimal Description Length) principle (i.e., the simplest solution is preferred): the best tree is the one that requires the fewest bits to both (1) encode the tree and (2) encode the exceptions to the tree
- Multivariate splits (partition based on multiple variable combinations): CART finds multivariate splits based on a linear combination of attributes
- Which attribute selection measure is the best? Most give good results; none is significantly superior to the others

Overfitting and Tree Pruning
- Overfitting: an induced tree may overfit the training data
  - Too many branches, some may reflect anomalies due to noise or outliers
  - Poor accuracy for unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early, i.e., do not split a node if this would result in the goodness measure falling below a threshold (it is difficult to choose an appropriate threshold)
  - Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees; use a set of data different from the training data to decide which is the "best pruned tree"

Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
- Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication

Classification in Large Databases
- Classification: a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
- Why decision tree induction in data mining?
  - relatively fast learning speed (compared with other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - classification accuracy comparable with other methods

Scalable Decision Tree Induction Methods
- SLIQ (EDBT'96 — Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96 — J. Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98 — Rastogi & Shim): integrates tree splitting and tree pruning: stop growing the tree earlier
- RainForest (VLDB'98 — Gehrke, Ramakrishnan & Ganti): builds an AVC-list (attribute, value, class label)
- BOAT (PODS'99 — Gehrke, Ganti, Ramakrishnan & Loh): uses bootstrapping to create several small samples

Scalability Framework for RainForest
- Separates the scalability aspects from the criteria that determine the quality of the tree
- Builds an AVC-list: AVC (Attribute, Value, Class_label)
- AVC-set (of an attribute X): projection of the training data set onto attribute X and the class label, where counts of the individual class labels are aggregated
- AVC-group (of a node n): set of AVC-sets of all predictor attributes at node n
RainForest: Training Set and Its AVC Sets
- Training examples: the 14-tuple buys_computer table shown earlier

  AVC-set on Age              AVC-set on income
  age    | yes | no           income | yes | no
  <=30   |  2  |  3           high   |  2  |  2
  31..40 |  4  |  0           medium |  4  |  2
  >40    |  3  |  2           low    |  3  |  1

  AVC-set on Student          AVC-set on credit_rating
  student | yes | no          credit rating | yes | no
  yes     |  6  |  1          fair          |  6  |  2
  no      |  3  |  4          excellent     |  3  |  3
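An AVC-set is just an aggregate of (attribute value, class label) counts, so it can be built in a single scan over the data. A minimal sketch (mine, not RainForest's actual implementation) in plain Python:

```python
# Build the AVC-set of one attribute in a single scan over the training tuples:
# for each (attribute value, class label) pair, aggregate a count.
from collections import Counter

# A few (age, buys_computer) pairs from the example table (abbreviated).
tuples = [('<=30', 'no'), ('<=30', 'no'), ('31..40', 'yes'),
          ('>40', 'yes'), ('>40', 'yes'), ('>40', 'no')]

avc_age = Counter(tuples)   # keys are (value, class), values are counts
for (value, label), count in sorted(avc_age.items()):
    print(f"age={value:7s} class={label:3s} count={count}")
```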
Data Cube-Based Decision-Tree Induction
- Integration of generalization with decision-tree induction (Kamber et al. '97)
- Classification at primitive concept levels
  - E.g., precise temperature, humidity, outlook, etc.
  - Low-level concepts, scattered classes, bushy classification trees
  - Semantic interpretation problems
- Cube-based multi-level classification
  - Relevance analysis at multiple levels
  - Information-gain analysis with dimension + level

BOAT (Bootstrapped Optimistic Algorithm for Tree Construction)
- Uses a statistical technique called bootstrapping to create several smaller samples (subsets), each of which fits in memory
- Each subset is used to create a tree, resulting in several trees
- These trees are examined and used to construct a new tree T'
- It turns out that T' is very close to the tree that would be generated using the whole data set together
- Advantage: requires only two scans of the DB; an incremental algorithm

Presentation of Classification Results
- [figure-only slide: screenshots of classification result presentations]

Interactive Visual Mining by Perception-Based Classification (PBC)
- [figure-only slide: PBC screenshot]

Chapter 6 outline (section-divider slide, repeated).

Bayesian Classification: Why?
- A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
- Foundation: based on Bayes' theorem
- Performance: a simple Bayesian classifier, the na?ve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers
- Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem: Basics
- Let X be a data sample ("evidence"): the class label is unknown
- Let H be the hypothesis that X belongs to class C
- Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X
- P(H) (prior probability): the initial probability; e.g., X will buy a computer, regardless of age, income, ...
- P(X): the probability that the sample data is observed
- P(X|H) (posterior probability of X conditioned on H): the probability of observing the sample X given that the hypothesis holds; e.g., given that X will buy a computer, the probability that X is 31..40 with medium income
Bayesian Theorem
- Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
  P(H|X) = P(X|H) P(H) / P(X)
- Predicts that X belongs to C_i iff the probability P(C_i|X) is the highest among all the P(C_k|X) for the k classes
- Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost
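For instance (hypothetical numbers, not from the slides): if P(H) = 0.5, P(X|H) = 0.6, and P(X) = 0.4, then Bayes' theorem gives P(H|X) = 0.6 × 0.5 / 0.4 = 0.75, so observing X raises the probability of H from 0.5 to 0.75.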
Towards the Na?ve Bayesian Classifier
- Let D be a training set of tuples and their associated class labels; each tuple is represented by an n-D attribute vector X = (x1, x2, ..., xn)
- Suppose there are m classes C1, C2, ..., Cm
- Classification is to derive the maximum a posteriori, i.e., the maximal P(C_i|X)
- This can be derived from Bayes' theorem:
  P(C_i|X) = P(X|C_i) P(C_i) / P(X)
- Since P(X) is constant for all classes, only P(X|C_i) P(C_i) needs to be maximized

Derivation of the Na?ve Bayes Classifier
- A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):
  P(X|C_i) = Π_{k=1..n} P(x_k|C_i) = P(x1|C_i) × P(x2|C_i) × ... × P(xn|C_i)
- This greatly reduces the computation cost: only the class distribution needs to be counted
- If A_k is categorical, P(x_k|C_i) is the number of tuples in C_i having value x_k for A_k, divided by |C_{i,D}| (the number of tuples of C_i in D)
- If A_k is continuous-valued, P(x_k|C_i) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:
  g(x, μ, σ) = (1 / (√(2π) σ)) exp(-(x - μ)^2 / (2σ^2)), and P(x_k|C_i) = g(x_k, μ_{C_i}, σ_{C_i})

Na?ve Bayesian Classifier: Training Dataset
- Class C1: buys_computer = 'yes'; class C2: buys_computer = 'no'
- Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Na?ve Bayesian Classifier: An Example
- P(C_i): P(buys_computer = "yes") = 9/14 = 0.643; P(buys_computer = "no") = 5/14 = 0.357
- Compute P(X|C_i) for each class:
  P(age <= 30 | yes) = 2/9 = 0.222;        P(age <= 30 | no) = 3/5 = 0.600
  P(income = medium | yes) = 4/9 = 0.444;  P(income = medium | no) = 2/5 = 0.400
  P(student = yes | yes) = 6/9 = 0.667;    P(student = yes | no) = 1/5 = 0.200
  P(credit = fair | yes) = 6/9 = 0.667;    P(credit = fair | no) = 2/5 = 0.400
- For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X|yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044;  P(X|no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
  P(X|yes) P(yes) = 0.044 × 0.643 = 0.028;  P(X|no) P(no) = 0.019 × 0.357 = 0.007
- Therefore, X belongs to class "buys_computer = yes"
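The same computation, as a short Python sketch (mine; the conditional probabilities are the counts from the training table above):

```python
# Na?ve Bayes for the buys_computer example: multiply the class prior by the
# per-attribute conditional probabilities and pick the class with the larger product.
priors = {'yes': 9 / 14, 'no': 5 / 14}
cond = {  # P(attribute value | class), counted from the 14-tuple training table
    'yes': {'age<=30': 2 / 9, 'income=medium': 4 / 9,
            'student=yes': 6 / 9, 'credit=fair': 6 / 9},
    'no':  {'age<=30': 3 / 5, 'income=medium': 2 / 5,
            'student=yes': 1 / 5, 'credit=fair': 2 / 5},
}

X = ['age<=30', 'income=medium', 'student=yes', 'credit=fair']
scores = {}
for c in priors:
    score = priors[c]
    for attr in X:
        score *= cond[c][attr]
    scores[c] = score

print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # 'yes'
```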
Avoiding the 0-Probability Problem
- Na?ve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability P(X|C_i) = Π_k P(x_k|C_i) will be zero
- Ex.: suppose a data set with 1000 tuples has income = low (0 tuples), income = medium (990), and income = high (10)
- Use the Laplacian correction (or Laplacian estimator): add 1 to each case:
  Prob(income = low) = 1/1003
  Prob(income = medium) = 991/1003
  Prob(income = high) = 11/1003
- The "corrected" probability estimates are close to their "uncorrected" counterparts

Na?ve Bayesian Classifier: Comments
- Advantages
  - Easy to implement
  - Good results obtained in most of the cases
- Disadvantages
  - Assumption of class conditional independence causes a loss of accuracy, because in practice dependencies exist among variables
  - E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.); dependencies among these cannot be modeled by the na?ve Bayesian classifier
- How to deal with these dependencies? Bayesian belief networks

Bayesian Belief Networks
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- A graphical model of causal relationships
  - Represents dependency among the variables
  - Gives a specification of the joint probability distribution
- Nodes: random variables; links: dependency
  - Example: X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P
  - The graph has no loops or cycles

Bayesian Belief Network: An Example
- Nodes: FamilyHistory, Smoker, LungCancer, Cough, PositiveX-Ray, HardToBreath
- The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents (the textbook CPT; only the last column survives in this copy):

        (FH,S) | (FH,~S) | (~FH,S) | (~FH,~S)
  LC     0.8   |  0.5    |  0.7    |  0.1
  ~LC    0.2   |  0.5    |  0.3    |  0.9

- Derivation of the probability of a particular combination of values of X from the CPT:
  P(x1, ..., xn) = Π_{i=1..n} P(x_i | Parents(x_i))

Training Bayesian Networks
- Several scenarios:
  - Given both the network structure and all variables observable: learn only the CPTs
  - Network structure known, some hidden variables: gradient descent (greedy hill-climbing) method, similar to neural network learning
  - Network structure unknown, all variables observable: search through the model space to reconstruct the network topology
  - Unknown structure, all hidden variables: no good algorithms known for this purpose
- Ref.: D. Heckerman, Bayesian networks for data mining
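A short sketch (mine) of how the belief-network example above yields a joint probability as the product Π P(x_i | Parents(x_i)); the LungCancer CPT row follows the table above, while the priors for FamilyHistory and Smoker are hypothetical placeholders:

```python
# Joint probability from a belief network: multiply each node's CPT entry
# conditioned on its parents' values. Network fragment: FH and S are parents of LC.
p_fh = {True: 0.1, False: 0.9}            # hypothetical prior for FamilyHistory
p_s = {True: 0.3, False: 0.7}             # hypothetical prior for Smoker
p_lc = {  # P(LungCancer = True | FH, S), per the CPT above
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def joint(fh, s, lc):
    """P(FH=fh, S=s, LC=lc) = P(fh) * P(s) * P(lc | fh, s)."""
    p_lc_given = p_lc[(fh, s)] if lc else 1 - p_lc[(fh, s)]
    return p_fh[fh] * p_s[s] * p_lc_given

print(joint(True, True, True))   # 0.1 * 0.3 * 0.8 = 0.024
```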
Chapter 6 outline (section-divider slide, repeated).

Using IF-THEN Rules for Classification
- Represent the knowledge in the form of IF-THEN rules, e.g.:
  R: IF age = youth AND student = yes THEN buys_computer = yes
  (rule precondition vs. rule consequent)
- Assessment of a rule: coverage and accuracy
  - n_covers = number of tuples covered by R; n_correct = number of tuples correctly classified by R
  - coverage(R) = n_covers / |D|   (D: training data set)
  - accuracy(R) = n_correct / n_covers
- If more than one rule is triggered, conflict resolution is needed
  - Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., the most attribute tests)
  - Class-based ordering: decreasing order of prevalence or misclassification cost per class
  - Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts

Rule Extraction from a Decision Tree
- Rules are easier to understand than large trees
- One rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction, and the leaf holds the class prediction
- Rules are mutually exclusive and exhaustive
- Example: rules extracted from our buys_computer decision tree (shown earlier):
  IF age = young AND student = no                THEN buys_computer = no
  IF age = young AND student = yes               THEN buys_computer = yes
  IF age = mid-age                               THEN buys_computer = yes
  IF age = old AND credit_rating = excellent     THEN buys_computer = no
  IF age = old AND credit_rating = fair          THEN buys_computer = yes

Rule Extraction from the Training Data
- Sequential covering algorithm: extracts rules directly from the training data
- Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
- Rules are learned sequentially; each rule for a given class C_i should cover many tuples of C_i but none (or few) of the tuples of other classes
- Steps:
  - Rules are learned one at a time
  - Each time a rule is learned, the tuples covered by the rule are removed
  - The process repeats on the remaining tuples until a termination condition holds, e.g., there are no more training examples, or the quality of a rule returned is below a user-specified threshold
- Compare with decision-tree induction, which learns a set of rules simultaneously

How to Learn One Rule?
- Start with the most general rule possible: condition = empty
- Add new attribute tests by adopting a greedy depth-first strategy: pick the one that most improves rule quality
- Rule-quality measures consider both coverage and accuracy
  - FOIL-gain (in FOIL and RIPPER) assesses the information gained by extending the condition from rule R (covering pos/neg tuples) to rule R' (covering pos'/neg'):
    FOIL_Gain = pos' × (log2(pos' / (pos' + neg')) - log2(pos / (pos + neg)))
    It favors rules that have high accuracy and cover many positive tuples
- Rule pruning is based on an independent set of test tuples:
  FOIL_Prune(R) = (pos - neg) / (pos + neg)
  where pos/neg are the numbers of positive/negative tuples covered by R; if FOIL_Prune is higher for the pruned version of R, prune R
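To make the rule-quality measures concrete, a small sketch (mine, not from FOIL itself; all counts are hypothetical) computing coverage, accuracy, and FOIL-gain for one rule extension:

```python
# Rule quality: coverage(R) = n_covers/|D|, accuracy(R) = n_correct/n_covers,
# and FOIL_Gain for extending a rule R to R'. Counts below are hypothetical.
from math import log2

def coverage_accuracy(n_covers, n_correct, n_total):
    return n_covers / n_total, n_correct / n_covers

def foil_gain(pos, neg, pos_new, neg_new):
    """Information gained by extending rule R (covering pos/neg tuples)
    to rule R' (covering pos_new/neg_new tuples)."""
    return pos_new * (log2(pos_new / (pos_new + neg_new)) -
                      log2(pos / (pos + neg)))

cov, acc = coverage_accuracy(n_covers=20, n_correct=18, n_total=100)
print(f"coverage={cov:.2f} accuracy={acc:.2f}")      # 0.20, 0.90
# Extending R shrinks its cover from (30 pos, 20 neg) to (18 pos, 2 neg):
print(f"FOIL_Gain={foil_gain(30, 20, 18, 2):.3f}")   # positive => extension helps
```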
Chapter 6 outline (section-divider slide, repeated).

Classification: A Mathematical Mapping
- Classification predicts categorical class labels
- E.g., personal homepage classification: x_i = (x1, x2, x3, ...), y_i = +1 or -1
  - x1: number of occurrences of the word "homepage"; x2: number of occurrences of the word "welcome"
- Mathematically: x ∈ X, y ∈ Y = {+1, -1}; we want a function f: X → Y

Linear Classification
- Binary classification problem
- [figure: points of class 'x' above a separating line and points of class 'o' below it]
- The data above the red line belongs to class 'x'; the data below the red line belongs to class 'o'
- Examples: SVM, Perceptron, probabilistic classifiers

Discriminative Classifiers
- Advantages
  - Prediction accuracy is generally high (as compared to Bayesian methods in general)
  - Robust: works when training examples contain errors
  - Fast evaluation of the learned target function (Bayesian networks are normally slow)
- Criticism
  - Long training time
  - Difficult to understand the learned function (weights); Bayesian networks can be used easily for pattern discovery
  - Not easy to incorporate domain knowledge (easy in the form of priors on the data or distributions)

Perceptron & Winnow
- Notation: vectors x, w; scalars x, y, w
- Input: {(x … [the source text ends here]
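Since the Perceptron slide is cut off in the source, here is a minimal perceptron training sketch (my own, the standard mistake-driven update rule; the toy data is hypothetical) matching the mapping f: X → Y = {+1, -1}:

```python
# Perceptron: learn a weight vector w (and bias b) so that sign(w·x + b)
# separates the two classes; on each mistake, nudge w toward the example.
# Toy linearly separable data, labels in {+1, -1}.
data = [([2.0, 1.0], +1), ([1.5, 2.0], +1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]

w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):                      # fixed number of epochs for the sketch
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:               # misclassified (or on the boundary)
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y

print("w =", w, "b =", b)
print([1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x, _ in data])
```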