Chapter 19  Clustering Analysis

Content
Similarity coefficient
Hierarchical clustering analysis
Dynamic clustering analysis
Ordered sample clustering analysis

Discriminant Analysis: for individuals known with certainty to come from two or more populations, it is a method for building a discriminant model that allocates further individuals to the correct population.

Clustering Analysis: a statistical method for grouping objects of unknown kind into categories. It is used when there are no a priori hypotheses; instead, one looks for the most appropriate grouping with the help of mathematical statistics and the collected data. It has become a method of first choice for mining large volumes of genetic information.

Both are multivariate statistical methods for studying classification.

Clustering analysis is an exploratory statistical method. According to its aim, it can be divided into two major types. Let m denote the number of variables (i.e. indexes) and n the number of cases (i.e. samples); then:

(1) R-type clustering: also called index clustering. It sorts the m indexes, aiming at reducing the dimensionality of the indexes and selecting typical ones.

(2) Q-type clustering: also called sample clustering. It sorts the n samples in order to find what they have in common.

The most important issue for both R-type and Q-type clustering is the definition of similarity, that is, how to quantify similarity. The first step of clustering is therefore to define a measure of similarity between two indexes or two samples - the similarity coefficient.

§1 Similarity coefficient

1. Similarity coefficient for R-type clustering. Suppose there are m variables X1, X2, ..., Xm. R-type clustering usually uses the absolute value of the simple correlation coefficient to define the similarity coefficient between two variables, r_ij = |r(X_i, X_j)|; the larger this absolute value, the more similar the two variables. Similarly, the Spearman rank correlation coefficient can be used to define the similarity coefficient for non-normal variables, and when the variables are all qualitative it is best to use the contingency coefficient.
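As a small illustration (ours, not part of the original courseware), the following Python sketch computes R-type similarity matrices from a hypothetical data matrix: the absolute Pearson correlation, and the absolute Spearman rank correlation for non-normal variables.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: n = 6 cases (rows), m = 3 variables (columns).
X = np.array([[170.0, 88.0, 74.0],
              [165.0, 84.0, 70.0],
              [172.0, 90.0, 78.0],
              [158.0, 80.0, 66.0],
              [175.0, 92.0, 80.0],
              [168.0, 86.0, 72.0]])

# Similarity = absolute value of the simple (Pearson) correlation coefficient.
pearson_sim = np.abs(np.corrcoef(X, rowvar=False))

# For non-normal variables the absolute Spearman rank correlation can be used instead.
rho, _ = spearmanr(X)
spearman_sim = np.abs(rho)

print(pearson_sim.round(3))
print(spearman_sim.round(3))
```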

2. Similarity coefficients commonly used in Q-type clustering. Regard the n cases as n points in an m-dimensional space; the distance between two points can then be used to define the similarity coefficient, and the smaller the distance, the more similar the two samples. Writing X_ik for the value of the k-th variable in the i-th sample:

(1) Euclidean distance

d_{ij} = \sqrt{\sum_{k=1}^{m} (X_{ik} - X_{jk})^2}

(2) Manhattan (absolute) distance

d_{ij} = \sum_{k=1}^{m} |X_{ik} - X_{jk}|

(3) Minkowski distance

d_{ij} = \Big( \sum_{k=1}^{m} |X_{ik} - X_{jk}|^{q} \Big)^{1/q}

The absolute (Manhattan) distance is the Minkowski distance with q = 1, and the Euclidean distance is the Minkowski distance with q = 2. The Euclidean distance is intuitive and simple to compute, but it takes no account of the correlations among the variables; this is why the Mahalanobis distance was introduced.

(4) Mahalanobis distance: let S denote the sample covariance matrix of the m variables. The (squared) Mahalanobis distance between samples i and j is

d_{ij}^2 = (\mathbf{X}_i - \mathbf{X}_j)' S^{-1} (\mathbf{X}_i - \mathbf{X}_j)

Whenit’saunitmatrix,MahalanobisdistanceequalstothesquareofEuclideandistance.

§2 Hierarchical Clustering Analysis

Hierarchical clustering analysis is the most commonly used method for grouping similar samples or variables. The procedure is as follows:

1) At the beginning, each sample (or variable) is regarded as a single cluster, that is, each cluster contains only one sample (or variable). Then work out the similarity coefficient matrix among the clusters; it is made up of the similarity coefficients between samples (or variables) and is a symmetric matrix.

2) The two clusters with the maximum similarity coefficient (minimum distance or maximum correlation coefficient) are merged into a new cluster. Compute the similarity coefficient between the new cluster and each of the other clusters, then repeat this step until all of the samples (or variables) have been merged into one cluster.
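As an illustration of this procedure (our sketch, not code from the slides), the following Python fragment runs agglomerative clustering with SciPy on a small hypothetical data matrix; the `method` argument corresponds to the between-cluster similarity definitions described next.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import pdist

# Hypothetical Q-type data: 5 samples (rows) described by 2 standardized variables.
X = np.array([[ 1.3,  0.7],
              [ 0.2,  0.1],
              [-1.0, -1.4],
              [-0.5,  0.7],
              [ 1.1,  0.5]])

# Step 1: distance matrix between the initial single-sample clusters.
D = pdist(X, metric="euclidean")

# Step 2, repeated: merge the two closest clusters until only one remains.
# method can be "single", "complete", "average", "centroid" or "ward",
# corresponding to the five between-cluster methods discussed below.
Z = linkage(D, method="average")

# Cut the tree into, e.g., two clusters and (optionally) draw the dendrogram.
print(fcluster(Z, t=2, criterion="maxclust"))
tree = dendrogram(Z, no_plot=True)   # set no_plot=False (needs matplotlib) to plot
```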

Calculating the similarity coefficient between clusters

Each step of hierarchical clustering has to calculate the similarity coefficients among clusters. When each of the two clusters contains only one sample or variable, the similarity coefficient between them is simply that of the two samples or two variables, computed as in §1.

When the clusters contain more than one sample or variable, many methods can be used to compute the similarity coefficient between clusters; five of them are listed below. Let G_p and G_q denote two clusters containing n_p and n_q samples (or variables), respectively.

1. The maximum similarity coefficient method. If clusters G_p and G_q contain n_p and n_q samples (or variables) respectively, there are altogether n_p × n_q similarity coefficients between the two clusters, and the maximum of them is taken as the similarity coefficient of the two clusters.

Note: when distances are used, the minimum distance corresponds to the maximum similarity coefficient.

2. The minimum similarity coefficient method. The similarity coefficient between the two clusters is taken as the minimum of the n_p × n_q pairwise similarity coefficients (equivalently, the maximum pairwise distance).

3. The center of gravity (centroid) method (only used in sample clustering). The center of gravity of a cluster is the vector of index means within that cluster, and the distance between two clusters is the distance between their centers of gravity.

4. The cluster average method (only used in sample clustering). Work out the average of the squared distances between all pairs of samples, one from each of the two clusters, and use it as the squared distance between the clusters.

The cluster average method is one of the better methods in hierarchical clustering, because it makes full use of the information from the individual samples within each cluster.

3.Thecenterofgravitymeth125.sumofsquaresofdeviations

methodalsocalledWardmethod,onlyforsampleclustering.Itimitatesthebasicthoughtsofvarianceanalysis,thatis,arationalclassificationcanmakethesumofsquaresofdeviationwithinaclustersmaller,whilethatamongclusterslarger.Supposethatsampleshavebeenclassifiedintogclusters,includingand.Thesumofsquaresofdeviationsofclusterfromsamplesis:(isthemeanof).Themergedsumofsquaresofdeviationsofallthegclustersis.Ifandaremerged,therewillbeg-1clusters.

The increment in the combined sum of squares of deviations produced by the merge is defined as the squared distance between the two clusters. Obviously, when each of the n samples forms a cluster by itself, the combined sum of squares of deviations is 0.
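To make Ward's criterion concrete, here is a small NumPy sketch (ours, with made-up clusters) that computes the within-cluster sums of squares of deviations and the increment produced by merging two clusters, which is the squared distance used by the method.

```python
import numpy as np

def ss_dev(cluster):
    """Sum of squares of deviations of a cluster's samples from its mean vector."""
    centered = cluster - cluster.mean(axis=0)
    return float((centered ** 2).sum())

# Two hypothetical clusters of samples (rows) with 2 variables each.
G_p = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]])
G_q = np.array([[3.0, 4.1], [2.8, 3.9]])

L_p, L_q = ss_dev(G_p), ss_dev(G_q)
L_pq = ss_dev(np.vstack([G_p, G_q]))      # sum of squares after merging G_p and G_q

# Increment caused by the merge = Ward's squared distance between G_p and G_q.
ward_distance = L_pq - (L_p + L_q)
print(round(L_p, 3), round(L_q, 3), round(ward_distance, 3))
```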

Example 19-1. Four variables were measured on 3454 female adults: height (X1), leg length (X2), waistline (X3) and chest circumference (X4), and the correlation matrix of the four variables has been worked out. Try to use hierarchical clustering to cluster the 4 indexes.

This is a case of R-type (index) clustering. We choose the absolute value of the simple correlation coefficient as the similarity coefficient, and use the maximum similarity coefficient method to calculate the similarity coefficient between clusters.

The clustering procedure is as follows:
(1) Each index is regarded as a single cluster: G1={X1}, G2={X2}, G3={X3}, G4={X4}. There are altogether 4 clusters.
(2) Merge the two clusters with the maximum similarity coefficient into a new cluster. In this case, G1 and G2 (similarity coefficient 0.852) are merged into G5={X1, X2}. Then calculate the similarity coefficients among G5, G3 and G4, which form the similarity matrix of G3, G4 and G5.
(3) Merge G3 and G4 into G6={G3, G4}, because this time the similarity coefficient between G3 and G4 is the largest (0.732). Compute the similarity coefficient between G6 and G5.
(4) Lastly, G5 and G6 are merged into one cluster G7={G5, G6}, which in fact includes all the original indexes.

Draw the hierarchical dendrogram (Figure 19-1) according to the clustering process. As the figure indicates, the indexes are best classified into two clusters, {X1, X2} and {X3, X4}; that is, the length indexes form one cluster and the circumference indexes form the other.

Figure 19-1  Hierarchical dendrogram of the 4 indexes (height, leg length, waistline, chest circumference)
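A sketch of Example 19-1 with SciPy (our illustration): only |r(X1,X2)| = 0.852 and |r(X3,X4)| = 0.732 are quoted in the text, so the remaining off-diagonal entries of the matrix below are placeholders. Turning the similarity |r| into the distance 1 - |r| makes the maximum similarity coefficient method correspond to single linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Correlation matrix of X1-X4; only 0.852 and 0.732 come from the text,
# the other off-diagonal values are placeholders for the unquoted entries.
R = np.array([[1.000, 0.852, 0.250, 0.300],
              [0.852, 1.000, 0.200, 0.280],
              [0.250, 0.200, 1.000, 0.732],
              [0.300, 0.280, 0.732, 1.000]])

# Distance = 1 - |r|, so the maximum similarity equals the minimum distance.
D = 1.0 - np.abs(R)
np.fill_diagonal(D, 0.0)

# Maximum similarity coefficient method = single linkage on this distance.
Z = linkage(squareform(D, checks=False), method="single")
print(fcluster(Z, t=2, criterion="maxclust"))   # groups {X1, X2} and {X3, X4}
```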

Example 19-2. Table 19-1 lists the mean energy expenditure and sugar expenditure of four athletic items measured on six athletes. In order to provide a corresponding dietary standard and improve performance, cluster the athletic items using hierarchical clustering.

Table 19-1  Measured values of the 4 athletic items

Athletic item                   Energy expenditure X1 (joule/(min·m²))   Sugar expenditure X2 (%)   Standardized X1   Standardized X2
Weight-loading crouching  G1    27.892                                    61.42                       1.315             0.688
Pull-up                   G2    23.475                                    56.83                       0.174             0.088
Push-up                   G3    18.924                                    45.13                      -1.001            -1.441
Sit-up                    G4    20.913                                    61.25                      -0.488             0.665

In this example we choose the Minkowski distance, and use the minimum similarity coefficient method to calculate the distances between clusters. To reduce the effect of the different dimensions (units) of the variables, the variables should be standardized before analysis: X'_i = (X_i - \bar{X}_i) / S_i, where \bar{X}_i and S_i are the sample mean and standard deviation of X_i, respectively. The data after this transformation are listed in Table 19-1.
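The standardizing transform can be checked with a few lines of NumPy (our sketch); the raw values are those of Table 19-1 and the output reproduces its standardized columns.

```python
import numpy as np

# Raw values from Table 19-1 (rows: G1 weight-loading crouching, G2 pull-up,
# G3 push-up, G4 sit-up; columns: X1 energy expenditure, X2 sugar expenditure).
X = np.array([[27.892, 61.42],
              [23.475, 56.83],
              [18.924, 45.13],
              [20.913, 61.25]])

# Standard transform: X' = (X - mean) / sample standard deviation (ddof = 1).
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(X_std.round(3))   # matches the standardized columns of Table 19-1
```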

The clustering process:
(1) Compute the similarity coefficient matrix (i.e. the distance matrix) of the 4 samples. The distance between weight-loading crouching and pull-up can be worked out using formula (19-3); likewise, the distance between weight-loading crouching and push-up can be computed in the same way. Finally, work out the whole distance matrix.

(2) The distance between G2 and G4 is the minimum, so G2 and G4 are merged into a new cluster G5={G2, G4}. Compute the distance between G5 and the other clusters using the minimum similarity coefficient method according to formula (19-8).

The distance matrix of G1, G3 and G5 is then obtained.
(3) Merge G1 and G5 into a new cluster G6={G1, G5}. Compute the distance between G6 and G3.
(4) Lastly, merge G3 and G6 into G7={G3, G6}; all the samples have now been merged into one large cluster.

According to the clustering process, draw the hierarchical dendrogram (Figure 19-2). As the dendrogram and subject-matter knowledge suggest, the items should be sorted into two clusters: {G1, G2, G4} and {G3}. Physical energy expenditure in weight-loading crouching, pull-up and sit-up is much higher, so an improved dietary standard might be required for those items during training.
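Example 19-2 can be reproduced end to end with SciPy (an illustrative sketch, not the original computation); Euclidean distance (Minkowski with q = 2) is assumed, and the minimum similarity coefficient method corresponds to farthest-neighbour (complete) linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Raw data of Table 19-1: G1 weight-loading crouching, G2 pull-up, G3 push-up, G4 sit-up.
X = np.array([[27.892, 61.42],
              [23.475, 56.83],
              [18.924, 45.13],
              [20.913, 61.25]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Minimum similarity coefficient method = complete linkage on Euclidean distances.
Z = linkage(pdist(X_std, metric="euclidean"), method="complete")
print(Z.round(3))                              # merge order: (G2, G4), then G1, then G3
print(fcluster(Z, t=2, criterion="maxclust"))  # two clusters: {G1, G2, G4} and {G3}
```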

Analysis of clustering examples

Different definitions of the similarity coefficient, and of the similarity between clusters, will lead to different clustering results. Subject-matter expertise, as well as the clustering method, is important for interpreting a clustering analysis.

Example 19-3. Twenty-seven petroleum pitch workers and pyro-furnace men were surveyed about their age, length of service and smoking. In addition, serum P21, serum P53, peripheral blood lymphocyte SCE, the number of chromosomal aberrations and the number of cells with chromosomal aberrations were determined for these workers (Table 19-3). (P21 multiple = P21 detection value / mean P21 of the control group.) Sort the 27 workers using a suitable hierarchical clustering method.

Sample19-3twenty-seven24Table19-3resultofbio-markerdetectionandclusteringanalysisofpetroleumpitchworkersandpyro-furnacemanSampleNumberageLengthofservicesmokeRamus/dSero-P21P21MultipleP53SCENumberofchromosomeaberrationNumberofcellsofChromosomeaberrationresultofculsterin680.358.1144235122035102.761.436.84331352252027842.190.544.1133143272024511.930.4711.4596153822032472.560.8011.68551651313037102.920.3711.6022174091031942.510.4011.40551834172046583.670.4611.3533195029050193.950.4713.4510811042202074825.890.1213.110021157301538002.990.1910.762211236152024781.950.2510.00001133712038273.010.8210.50441145232029842.350.1611.153311552321037492.950.7211.45111011642273049413.890.7313.807611744272039483.110.3313.6516141184021533602.640.3711.40001193821529362.310.6911.401112044272068515.390.9912.28762214327039263.090.4711.95001222610343813.450.5211.807512337182071425.620.8511.81552242892026122.060.3711.65111252593026382.080.7812.251112634142043223.400.4115.005512750322028622.250.698.80221Table19-3resultofbio-marke25ThisexampleapplyminimumsimilaritycoefficientmethodoriginatingfromEuclideandistance,clusterequilibrationmethodandsumofsquaresofdeviationsmethodtoclusterthedata.Theresultsarelistedinchart19-3,chart19-4andchart19-5.Allthevariableshavebeenstandardizedbeforeanalysis.Thisexampleapplyminimum26

Figure 19-3  Hierarchical dendrogram of the 27 petroleum pitch workers and pyro-furnace men, minimum similarity coefficient method
Figure 19-4  Hierarchical dendrogram of the 27 petroleum pitch workers and pyro-furnace men, cluster average method
Figure 19-5  Hierarchical dendrogram of the 27 petroleum pitch workers and pyro-furnace men, sum of squares of deviations (Ward) method

The outcomes of the three clustering methods are not the same, which shows that different methods behave differently; the differences become more marked as the number of variables grows. It is therefore better to select informative variables, such as P21 and P53 in this example, before clustering analysis. More information can be obtained by reading the dendrograms.

According to subject-matter knowledge, the outcome of the cluster average method is the most reasonable; its classification is given in the last column of Table 19-3. Workers numbered {10, 20, 23} form one class and the others form another; researchers found that workers {10, 20, 23} are at high risk of cancer. In the dendrogram of the sum of squares of deviations method, workers {10, 20, 23, 8, 16, 26} are clustered together, suggesting that workers 8, 16 and 26 may be at high risk too.

Dynamic clustering

If there are very many samples to classify, hierarchical clustering analysis demands a lot of space to store the similarity coefficient matrix and is quite inefficient; what is more, samples cannot be reassigned once they have been classified. Because of these shortcomings, statisticians put forward dynamic clustering, which overcomes the inefficiency and adjusts the classification as the clustering proceeds.

The principle of dynamic clustering analysis is: first, select several representative samples, called cohesion points, as the core of each class; second, classify the remaining samples and adjust the core of each class until the classification is reasonable. The most common dynamic clustering method is k-means, which is efficient and simple in principle, so results can be obtained even for very large numbers of samples (a minimal k-means sketch is given after this section). However, the number of classes has to be known before the analysis; in some circumstances it can be chosen from subject-matter knowledge, but in others it cannot.

Ordinal clustering methods

The clustering analyses mentioned so far are for unordered samples. There is, however, another kind of data, such as developmental measurements at successive ages or incidence rates in different years and locations. Such data are ordered in time or space and are called ordinal data. The order must be taken into account, and must not be destroyed, when classifying them; the corresponding methods are therefore called ordinal (ordered sample) clustering methods.
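Referring back to the k-means description above, here is a minimal scikit-learn sketch (ours, on simulated data); as the text notes, the number of classes k has to be specified in advance.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Simulated large sample: 1000 cases, 4 standardized variables, two latent groups.
X = np.vstack([rng.normal(loc=0.0, size=(500, 4)),
               rng.normal(loc=3.0, size=(500, 4))])

# k must be chosen beforehand, e.g. from subject-matter knowledge.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)        # class membership of every case
centers = km.cluster_centers_     # final class cores ("cohesion points")
print(np.bincount(labels))
print(centers.round(2))
```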

Attentions

1. Clustering analysis is used to explore data. Interpretation of the outcomes must be combined with subject-matter expertise; try different clustering methods to obtain reasonable results.
2. Pre-process the variables: remove uninformative variables that vary little and variables with too many missing values. Generally speaking, a standard transformation or a range transformation is needed to eliminate the effects of the variables' dimensions (units) and coefficients of variation (a short sketch of both transforms follows this list).
3. A reasonable classification should lead to distinct differences between classes and small differences within classes. After classification, analysis of variance (or its multivariate counterpart when there are several variables) can be applied to check the statistical differences between classes.
4. Fuzzy clustering analysis, neural network clustering analysis and other specialized methods for exploring genetic data are not introduced here; please consult the related literature or the Internet.
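A short sketch of the two pre-processing transforms mentioned in point 2 (standard transform and range transform), using scikit-learn on a hypothetical data matrix.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[170.0, 61.4],
              [165.0, 56.8],
              [158.0, 45.1],
              [172.0, 61.3]])

X_standard = StandardScaler().fit_transform(X)   # standard transform: (x - mean) / sd
X_range = MinMaxScaler().fit_transform(X)        # range transform: (x - min) / (max - min)
print(X_standard.round(3))
print(X_range.round(3))
```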

Enjoy learning!

