




Chapter 3 Principal Component Analysis
3.1 Introductory Example
3.2 Theory
3.3 History of PCA
3.4 Practical Aspects
3.5 Sklearn PCA
3.6 Principal Component Regression
3.7 Subspace Methods for Dynamic Model Estimation in PAT Applications
3.1 Introductory Example
Table 3.1 Chemical Parameters Determined on the Wine Samples (data from http://www.models.life.ku.dk/Wine_GCMS_FTIR)
Hence, a data set is obtained which consists of 44 samples and 14 variables. The actual measurements can be arranged in a table or a matrix of size 44 × 14. A portion of this table is shown in Fig. 3.1.
Fig. 3.1 A subset of the wine data set
With 44 samples and 14 columns, it is quite complicated to get an overview of what kind of information is available in the data. A good starting point is to plot individual variables or samples. Three of the variables are shown in Fig. 3.2. It can be seen that total acid as well as methanol tends to be higher in samples from Australia and South Africa, whereas there are less pronounced regional differences in the ethanol content.
Fig. 3.2 Three variables coloured according to the region
Even though Fig. 3.2 may suggest that there is little relevant regional information in ethanol, it is dangerous to rely too much on univariate analysis. In univariate analysis, any co-variation with other variables is explicitly neglected, and this may lead to important features being ignored. For example, plotting ethanol versus glycerol (Fig. 3.3) shows an interesting correlation between the two. This is difficult to deduce from plots of the individual variables. If glycerol and ethanol were completely correlated, it would, in fact, be possible to simply use e.g. the average or the sum of the two as one new variable that could replace the two original ones. No information would be lost, as it would always be possible to go from e.g. the average back to the two original variables.
Fig. 3.3 A plot of ethanol versus glycerol
This concept of using suitable linear combinations of the original variables will turn out to be essential in PCA and is explained in a bit more detail, and in a slightly unusual way, here. The new variable, say the average of the two original ones, can be defined as a weighted average of all 14 variables; all the other variables simply get weight zero. These 14 weights are shown in Fig. 3.4.
Fig. 3.4 Defining the weights for a variable that includes only ethanol and glycerol information
Fig. 3.5 The concept of a unit vector
Fig. 3.6
As mentioned above, it is possible to go back and forth between the original two variables and the new variable. Multiplying the new variable with the weights provides an estimation of the original variables (Fig. 3.7).
Fig. 3.7 Using the new variable and the weights to estimate the original variables
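The back-and-forth just described can be sketched in a few lines of numpy. The column indices chosen here for ethanol and glycerol are purely illustrative, and a random matrix stands in for the wine data:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(44, 14))   # random stand-in for the 44 x 14 wine matrix

w = np.zeros(14)
w[[6, 7]] = 1 / np.sqrt(2)    # non-zero weights only for 'ethanol' and 'glycerol'; norm(w) = 1 (Fig. 3.5)

t = X @ w                     # the new variable: one weighted value per sample
X_hat = np.outer(t, w)        # going back: t times the weights estimates the original variables
```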
This is a powerful property: it is possible to use weights to condense several variables into one and vice versa. To generalize this, notice that the current concept only works perfectly when the two variables are completely correlated. Think of an average grade in a school system. Many combinations of individual grades can lead to the same average grade, so it is not in general possible to go back and forth. To make an intelligent new variable, it is natural to ask for a new variable that will actually provide a nice model of the data; that is, a new variable which, when multiplied with the weights, will describe as much as possible of the whole matrix (Fig. 3.8). Such a variable will be an optimal representative of the whole data in the sense that no other weighted average simultaneously describes as much of the information in the matrix.
Fig. 3.8 Defining weights (w's) that will give a new variable which leads to a good model of the data
It turns out that PCA provides a solution to this problem. Principal component analysis provides the weights needed to get the new variable that best explains the variation in the whole data set in a certain sense. This new variable, including the defining weights, is called the first principal component.
With this pre-processing of the data, PCA can be performed. The technical details of how to do that will follow, but the first principal component is shown in Fig. 3.9. In the lower plot, the weights are shown. Instead of the quite sparse weights in Fig. 3.4, these weights are non-zero for all variables. This first component does not explain all the variation, but it does explain 25% of what is happening in the data. As there are 14 variables, it would be expected that if every variable showed variation independent of the others, then each original variable would explain 100%/14 ≈ 7% of the variation. Hence, this first component is wrapping up information which can be said to correspond to approximately 3-4 variables.
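A minimal scikit-learn sketch of how such a first component can be computed and inspected (a random stand-in is used for the 44 × 14 wine matrix, so the numbers will not reproduce Fig. 3.9):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(44, 14))   # stand-in for the 44 x 14 wine matrix

Xa = StandardScaler().fit_transform(X)                # autoscale: centre and scale to unit variance
pca = PCA(n_components=1).fit(Xa)

t1 = pca.transform(Xa)[:, 0]                          # scores of the first principal component
w1 = pca.components_[0]                               # the 14 weights defining the component
print(f"explained variation: {100 * pca.explained_variance_ratio_[0]:.1f}%")
```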
3.2 Theory
3.2.1 Taking Linear Combinations
Consider again a new variable t formed as a weighted sum of the J original variables, t = Xw, with the weights w1, w2, …, wJ collected in the vector w. The variation in t can be measured by its variance, var(t), defined in the usual way in statistics. The problem then translates to maximizing this variance by choosing optimal weights w1, w2, …, wJ. There is one caveat, however, since multiplying an optimal w with an arbitrarily large number will make the variance of t also arbitrarily large. Hence, to have a proper problem, the weights have to be normalized. This is done by requiring that their norm, i.e. the sum of squared values, is one (Fig. 3.5). Throughout we will use the symbol ‖·‖² to indicate the squared Frobenius norm (sum of squares). Thus, the formal problem becomes

$$\mathbf{w} = \arg\max_{\|\mathbf{w}\|=1} \operatorname{var}(\mathbf{t}) \quad (3.2.1)$$
which should be read as the problem of finding the w of length one that maximizes the variance of t (note that requiring ‖w‖ = 1 is the same as requiring ‖w‖² = 1). The function arg max is the mathematical notation for returning the argument w of the maximization. This can be made more explicit by using the fact that t = Xw:

$$\mathbf{w} = \arg\max_{\|\mathbf{w}\|=1} \operatorname{var}(\mathbf{X}\mathbf{w}) \quad (3.2.2)$$
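As a quick numerical illustration of equation (3.2.2): for a centred matrix, the weight vector given by the first right singular vector attains a variance that no randomly drawn unit-norm weight vector exceeds. A sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(44, 14))        # random stand-in for a preprocessed data matrix
X = X - X.mean(axis=0)               # centre the columns

w_pca = np.linalg.svd(X, full_matrices=False)[2][0]   # first right singular vector

W = rng.normal(size=(1000, 14))                       # many random weight vectors ...
W /= np.linalg.norm(W, axis=1, keepdims=True)         # ... normalized to unit length

var_pca = np.var(X @ w_pca, ddof=1)
var_rand = np.var(X @ W.T, axis=0, ddof=1)
print(var_pca >= var_rand.max())                      # True: no random unit w does better
```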
3.2.2 Explained Variation
The variance of t can now be calculated, but a more meaningful assessment of the summarizing capability of t is obtained by calculating how representative t is in terms of replacing X. This can be done by projecting the columns of X on t and calculating the residuals of that projection. This is performed by regressing all variables of X on t using the ordinary regression equation

$$\mathbf{X} = \mathbf{t}\mathbf{p}^{T} + \mathbf{E} \quad (3.2.3)$$
where p is the vector of regression coefficients and E is the matrix of residuals. Interestingly, p equals w, and the whole machinery of regression can be used to judge the quality of the summarizer t. Traditionally, this is done by calculating

$$100 \times \left(1 - \frac{\|\mathbf{E}\|^{2}}{\|\mathbf{X}\|^{2}}\right) \quad (3.2.4)$$

which is referred to as the percentage of explained variation of t.
In Fig. 3.10, it is illustrated how the explained variation is calculated, as also explained around equation (3.2.4).
Fig. 3.10 Exemplifying how explained variation is calculated using the data and the residuals
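A small numpy sketch of equations (3.2.3) and (3.2.4): regress the columns of X on a given score vector t and compute the percentage of explained variation from the residuals (X and t are assumed to be available, e.g. from the earlier sketches):

```python
import numpy as np

def explained_variation(X, t):
    """Percentage of variation in X explained by a single score vector t (eq. (3.2.4))."""
    p = X.T @ t / (t @ t)         # regression coefficients of every column of X on t
    E = X - np.outer(t, p)        # residual matrix of eq. (3.2.3)
    return 100 * (1 - np.sum(E**2) / np.sum(X**2))
```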
3.2.3 PCA as a Model
Equation (3.2.3) highlights an important interpretation of PCA: it can be seen as a modelling activity (Fig. 3.11). Rewriting equation (3.2.3) as

$$\hat{\mathbf{X}} = \mathbf{t}\mathbf{p}^{T}$$

shows that the (outer) product tp^T serves as a model of X (indicated with a hat). In this equation, vector t was a fixed regressor and vector p the regression coefficient to be found. It can be shown that actually both t and p can be established from such an equation by solving

$$\min_{\mathbf{t},\,\mathbf{p}} \|\mathbf{X} - \mathbf{t}\mathbf{p}^{T}\|^{2} \quad (3.2.5)$$
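One way to solve equation (3.2.5) for a single component is alternating least squares (essentially the NIPALS iteration used in chemometrics); a minimal sketch, assuming X has already been preprocessed:

```python
import numpy as np

def rank_one_pca(X, n_iter=100):
    """Alternating least squares for min ||X - t p^T||^2 over t and p (eq. (3.2.5))."""
    t = X[:, 0].copy()                    # crude starting guess for the score vector
    for _ in range(n_iter):
        p = X.T @ t / (t @ t)             # fix t, regress the columns of X on t
        p /= np.linalg.norm(p)            # keep the loading vector at unit norm
        t = X @ p                         # fix p, regress the rows of X on p
    return t, p
```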
3.2.4 Taking More Components
If the percentage of explained variation of equation (3.2.4) is too small, then the t,p combination is not a sufficiently good summarizer of the data. Equation (3.2.5) suggests an extension by writing

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E} \quad (3.2.6)$$

where T = [t1, t2, …, tR] (I × R) and P = [p1, p2, …, pR] (J × R) are now matrices containing, respectively, R score vectors and R loading vectors. If R is (much) smaller than J, then T and P still amount to a considerably more parsimonious description of the variation in X. To identify the solution, P can be taken such that P^T P = I and T can be taken such that T^T T is a diagonal matrix. This corresponds to the normalization of the loadings mentioned above. Each loading vector thus has norm one and is orthogonal to the other loading vectors (an orthogonal basis). The constraint on T implies that the score vectors are orthogonal to each other. This is the usual way to perform PCA in chemometrics. Due to the orthogonality in P, the R components have independent contributions to the overall explained variation, and the term "explained variation per component" can be used, similarly as in equation (3.2.4).
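In practice, T and P satisfying exactly these constraints can be obtained from the singular value decomposition; a minimal sketch:

```python
import numpy as np

def pca_model(X, R):
    """Rank-R PCA model X ~ T P^T from the SVD, with P^T P = I and T^T T diagonal."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:R].T                  # J x R loading matrix with orthonormal columns
    T = U[:, :R] * s[:R]          # I x R score matrix with orthogonal columns
    return T, P

# With X centred, np.allclose(P.T @ P, np.eye(R)) holds and T.T @ T equals diag(s[:R]**2).
```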
3.3 History of PCA
PCA has been (re-)invented several times. The earliest presentation was in terms of equation (3.2.6). This interpretation stresses the modelling properties of PCA and is very much rooted in regression thinking: variation explained by the principal components (Pearson's view). Later, in the thirties, the idea of taking linear combinations of variables was introduced and the variation of the principal components was stressed (equation (3.2.1); Hotelling's view).
This is a more multivariate statistical approach. Later, it was realized that the two approaches were very similar. Similar, but not the same. There is a fundamental conceptual difference between the two approaches, which is important to understand. In the Hotelling approach, the principal components are taken seriously in their specific direction. The first component explains the most variation, the second component the second most, etc. This is called the principal axis property: the principal components define new axes which should be taken seriously and have a meaning.
PCA finds these principal axes. In contrast, in the Pearson approach it is the subspace which is important, not the axes as such. The axes merely serve as a basis for this subspace. In the Hotelling approach, rotating the principal components destroys the interpretation of these components, whereas in the Pearson conceptual model rotations merely generate a different basis for the (optimal) subspace.
3.4 Practical Aspects
3.4.1 Preprocessing
Often a PCA performed on the raw data is not very meaningful. In regression analysis, often an intercept or offset is included, since it is the deviation from such an offset which represents the interesting variation. In terms of the prototypical example, the absolute level of the pH is not that interesting, but the variation in pH of the different Cabernets is relevant. For PCA to focus on this type of variation, it is necessary to mean-center the data. This is simply performed by subtracting from every variable in X the corresponding mean level.
Sometimes it is also necessary to think about the scales of the data. In the wine example, there were measurements of concentrations and of pH. These are not on the same scales (not even in the same units), and to make the variables more comparable, the variables are scaled by dividing them by the corresponding standard deviations. The combined process of centering and scaling in this way is often called autoscaling. For a more detailed account of centering and scaling, see the references.
Centering and scaling are the two most common types of preprocessing, and they almost always have to be decided upon. There are many other types of preprocessing methods available, though. The appropriate preprocessing typically depends on the nature of the data investigated.
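A minimal numpy sketch of the two operations; scikit-learn's StandardScaler does essentially the same, except that it divides by the standard deviation computed with ddof = 0:

```python
import numpy as np

def autoscale(X):
    """Centre every column of X and scale it to unit standard deviation (autoscaling)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std

# Mean-centering alone would simply be X - X.mean(axis=0).
```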
3.4.2 Choosing the Number of Components
A basic rationale in PCA is that the informative rank of the data is less than the number of original variables. Hence, it is possible to replace the original J variables with R (R ≪ J) components and gain a number of benefits. The influence of noise is minimized as the original variables are replaced with weighted averages, and interpretation and visualization are greatly aided by having a simpler (fewer variables) view of all the variation. Furthermore, the compression of the variation into fewer components can yield statistical benefits in further modelling with the data. Hence, there are many good reasons to use PCA. In order to use PCA, though, it is necessary to be able to decide on how many components to use. The answer to that problem depends a little bit on the purpose of the analysis, which is why the following three sections will provide different answers to that question.
Eigenvalues and Their Relation to PCA
Before the methods are described, it is necessary to explain the relation between PCA and eigenvalues. An eigenvector of a (square) matrix A is defined as a nonzero vector z with the following property:

$$\mathbf{A}\mathbf{z} = \lambda\mathbf{z}$$

where λ is called the eigenvalue. If matrix A is symmetric (semi-)positive definite, then the full eigenvalue decomposition of A becomes:

$$\mathbf{A} = \mathbf{Z}\boldsymbol{\Lambda}\mathbf{Z}^{T}$$

where the columns of Z hold the eigenvectors and the diagonal matrix Λ holds the eigenvalues. For PCA, the relevant matrix A is the cross-product of the preprocessed data, X^T X: its eigenvectors correspond to the loadings, and each eigenvalue is proportional to the variation explained by the corresponding component.
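This link can be checked numerically: the eigenvalues of the cross-product of the preprocessed data equal the per-component variances reported by a PCA implementation, up to the factor I − 1. A sketch with a random stand-in for the data:

```python
import numpy as np
from sklearn.decomposition import PCA

Xa = np.random.default_rng(0).normal(size=(44, 14))   # random stand-in for the preprocessed data
Xa -= Xa.mean(axis=0)

eig = np.sort(np.linalg.eigvalsh(Xa.T @ Xa))[::-1]    # eigenvalues of the cross-product, descending
pca = PCA().fit(Xa)
print(np.allclose(eig, pca.explained_variance_ * (len(Xa) - 1)))   # True
```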
Scree Test
The scree test was developed by R. B. Cattell in 1966. It is based on the assumption that relevant information is larger than random noise and that the magnitude of the variation of random noise seems to level off quite linearly with the number of components. Traditionally, the eigenvalues of the cross-product of the preprocessed data are plotted as a function of the number of components, and when only noise is modelled, it is assumed that the eigenvalues are small and decline gradually. In practice, it may be difficult to see this in the plot of eigenvalues due to the huge leading eigenvalues, and often the logarithm of the eigenvalues is plotted instead.
Both are shown in Fig. 3.12 for a simulated data set of rank four and with various amounts of noise added. It is seen that the eigenvalues level off after four components, but the details are difficult to see in the raw eigenvalues unless zoomed in. It is also seen that the distinction between 'real' and noise eigenvalues is difficult to discern at high noise levels.
For the wine data, it is not easy to firmly assess the number of components based on the scree test (Fig. 3.13). One may argue that seven or maybe nine components seem feasible, but this would imply incorporating components that explain very little variation. A more obvious choice would probably be to assess three components as suitable based on the scree plot and then be aware that further components may also contain useful information.
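A small helper for producing such a scree plot from a descending vector of eigenvalues (for example those computed in the previous sketch):

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(eigvals):
    """Plot the eigenvalues and their logarithm against component number."""
    comps = np.arange(1, len(eigvals) + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(comps, eigvals, "o-")
    ax1.set(xlabel="component", ylabel="eigenvalue")
    ax2.semilogy(comps, eigvals, "o-")
    ax2.set(xlabel="component", ylabel="eigenvalue (log scale)")
    fig.tight_layout()
    plt.show()
```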
Eigenvalue below One
If the data is autoscaled, each variable has a variance of one. If all variables are orthogonal to each other, then every component in a PCA model would have an eigenvalue of one, since the preprocessed cross-product matrix (the correlation matrix) is the identity. It is then fair to say that if a component has an eigenvalue larger than one, it explains the variation of more than one variable. This has led to the rule of selecting all components with eigenvalues exceeding one (see the full line in Fig. 3.13).
It is sometimes also referred to as Kaiser's rule or the Kaiser-Guttman rule, and many additional arguments have been provided for this method. While it remains a very ad hoc approach, it is nevertheless a useful rule-of-thumb for getting an idea about the complexity of a data set. For the wine data (Fig. 3.13), the rule suggests that around four or five components are reasonable. Note that for very precise data, it is perfectly possible that even components with eigenvalues far below one can be real and significant. Real phenomena can be small in variation, yet accurate.
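A sketch of the rule, computed directly from the correlation matrix of the raw data (the correlation matrix is the cross-product of the autoscaled data divided by I − 1):

```python
import numpy as np

def kaiser_rule(X):
    """Number of components whose correlation-matrix eigenvalue exceeds one (Kaiser-Guttman)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return int(np.sum(eigvals > 1))
```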
Broken Stick
A more realistic cutoff for the eigenvalues is obtained with the so-called broken stick rule. A line is added to the scree plot that shows the eigenvalues that would be expected for random data (the dotted line in Fig. 3.13). This line is calculated assuming that random data will follow a so-called broken stick distribution. The broken stick distribution hypothesizes how random variation will partition and uses the analogy of how the lengths of the pieces of a stick will be distributed when it is broken at random places into J pieces.
It can be shown that for autoscaled data, this theoretical distribution can be calculated as

$$b_{k} = \sum_{j=k}^{J} \frac{1}{j}, \qquad k = 1, \ldots, J$$

where b_k is the expected eigenvalue of the k-th component for random data (the b_k sum to J, the total variance of autoscaled data).
As seen in Fig. 3.13, the broken stick would seem to indicate that three to four components are reasonable.
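A minimal sketch of the broken stick line:

```python
import numpy as np

def broken_stick(J):
    """Expected eigenvalues of autoscaled random data under the broken stick model."""
    return np.array([np.sum(1.0 / np.arange(k, J + 1)) for k in range(1, J + 1)])

# Components whose observed eigenvalue exceeds broken_stick(J) are candidates to retain.
```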
High Fraction of Variation Explained
If the data measured have e.g. one percent noise, it is expected that PCA will describe all the variation down to around one percent. Hence, if a two-component model describes only 50% of the variation and is otherwise sound, it is probable that more components are needed. On the other hand, if the data are very noisy, coming e.g. from process monitoring or consumer preference mapping, and have an expected noise fraction of maybe 40%, then an otherwise sound model fitting 90% of the variation would imply overfitting, and fewer components should be used.
Having knowledge of the quality of the data can help in assessing the number of components. In Fig. 3.14, the variation explained is shown. The plot is equivalent to the eigenvalue plot, except that it is cumulative and on a different scale. For the wine data, the uncertainty is different for each variable and varies from approximately 5% up to 50% relative to the variation in the data. This is quite variable and makes it difficult to estimate how much variation should be explained, but most certainly explaining less than 50% would mean that not all the systematic variation is captured, and explaining more than, say, 90-95% of the variation would be meaningless and just modelling of noise. Therefore, based on variation explained, it is likely that there are more than two but fewer than, say, seven components.
Fig. 3.14 Cumulated percentage variation explained
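Such a cumulative curve is straightforward to compute from a fitted PCA model; a sketch, assuming the data have already been preprocessed:

```python
import numpy as np
from sklearn.decomposition import PCA

def cumulative_explained(X):
    """Cumulative percentage of variation explained per component (X assumed preprocessed)."""
    return 100 * np.cumsum(PCA().fit(X).explained_variance_ratio_)
```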
Valid Interpretation
As indicated by the results, the different rules above seldom agree. This is not as big a problem as it might seem. Quite often, the only thing needed is to know the neighbourhood of how many components are needed. Using the above methods informally and critically will often provide that answer. Furthermore, one of the most important strategies for selecting the number of components is to supplement such methods with interpretations of the model. For the current data, it may be questioned whether e.g. three or four components should be used.
In Fig. 3.15, it is shown that there is distinct structure in the scores of component four. For example, the wines from Argentina all have positive scores. Such a structure or grouping will not happen accidentally unless unfortunate confounding has occurred. Hence, as long as the Argentinian wines were not measured separately on a different system or something similar, the mere fact that component four (either scores or loadings) shows distinct behaviour is an argument in favour of including that component. This holds regardless of what other measures might indicate.
Fig. 3.15 Left: score number four of the wine data; Right: score two versus score four
The loadings may also provide similar validation by highlighting correlations expected from a priori knowledge. In the case of continuous data such as time series or spectral data, it is also instructive to look at the shape of the residuals. An example is provided in Fig. 3.16. A data set consisting of visual and near-infrared spectra of 40 beer samples is shown in grey. After one component, the residuals are still fairly big and quite structured from a spectral point of view. After six components, there is very little information left, indicating that most of the systematic variation has been modelled. Note from the title of the plot that 95% of the variation explained is quite low for this data set, whereas that would be critically high for the wine data as discussed above.
Cross-validation
The idea in cross-validation is to leave out part of the data and then estimate the left-out part. If this is done wisely, the prediction of the left-out part is independent of the actual left-out part. Hence, overfitting leading to too optimistic models is not possible. Conceptually, a single element (typically more than one element) of the data matrix is left out. A PCA model handling missing data can then be fitted to the data set, and based on this PCA model, an estimate of the left-out element can be obtained. Hence, a set of residuals is obtained where there are no problems with overfitting.
Taking the sum of squares of these yields the so-called Predicted REsidual Sum of Squares (PRESS)

$$\mathrm{PRESS}(r) = \sum_{i=1}^{I} \sum_{j=1}^{J} x_{ij}(r)^{2}$$

where x_ij(r) is the residual of sample i and variable j after r components. From the PRESS, the Root Mean Squared Error of Cross-Validation (RMSECV) is obtained as

$$\mathrm{RMSECV}(r) = \sqrt{\frac{\mathrm{PRESS}(r)}{IJ}}$$
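Many cross-validation schemes exist for PCA. The sketch below implements one element-wise variant in which every left-out entry is treated as missing and imputed with an EM-style iteration; it is slow and purely illustrative, and not necessarily the scheme behind Fig. 3.17:

```python
import numpy as np

def rank_r_reconstruct(X, r):
    """Rank-r PCA reconstruction of a preprocessed matrix X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def rmsecv(X, r, n_iter=30):
    """Element-wise cross-validated RMSECV for an r-component PCA model (illustrative only)."""
    I, J = X.shape
    press = 0.0
    for i in range(I):
        for j in range(J):
            Xm = X.copy()
            Xm[i, j] = 0.0                          # initial guess for the left-out element
            for _ in range(n_iter):                 # EM-style imputation of the 'missing' entry
                Xm[i, j] = rank_r_reconstruct(Xm, r)[i, j]
            press += (X[i, j] - Xm[i, j]) ** 2      # cross-validated squared residual
    return np.sqrt(press / (I * J))
```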
In Fig. 3.17, the results of cross-validation are shown. As shown in Fig. 3.15, the fit to the data will trivially improve with the number of components, but the RMSECV gets worse after four components, indicating that no more than four components should be used. In fact, the improvement going from three to four components is so small that three is likely a more feasible choice from that perspective.
Fig. 3.17 A plot of RMSECV for PCA models with different numbers of components
3.4.3 When Using PCA for Other Purposes
It is quite common to use PCA as a preprocessing step in order to get a nicely compact representation of a data set. Instead of the original many (J) variables, the data set can be expressed in terms of the few (R) principal components. These components can then in turn be used for many different purposes (Fig. 3.18).
Fig. 3.18 Using the scores of PCA for further modelling
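A sketch of this pattern with scikit-learn, chaining autoscaling, PCA compression and a subsequent model; clustering is used here only as a placeholder for "further modelling", and a random matrix stands in for the data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(44, 14))    # random stand-in for the raw data

# Autoscale, compress to a few PCA scores, then hand the scores to a further model.
model = make_pipeline(StandardScaler(), PCA(n_components=4), KMeans(n_clusters=3, n_init=10))
labels = model.fit_predict(X)
```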
3.4.4 Detecting Outliers
Outliers are samples that are somehow disturbing or unusual. Often, outliers are downright wrong samples. For example, in determining the height of persons, five samples are obtained ([1.78, 1.92, 1.83, 167, 1.87]). The values are in meters, but accidentally the fourth sample has been measured in centimeters. If the sample is not either corrected or removed, the subsequent analysis is going to be detrimentally disturbed by this outlier. Outlier detection is about identifying and handling such samples. An alternative or supplement to outlier handling is the use of robust methods, which will, however, not be treated in detail here.
This section is mainly going to focus on identifying outliers, but understanding the outliers is really the critical aspect. Often outliers are mistakenly taken to mean wrong samples, and nothing could be more wrong! Outliers can be absolutely right, but e.g. just badly represented. In such a case, the solution is not to remove the outlier but to supplement the data with more of the same type. The bottom line is that it is imperative to understand why a sample is an outlier. This section will give the tools to identify the samples and see in what way they differ. It is then up to the data analyst to decide how the outliers should be handled.
Data Inspection
An often forgotten, but important, first step in data analysis is to inspect the raw data. Depending on the type of data, many kinds of plots can be relevant, as already mentioned. For spectral data, line plots may be nice. For discrete data, histograms, normal probability plots, or scatter plots could be feasible. In short, any kind of visualization that will help elucidate aspects of the data can be useful. Several such plots have already been shown throughout this chapter. It is also important, and frequently forgotten, to look at the preprocessed data. While the raw data are important, they actually never enter the modelling. It is the preprocessed data that will be modelled, and there can be big differences in the interpretations of the raw and the preprocessed data.
Score Plots
While raw and preprocessed data should always be investigated, some types of outliers will be difficult to identify from there. The PCA model itself can provide further information. There are two places where outlying behaviour will show up most evidently: in the scores and in the residuals. It is appropriate to go through all selected scores and look for samples that have strange behaviour. Often, it is only components one and two that are investigated, but it is necessary to look at all the relevant components.
As for the data, it is a good idea to plot the scores in many ways, using different combinations of scatter plots, line plots, histograms, etc. Also, it is often useful to go through the same plot but coloured by all the various types of additional information available. This could be any kind of information such as temperature, storage time of the sample, operator, or any other kind of either qualitative or quantitative information available. For the wine data model, it is seen in Fig. 3.19 that one sample is behaving differently from the others in the score plot of component one versus two (upper left corner).
Fig. 3.19 Score plot of a four-component PCA model of the wine data
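A small helper for such score plots (the optional colour argument is meant to carry the additional information, e.g. region codes):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def score_plot(Xa, a=1, b=2, colour=None):
    """Scatter plot of score a versus score b from a PCA model of the preprocessed data Xa."""
    scores = PCA(n_components=max(a, b)).fit_transform(Xa)
    plt.scatter(scores[:, a - 1], scores[:, b - 1], c=colour)
    plt.xlabel(f"score {a}")
    plt.ylabel(f"score {b}")
    plt.show()
```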
Looking at the loading plot (Fig. 3.20) indicates that the sample must be (relatively) high in volatile and lactic acid and low in malic acid. This should then be verified in the raw data. After removing this sample, the model is rebuilt and re-evaluated. No more extreme samples are observed in the scores.
Fig. 3.20 Scatter plot of loading 1 versus loading 2
Hotelling's T²
Looking at scores is helpful, but it is only possible to look at a few components at a time. If the model has many components, it can be laborious, and the risk of accidentally missing something increases. In addition, in some cases, outlier detection has to be automated in order to function e.g. in an on-line process monitoring system. There are ways to do so, and a common way is to use the so-called Hotelling's T², which was introduced in 1931. This diagnostic can be seen as an extension of the t-test and can also be applied to the scores of a PCA model. It is calculated as

$$T_{i}^{2} = \mathbf{t}_{i}^{T} \left(\frac{\mathbf{T}^{T}\mathbf{T}}{I-1}\right)^{-1} \mathbf{t}_{i}$$
where T is the matrix of scores (I × R) from all the calibration samples and t_i is an R × 1 vector holding the R scores of the ith sample. Assuming that the scores are normally distributed, confidence limits for T² can be computed.
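A minimal sketch of this computation from an I × R matrix of calibration scores (for example the scores of a fitted PCA model):

```python
import numpy as np

def hotellings_t2(scores):
    """Hotelling's T2 for each calibration sample, computed from the I x R score matrix."""
    I = scores.shape[0]
    cov = scores.T @ scores / (I - 1)                       # covariance of the scores
    return np.einsum("ir,rs,is->i", scores, np.linalg.inv(cov), scores)
```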