
Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks

Harichandana Vejendla

(50478049)


Definitions

• Meta-Learning: Meta-learning describes machine learning algorithms that acquire knowledge and understanding from the outcomes of other machine learning algorithms. They learn how to best combine the predictions from other machine learning algorithms.

• Few-shot Learning: Few-shot learning is a machine learning framework that enables a pre-trained model to generalize to new categories of data using only a few labeled samples per class.

• Feature Extraction: Feature extraction is a process of dimensionality reduction that transforms raw data into numerical features that can be processed.

• Feature Clustering: Feature clustering aggregates feature points into groups whose members are similar to each other and dissimilar to members of other groups.

• Feature Representation: Representation learning, or feature learning, is the subdiscipline of machine learning that deals with extracting features from, or understanding the representation of, a dataset.

Introduction

• Transfer Learning: Pre-training a model on large auxiliary datasets and then fine-tuning the resulting model on the target task. This is used for few-shot learning since only a few data samples are available in the target domain.

• Transfer learning from classically trained models yields poor performance for few-shot learning. Recently, few-shot learning has been rapidly improved using meta-learning methods.

• This suggests that the feature representations learned by meta-learning must be fundamentally different from feature representations learned through conventional training.

• This paper studies the differences between features learned by meta-learning and by classical training.

• Based on this understanding, the paper proposes simple regularizers that boost few-shot performance appreciably.

Meta-LearningFramework

• In the context of few-shot learning, the objective of meta-learning algorithms is to produce a network that quickly adapts to new classes using little data.

• Meta-learning algorithms find parameters that can be fine-tuned in a few optimization steps and on a few data points in order to achieve good generalization.

• A task T_i is characterized as n-way, k-shot if the meta-learning algorithm must adapt to classify data from T_i after seeing k examples from each of the n classes in T_i (see the sketch below).
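
For illustration, a minimal sketch of sampling an n-way, k-shot task; the `sample_task` helper and the `images_by_class` structure are assumptions for illustration, not from the paper:

```python
import random

def sample_task(images_by_class, n_way=5, k_shot=1, q_queries=15):
    """Sample an n-way, k-shot task: a support set with k labeled examples
    per class and a query set used to evaluate adaptation."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(images_by_class[cls], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```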

Algorithm

(Slide figure not reproduced in this text version; the algorithm is described below.)

Algorithm Description

• Meta-learning schemes typically rely on bi-level optimization problems with an inner loop and an outer loop.

• An iteration of the outer loop involves first sampling a "task," which comprises two sets of labeled data: the support data, T_i^s, and the query data, T_i^q.

• In the inner loop, the model being trained is fine-tuned using the support data.

• Fine-tuning produces new parameters θ_i that are a function of the original parameters and the support data.

• We evaluate the loss on the query data and compute the gradients with respect to the original parameters θ. This requires unrolling the fine-tuning steps and backpropagating through them.

• Finally, the routine moves back to the outer loop, where the meta-learning algorithm minimizes the loss on the query data with respect to the pre-fine-tuned weights. The base model parameters are updated using these gradients (sketched below).
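
A minimal MAML-style sketch of one such outer-loop iteration on a toy linear model (plain PyTorch; the shapes, learning rates, and random data are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

# Toy model: a single linear layer applied functionally, so the fine-tuned
# ("inner-loop") parameters remain differentiable with respect to θ.
def forward(params, x):
    w, b = params
    return x @ w + b

def inner_loop(params, support_x, support_y, lr_inner=0.1, steps=5):
    """Fine-tune on the support data T_i^s; keep the graph so the outer
    loop can backpropagate through the unrolled update steps."""
    for _ in range(steps):
        loss = F.cross_entropy(forward(params, support_x), support_y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr_inner * g for p, g in zip(params, grads)]
    return params

# One outer-loop iteration on a single sampled task.
theta = [torch.randn(64, 5, requires_grad=True), torch.zeros(5, requires_grad=True)]
outer_opt = torch.optim.SGD(theta, lr=0.01)

support_x, support_y = torch.randn(25, 64), torch.randint(0, 5, (25,))   # T_i^s
query_x, query_y = torch.randn(75, 64), torch.randint(0, 5, (75,))       # T_i^q

adapted = inner_loop(theta, support_x, support_y)             # θ_i from fine-tuning
query_loss = F.cross_entropy(forward(adapted, query_x), query_y)
outer_opt.zero_grad()
query_loss.backward()   # gradients w.r.t. the pre-fine-tuned weights θ
outer_opt.step()
```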

Meta-Learning Algorithms

A variety of meta-learning algorithms exist, differing mostly in how the model is fine-tuned on the support data during the inner loop:

• MAML: Updates all network parameters using gradient descent during fine-tuning.

• R2-D2 and MetaOptNet: Last-layer meta-learning methods. The feature extractor's parameters are frozen during the inner loop, and only the linear classifier layer is trained during fine-tuning.

• ProtoNet: A last-layer meta-learning method that classifies examples by the proximity of their features to class centroids. The extracted features are used to create class centroids, which then determine the network's class boundaries (see the sketch below).
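
For instance, a minimal sketch of ProtoNet-style classification from support features (plain PyTorch; the tensor names and shapes are illustrative assumptions):

```python
import torch

def protonet_logits(support_feats, support_labels, query_feats, n_way):
    """Classify query features by (negative squared) distance to class centroids
    computed from the support features."""
    centroids = torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                                  # (n_way, d)
    dists = torch.cdist(query_feats, centroids) ** 2    # (n_query, n_way)
    return -dists  # higher logit = closer to that class centroid
```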

Few-Shot Datasets

• Mini-ImageNet: A pruned and downsized version of the ImageNet classification dataset, consisting of 60,000 84×84 RGB color images from 100 classes. These 100 classes are split into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.

• CIFAR-FS: Samples images from CIFAR-100. CIFAR-FS is split in the same way as mini-ImageNet, with 60,000 32×32 RGB color images from 100 classes divided into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.

Comparison between Meta-Learning and Classical Training Models

• Dataset used: 1-shot mini-ImageNet.

• Classically trained models are trained using cross-entropy loss and SGD.

• Common fine-tuning procedures are used for both meta-learned and classically trained models for a fair comparison.

• Results show that meta-learned models perform better than classically trained models on few-shot classification.

• This performance advantage across the board suggests that meta-learned features are qualitatively different from conventional features and fundamentally superior for few-shot learning.


Class Clustering in Feature Space

Measuring Clustering in Feature Space:

To measure feature clustering (FC), we consider the intra-class to inter-class variance ratio, where:

• φ_{i,j} – feature vector corresponding to data point j in class i of the training data

• μ_i – mean of the feature vectors in class i

• μ – mean across all feature vectors

• C – number of classes

• N – number of data points per class

• f_θ – feature extractor, with f_θ(x_{i,j}) = φ_{i,j}

• x_{i,j} – training data point j in class i
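
Written out from these definitions (the normalization constants are an assumption on my part), the ratio is:

\[
R_{FC} \;=\; \frac{\frac{1}{CN}\sum_{i=1}^{C}\sum_{j=1}^{N}\left\lVert \phi_{i,j}-\mu_i \right\rVert_2^2}{\frac{1}{C}\sum_{i=1}^{C}\left\lVert \mu_i-\mu \right\rVert_2^2}
\]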

Low values of this fraction correspond to collections of features such that classes are well separated, and a hyperplane formed by choosing a point from each of two classes does not vary dramatically with the choice of samples.

Why Clustering is Important?

• As features in a class become spread out and the classes are brought closer together, the classification boundaries formed by sampling one-shot data often misclassify large regions.

• As features in a class are compacted and classes move farther apart from each other, the intra-class to inter-class variance ratio drops, and the dependence of the class boundary on the choice of one-shot samples becomes weaker.

Comparing Feature Representations of Meta-Learning and Classically Trained Models

• Three classes are randomly chosen from the test set, and 100 samples are taken from each class. The samples are then passed through the feature extractor, and the resulting vectors are plotted.

• Because the feature space is high-dimensional, we perform a linear projection onto the first two component vectors determined by LDA.

• Linear discriminant analysis (LDA) projects data onto directions that minimize the intra-class to inter-class variance ratio.

• The classically trained model mashes features together, while the meta-learned models draw the classes farther apart.
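
A sketch of this kind of visualization with scikit-learn (the random data stands in for real extracted features; this is an illustration, not the authors' plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# In practice, `features` are the extractor outputs for 3 test classes x 100 samples;
# random data is used here only so the snippet runs stand-alone.
features = np.random.randn(300, 64)
labels = np.repeat(np.arange(3), 100)

lda = LinearDiscriminantAnalysis(n_components=2)
projected = lda.fit_transform(features, labels)   # first two LDA component directions

for c in range(3):
    pts = projected[labels == c]
    plt.scatter(pts[:, 0], pts[:, 1], label=f"class {c}", s=10)
plt.legend()
plt.title("Features projected onto the top-2 LDA directions")
plt.show()
```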


Hyperplane Invariance

We replace the feature clustering regularizer with one that penalizes variations in the maximum-margin hyperplane separating feature vectors from opposite classes.

Hyperplane Variation Regularizer:

• Data points in class A: x_1, x_2

• Data points in class B: y_1, y_2

• f_θ – feature extractor

• f_θ(x_1) − f_θ(y_1) determines the direction of the maximum-margin hyperplane separating the two points in feature space
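
Written out from these definitions (the exact normalization is an assumption), one consistent form of the regularizer is:

\[
R_{HV} \;=\; \frac{\left\lVert \left(f_\theta(x_1)-f_\theta(y_1)\right)-\left(f_\theta(x_2)-f_\theta(y_2)\right) \right\rVert_2}{\left\lVert f_\theta(x_1)-f_\theta(y_1) \right\rVert_2+\left\lVert f_\theta(x_2)-f_\theta(y_2) \right\rVert_2}
\]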

• This function measures the distance between the difference vectors f_θ(x_1) − f_θ(y_1) and f_θ(x_2) − f_θ(y_2), relative to their size.

• In practice, during each training batch we sample many pairs of classes and two samples from each class. We then compute R_HV on all class pairs and add these terms to the cross-entropy loss, as sketched below.

• We find that this regularizer performs almost as well as the feature clustering regularizer and conclusively outperforms non-regularized classical training.
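
A minimal sketch, in PyTorch, of adding such a penalty to the cross-entropy loss for one training batch, given the batch's extracted features and logits (λ, the number of sampled pairs, and the normalization follow the reconstruction above and are illustrative choices):

```python
import torch
import torch.nn.functional as F

def hyperplane_variation(fa1, fa2, fb1, fb2, eps=1e-8):
    """Distance between the two class-difference vectors, relative to their size."""
    v1, v2 = fa1 - fb1, fa2 - fb2
    return (v1 - v2).norm() / (v1.norm() + v2.norm() + eps)

def regularized_loss(feats, logits, labels, lam=1.0, n_pairs=10):
    """Cross-entropy plus hyperplane-variation penalties on random class pairs."""
    loss = F.cross_entropy(logits, labels)
    classes = labels.unique()
    for _ in range(n_pairs):
        a, b = classes[torch.randperm(len(classes))[:2]]
        idx_a = torch.nonzero(labels == a).flatten()
        idx_b = torch.nonzero(labels == b).flatten()
        if len(idx_a) < 2 or len(idx_b) < 2:
            continue  # need two samples from each class to compare hyperplanes
        a1, a2 = idx_a[torch.randperm(len(idx_a))[:2]]
        b1, b2 = idx_b[torch.randperm(len(idx_b))[:2]]
        loss = loss + lam * hyperplane_variation(feats[a1], feats[a2],
                                                 feats[b1], feats[b2])
    return loss
```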

Experiments

• Feature clustering and hyperplane variation values are computed.

• These two quantities measure the intra-class to inter-class variance ratio and the invariance of separating hyperplanes.

• Lower values of each measurement correspond to better class separation.

• On both CIFAR-FS and mini-ImageNet, the meta-learned models attain lower values, indicating that feature-space clustering plays a role in the effectiveness of meta-learning.
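
For concreteness, a sketch of how the feature clustering measure could be computed from extracted features (PyTorch; it follows the reconstructed ratio above and is not the authors' code):

```python
import torch

def feature_clustering(features, labels):
    """Intra-class to inter-class variance ratio of a set of feature vectors.
    features: (num_samples, d) tensor; labels: (num_samples,) class ids."""
    classes = labels.unique()
    mu = features.mean(dim=0)
    intra, inter = 0.0, 0.0
    for c in classes:
        class_feats = features[labels == c]
        mu_c = class_feats.mean(dim=0)
        intra += ((class_feats - mu_c) ** 2).sum(dim=1).mean()  # avg over points
        inter += ((mu_c - mu) ** 2).sum()
    return (intra / len(classes)) / (inter / len(classes))
```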

Experiments

• We incorporate these regularizers into the standard training routine of a classically trained model.

• In all experiments, feature clustering improves the performance of transfer learning and sometimes even achieves higher performance than meta-learning.

Weight Clustering: Finding Clusters of Local Minima for Task Losses in Parameter Space

• Since Reptile does not fix the feature extractor during fine-tuning, it must find parameters that adapt easily to new tasks.

• We hypothesize that Reptile finds parameters that lie very close to good minima for many tasks and is therefore able to perform well on these tasks after very little fine-tuning.

• This hypothesis is further motivated by the close relationship between Reptile and consensus optimization.

• In a consensus method, a number of models are independently optimized with their own task-specific parameters, and the tasks communicate via a penalty that encourages all the individual solutions to converge around a common value.

Consensus Formulation:

• Reptile can be interpreted as approximately minimizing the consensus formulation.

• Reptile diverges from a traditional consensus optimizer only in that it does not explicitly consider the quadratic penalty term when minimizing for the task-specific parameters θ_p.
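
A standard consensus objective of this form (the symbols and weighting are a reconstruction, not copied from the slide) is:

\[
\min_{\theta,\,\{\theta_p\}} \;\sum_{p} \left( L_p(\theta_p) + \frac{\alpha}{2}\left\lVert \theta_p - \theta \right\rVert_2^2 \right)
\]

where L_p is the loss of task p, θ_p are its task-specific parameters, and θ is the shared consensus value.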

Consensus Optimization Improves Reptile

• We modify Reptile to explicitly enforce parameter clustering around a consensus value.

• We find that directly optimizing the consensus formulation leads to improved performance.

• During each inner-loop update step in Reptile, we penalize the squared distance from the current task's parameters to the average of the parameters across all tasks in the current batch.

• This is equivalent to the original Reptile when α = 0. We call this method "Weight-Clustering" (sketched below).

Reptile with Weight Clustering Regularizer

• n – number of meta-training steps

• k – number of iterations or steps to perform within each meta-training step
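
A minimal PyTorch sketch of such a meta-training step, with the penalty applied at every inner-loop update (the structure, helper names, and hyperparameters are my illustration of the description above, not the authors' exact algorithm):

```python
import torch
import torch.nn.functional as F

def weight_clustering_meta_step(model, tasks, inner_lr=0.01, outer_lr=0.1, alpha=0.1, k=5):
    """One Reptile-style meta-training step with a weight-clustering penalty.
    `tasks` is a list of (x, y) batches, one per task in the current meta-batch."""
    theta = [p.detach().clone() for p in model.parameters()]
    # One copy of the parameters per task in the batch.
    task_params = [[p.clone() for p in theta] for _ in tasks]

    for _ in range(k):  # k inner-loop steps, taken in lockstep across tasks
        # Average of the parameters across all tasks in the current batch.
        mean_params = [torch.stack(ps).mean(dim=0) for ps in zip(*task_params)]
        for params, (x, y) in zip(task_params, tasks):
            for p, src in zip(model.parameters(), params):
                p.data.copy_(src)                      # load this task's weights
            loss = F.cross_entropy(model(x), y)
            grads = torch.autograd.grad(loss, list(model.parameters()))
            for j, (p, g, m) in enumerate(zip(params, grads, mean_params)):
                # Gradient step on the task loss plus the squared-distance penalty;
                # alpha = 0 recovers plain Reptile.
                params[j] = p - inner_lr * (g + alpha * (p - m))

    # Reptile outer update: move theta toward the average of the adapted parameters.
    mean_adapted = [torch.stack(ps).mean(dim=0) for ps in zip(*task_params)]
    for p, t, m in zip(model.parameters(), theta, mean_adapted):
        p.data.copy_(t + outer_lr * (m - t))
```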


Results of Weight Clustering

• We compare the performance of our regularized Reptile algorithm to that of the original Reptile method, as well as first-order MAML (FOMAML) and a classically trained model of the same architecture. We test these methods on a sample of 100,000 5-way 1-shot and 5-shot mini-ImageNet tasks.

• Reptile with Weight-Clustering achieves higher performance.

Results of Weight Clustering

• Parameters of networks trained using our regularized version of Reptile do not travel as far during fine-tuning at inference as those trained using vanilla Reptile.

• From these results, we conclude that our regularizer does indeed move model parameters closer to good minima for many tasks, so that less fine-tuning is needed at inference.
