
Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks

Harichandana Vejendla

(50478049)


Definitions

• Meta-Learning: Meta-learning describes machine learning algorithms that acquire knowledge and understanding from the outcomes of other machine learning algorithms. They learn how to best combine the predictions from other machine learning algorithms.

• Few-shot Learning: Few-shot learning is a machine learning framework that enables a pre-trained model to generalize to new categories of data using only a few labeled samples per class.

• Feature Extraction: Feature extraction is a process of dimensionality reduction that transforms raw data into numerical features that can be processed.

• Feature Clustering: Feature clustering aggregates feature points into groups whose members are similar to each other and dissimilar to members of other groups.

• Feature Representation: Representation learning, or feature learning, is the subdiscipline of machine learning that deals with extracting features from, or understanding the representation of, a dataset.

Introduction

• Transfer Learning: Pre-training a model on large auxiliary datasets and then fine-tuning the resulting model on the target task. This is used for few-shot learning since only a few data samples are available in the target domain.

• Transfer learning from classically trained models yields poor performance for few-shot learning. Recently, few-shot learning has been rapidly improved using meta-learning methods.

• This suggests that the feature representations learned by meta-learning must be fundamentally different from feature representations learned through conventional training.

• This paper studies the differences between features learned by meta-learning and by classical training.

• Based on this understanding, the paper proposes simple regularizers that boost few-shot performance appreciably.

Meta-LearningFramework

• In the context of few-shot learning, the objective of meta-learning algorithms is to produce a network that quickly adapts to new classes using little data.

• Meta-learning algorithms find parameters that can be fine-tuned in a few optimization steps and on a few data points in order to achieve good generalization.

• A task T_i is characterized as n-way, k-shot if the meta-learning algorithm must adapt to classify data from T_i after seeing k examples from each of the n classes in T_i (see the sketch below).
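
For illustration, a minimal sketch of sampling an n-way, k-shot task; the `sample_task` helper and the `images_by_class` structure are assumptions for illustration, not from the paper:

```python
import random

def sample_task(images_by_class, n_way=5, k_shot=1, q_queries=15):
    """Sample an n-way, k-shot task: a support set with k labeled examples
    per class and a query set used to evaluate adaptation."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(images_by_class[cls], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```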

Algorithm

(Slide figure not reproduced in this text version; the algorithm is described below.)

Algorithm Description

• Meta-learning schemes typically rely on bi-level optimization problems with an inner loop and an outer loop.

• An iteration of the outer loop involves first sampling a "task," which comprises two sets of labeled data: the support data, T_i^s, and the query data, T_i^q.

• In the inner loop, the model being trained is fine-tuned using the support data.

• Fine-tuning produces new parameters θ_i that are a function of the original parameters and the support data.

• We evaluate the loss on the query data and compute the gradients with respect to the original parameters θ. This requires unrolling the fine-tuning steps and backpropagating through them.

• Finally, the routine moves back to the outer loop, where the meta-learning algorithm minimizes the loss on the query data with respect to the pre-fine-tuned weights. The base model parameters are updated using these gradients (sketched below).
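
A minimal MAML-style sketch of one such outer-loop iteration on a toy linear model (plain PyTorch; the shapes, learning rates, and random data are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

# Toy model: a single linear layer applied functionally, so the fine-tuned
# ("inner-loop") parameters remain differentiable with respect to θ.
def forward(params, x):
    w, b = params
    return x @ w + b

def inner_loop(params, support_x, support_y, lr_inner=0.1, steps=5):
    """Fine-tune on the support data T_i^s; keep the graph so the outer
    loop can backpropagate through the unrolled update steps."""
    for _ in range(steps):
        loss = F.cross_entropy(forward(params, support_x), support_y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr_inner * g for p, g in zip(params, grads)]
    return params

# One outer-loop iteration on a single sampled task.
theta = [torch.randn(64, 5, requires_grad=True), torch.zeros(5, requires_grad=True)]
outer_opt = torch.optim.SGD(theta, lr=0.01)

support_x, support_y = torch.randn(25, 64), torch.randint(0, 5, (25,))   # T_i^s
query_x, query_y = torch.randn(75, 64), torch.randint(0, 5, (75,))       # T_i^q

adapted = inner_loop(theta, support_x, support_y)             # θ_i from fine-tuning
query_loss = F.cross_entropy(forward(adapted, query_x), query_y)
outer_opt.zero_grad()
query_loss.backward()   # gradients w.r.t. the pre-fine-tuned weights θ
outer_opt.step()
```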

Meta-Learning Algorithms

A variety of meta-learning algorithms exist, differing mostly in how the model is fine-tuned on the support data during the inner loop:

• MAML: Updates all network parameters using gradient descent during fine-tuning.

• R2-D2 and MetaOptNet: Last-layer meta-learning methods. The feature extractor's parameters are frozen during the inner loop, and only the linear classifier layer is trained during fine-tuning.

• ProtoNet: A last-layer meta-learning method that classifies examples by the proximity of their features to class centroids. The extracted features are used to create class centroids, which then determine the network's class boundaries (see the sketch below).
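
For instance, a minimal sketch of ProtoNet-style classification from support features (plain PyTorch; the tensor names and shapes are illustrative assumptions):

```python
import torch

def protonet_logits(support_feats, support_labels, query_feats, n_way):
    """Classify query features by (negative squared) distance to class centroids
    computed from the support features."""
    centroids = torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                                  # (n_way, d)
    dists = torch.cdist(query_feats, centroids) ** 2    # (n_query, n_way)
    return -dists  # higher logit = closer to that class centroid
```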

Few-Shot Datasets

• Mini-ImageNet: A pruned and downsized version of the ImageNet classification dataset, consisting of 60,000 84×84 RGB color images from 100 classes. These 100 classes are split into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.

• CIFAR-FS: Samples images from CIFAR-100. CIFAR-FS is split in the same way as mini-ImageNet, with 60,000 32×32 RGB color images from 100 classes divided into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.

Comparison between Meta-Learning and Classical Training Models

• Dataset used: 1-shot mini-ImageNet.

• Classically trained models are trained using cross-entropy loss and SGD.

• Common fine-tuning procedures are used for both meta-learned and classically trained models for a fair comparison.

• Results show that meta-learned models perform better than classically trained models on few-shot classification.

• This performance advantage across the board suggests that meta-learned features are qualitatively different from conventional features and fundamentally superior for few-shot learning.


Class Clustering in Feature Space

Measuring Clustering in Feature Space:

To measure feature clustering (FC), we consider the intra-class to inter-class variance ratio, where:

• φ_{i,j} – feature vector corresponding to data point j in class i of the training data

• μ_i – mean of the feature vectors in class i

• μ – mean across all feature vectors

• C – number of classes

• N – number of data points per class

• f_θ – feature extractor, with f_θ(x_{i,j}) = φ_{i,j}

• x_{i,j} – training data point j in class i
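
Written out from these definitions (the normalization constants are an assumption on my part), the ratio is:

\[
R_{FC} \;=\; \frac{\frac{1}{CN}\sum_{i=1}^{C}\sum_{j=1}^{N}\left\lVert \phi_{i,j}-\mu_i \right\rVert_2^2}{\frac{1}{C}\sum_{i=1}^{C}\left\lVert \mu_i-\mu \right\rVert_2^2}
\]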

Low values of this fraction correspond to collections of features such that classes are well separated, and a hyperplane formed by choosing a point from each of two classes does not vary dramatically with the choice of samples.

Why Clustering is Important?

• As features in a class become spread out and the classes are brought closer together, the classification boundaries formed by sampling one-shot data often misclassify large regions.

• As features in a class are compacted and classes move farther apart from each other, the intra-class to inter-class variance ratio drops, and the dependence of the class boundary on the choice of one-shot samples becomes weaker.

Comparing Feature Representations of Meta-Learning and Classically Trained Models

• Three classes are randomly chosen from the test set, and 100 samples are taken from each class. The samples are then passed through the feature extractor, and the resulting vectors are plotted.

• Because the feature space is high-dimensional, we perform a linear projection onto the first two component vectors determined by LDA.

• Linear discriminant analysis (LDA) projects data onto directions that minimize the intra-class to inter-class variance ratio.

• The classically trained model mashes features together, while the meta-learned models draw the classes farther apart.
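
A sketch of this kind of visualization with scikit-learn (the random data stands in for real extracted features; this is an illustration, not the authors' plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# In practice, `features` are the extractor outputs for 3 test classes x 100 samples;
# random data is used here only so the snippet runs stand-alone.
features = np.random.randn(300, 64)
labels = np.repeat(np.arange(3), 100)

lda = LinearDiscriminantAnalysis(n_components=2)
projected = lda.fit_transform(features, labels)   # first two LDA component directions

for c in range(3):
    pts = projected[labels == c]
    plt.scatter(pts[:, 0], pts[:, 1], label=f"class {c}", s=10)
plt.legend()
plt.title("Features projected onto the top-2 LDA directions")
plt.show()
```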


Hyperplane Invariance

We replace the feature clustering regularizer with one that penalizes variations in the maximum-margin hyperplane separating feature vectors from opposite classes.

Hyperplane Variation Regularizer:

• Data points in class A: x_1, x_2

• Data points in class B: y_1, y_2

• f_θ – feature extractor

• f_θ(x_1) − f_θ(y_1) determines the direction of the maximum-margin hyperplane separating the two points in feature space
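
Written out from these definitions (the exact normalization is an assumption), one consistent form of the regularizer is:

\[
R_{HV} \;=\; \frac{\left\lVert \left(f_\theta(x_1)-f_\theta(y_1)\right)-\left(f_\theta(x_2)-f_\theta(y_2)\right) \right\rVert_2}{\left\lVert f_\theta(x_1)-f_\theta(y_1) \right\rVert_2+\left\lVert f_\theta(x_2)-f_\theta(y_2) \right\rVert_2}
\]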

• This function measures the distance between the difference vectors f_θ(x_1) − f_θ(y_1) and f_θ(x_2) − f_θ(y_2), relative to their size.

• In practice, during each training batch we sample many pairs of classes and two samples from each class. We then compute R_HV on all class pairs and add these terms to the cross-entropy loss, as sketched below.

• We find that this regularizer performs almost as well as the feature clustering regularizer and conclusively outperforms non-regularized classical training.
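
A minimal sketch, in PyTorch, of adding such a penalty to the cross-entropy loss for one training batch, given the batch's extracted features and logits (λ, the number of sampled pairs, and the normalization follow the reconstruction above and are illustrative choices):

```python
import torch
import torch.nn.functional as F

def hyperplane_variation(fa1, fa2, fb1, fb2, eps=1e-8):
    """Distance between the two class-difference vectors, relative to their size."""
    v1, v2 = fa1 - fb1, fa2 - fb2
    return (v1 - v2).norm() / (v1.norm() + v2.norm() + eps)

def regularized_loss(feats, logits, labels, lam=1.0, n_pairs=10):
    """Cross-entropy plus hyperplane-variation penalties on random class pairs."""
    loss = F.cross_entropy(logits, labels)
    classes = labels.unique()
    for _ in range(n_pairs):
        a, b = classes[torch.randperm(len(classes))[:2]]
        idx_a = torch.nonzero(labels == a).flatten()
        idx_b = torch.nonzero(labels == b).flatten()
        if len(idx_a) < 2 or len(idx_b) < 2:
            continue  # need two samples from each class to compare hyperplanes
        a1, a2 = idx_a[torch.randperm(len(idx_a))[:2]]
        b1, b2 = idx_b[torch.randperm(len(idx_b))[:2]]
        loss = loss + lam * hyperplane_variation(feats[a1], feats[a2],
                                                 feats[b1], feats[b2])
    return loss
```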

Experiments

• Feature clustering and hyperplane variation values are computed.

• These two quantities measure the intra-class to inter-class variance ratio and the invariance of separating hyperplanes.

• Lower values of each measurement correspond to better class separation.

• On both CIFAR-FS and mini-ImageNet, the meta-learned models attain lower values, indicating that feature-space clustering plays a role in the effectiveness of meta-learning.
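
For concreteness, a sketch of how the feature clustering measure could be computed from extracted features (PyTorch; it follows the reconstructed ratio above and is not the authors' code):

```python
import torch

def feature_clustering(features, labels):
    """Intra-class to inter-class variance ratio of a set of feature vectors.
    features: (num_samples, d) tensor; labels: (num_samples,) class ids."""
    classes = labels.unique()
    mu = features.mean(dim=0)
    intra, inter = 0.0, 0.0
    for c in classes:
        class_feats = features[labels == c]
        mu_c = class_feats.mean(dim=0)
        intra += ((class_feats - mu_c) ** 2).sum(dim=1).mean()  # avg over points
        inter += ((mu_c - mu) ** 2).sum()
    return (intra / len(classes)) / (inter / len(classes))
```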

Experiments

• We incorporate these regularizers into the standard training routine of a classically trained model.

• In all experiments, feature clustering improves the performance of transfer learning and sometimes even achieves higher performance than meta-learning.

Weight Clustering: Finding Clusters of Local Minima for Task Losses in Parameter Space

• Since Reptile does not fix the feature extractor during fine-tuning, it must find parameters that adapt easily to new tasks.

• We hypothesize that Reptile finds parameters that lie very close to good minima for many tasks and is therefore able to perform well on these tasks after very little fine-tuning.

• This hypothesis is further motivated by the close relationship between Reptile and consensus optimization.

• In a consensus method, a number of models are independently optimized with their own task-specific parameters, and the tasks communicate via a penalty that encourages all the individual solutions to converge around a common value.

Consensus Formulation:

• Reptile can be interpreted as approximately minimizing the consensus formulation.

• Reptile diverges from a traditional consensus optimizer only in that it does not explicitly consider the quadratic penalty term when minimizing for the task-specific parameters θ_p.
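
A standard consensus objective of this form (the symbols and weighting are a reconstruction, not copied from the slide) is:

\[
\min_{\theta,\,\{\theta_p\}} \;\sum_{p} \left( L_p(\theta_p) + \frac{\alpha}{2}\left\lVert \theta_p - \theta \right\rVert_2^2 \right)
\]

where L_p is the loss of task p, θ_p are its task-specific parameters, and θ is the shared consensus value.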

Consensus Optimization Improves Reptile

• We modify Reptile to explicitly enforce parameter clustering around a consensus value.

• We find that directly optimizing the consensus formulation leads to improved performance.

• During each inner-loop update step in Reptile, we penalize the squared distance from the current task's parameters to the average of the parameters across all tasks in the current batch.

• This is equivalent to the original Reptile when α = 0. We call this method "Weight-Clustering" (sketched below).

Reptile with Weight Clustering Regularizer

• n – number of meta-training steps

• k – number of iterations or steps to perform within each meta-training step
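
A minimal PyTorch sketch of such a meta-training step, with the penalty applied at every inner-loop update (the structure, helper names, and hyperparameters are my illustration of the description above, not the authors' exact algorithm):

```python
import torch
import torch.nn.functional as F

def weight_clustering_meta_step(model, tasks, inner_lr=0.01, outer_lr=0.1, alpha=0.1, k=5):
    """One Reptile-style meta-training step with a weight-clustering penalty.
    `tasks` is a list of (x, y) batches, one per task in the current meta-batch."""
    theta = [p.detach().clone() for p in model.parameters()]
    # One copy of the parameters per task in the batch.
    task_params = [[p.clone() for p in theta] for _ in tasks]

    for _ in range(k):  # k inner-loop steps, taken in lockstep across tasks
        # Average of the parameters across all tasks in the current batch.
        mean_params = [torch.stack(ps).mean(dim=0) for ps in zip(*task_params)]
        for params, (x, y) in zip(task_params, tasks):
            for p, src in zip(model.parameters(), params):
                p.data.copy_(src)                      # load this task's weights
            loss = F.cross_entropy(model(x), y)
            grads = torch.autograd.grad(loss, list(model.parameters()))
            for j, (p, g, m) in enumerate(zip(params, grads, mean_params)):
                # Gradient step on the task loss plus the squared-distance penalty;
                # alpha = 0 recovers plain Reptile.
                params[j] = p - inner_lr * (g + alpha * (p - m))

    # Reptile outer update: move theta toward the average of the adapted parameters.
    mean_adapted = [torch.stack(ps).mean(dim=0) for ps in zip(*task_params)]
    for p, t, m in zip(model.parameters(), theta, mean_adapted):
        p.data.copy_(t + outer_lr * (m - t))
```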


Results of Weight Clustering

• We compare the performance of our regularized Reptile algorithm to that of the original Reptile method, as well as first-order MAML (FOMAML) and a classically trained model of the same architecture. We test these methods on a sample of 100,000 5-way 1-shot and 5-shot mini-ImageNet tasks.

• Reptile with Weight-Clustering achieves higher performance.

Results of Weight Clustering

• Parameters of networks trained using our regularized version of Reptile do not travel as far during fine-tuning at inference as those trained using vanilla Reptile.

• From these results, we conclude that our regularizer does indeed move model parameters closer to good minima for many tasks, so that less fine-tuning is needed at inference.
