Ads Recommendation in a Collapsed and Entangled World
Junwei Pan, Tencent Inc., jonaspan@
Wei Xue, Tencent Inc., weixue@
Ximei Wang, Tencent Inc., messixmwang@
Haibin Yu, Tencent Inc., nathanhbyu@
Xun Liu, Tencent Inc., reubenliu@
Shijie Quan, Tencent Inc., justinquan@
Xueming Qiu, Tencent Inc., xuemingqiu@
Dapeng Liu, Tencent Inc., rocliu@
Lei Xiao, Tencent Inc., shawnxiao@
Jie Jiang, Tencent Inc., zeus@

arXiv:2403.00793v2 [cs.IR] 5 Jul 2024
ABSTRACT

We present Tencent's ads recommendation system and examine the challenges and practices of learning appropriate recommendation representations. Our study begins by showcasing our approaches to preserving prior knowledge when encoding features of diverse types into embedding representations. We specifically address sequence features, numeric features, and pre-trained embedding features. Subsequently, we delve into two crucial challenges related to feature representation: the dimensional collapse of embeddings and the interest entanglement across different tasks or scenarios. We propose several practical approaches to address these challenges that result in robust and disentangled recommendation representations. We then explore several training techniques to facilitate model optimization, reduce bias, and enhance exploration. Additionally, we introduce three analysis tools that enable us to study feature correlation, dimensional collapse, and interest entanglement. This work builds upon the continuous efforts of Tencent's ads recommendation team over the past decade. It summarizes general design principles and presents a series of readily applicable solutions and analysis tools. The reported performance is based on our online advertising platform, which handles hundreds of billions of requests daily and serves millions of ads to billions of users.
CCS CONCEPTS

• Information systems → Display advertising; • Computing methodologies → Neural networks; Factorization methods.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@.

KDD '24, August 25–29, 2024, Barcelona, Spain

© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 979-8-4007-0490-1/24/08...$15.00
https://doi.org/10.1145/3637528.3671607
KEYWORDS

Recommendation Systems, Representation Learning, Dimensional Collapse, Disentangled Learning, User Interest Modeling

ACM Reference Format:
Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, and Jie Jiang. 2024. Ads Recommendation in a Collapsed and Entangled World. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25–29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3637528.3671607
1 INTRODUCTION

The online advertising industry, valued at billions of dollars, is a remarkable example of the successful application of machine learning. Various advertising formats, including sponsored search advertising, contextual advertising, display advertising, and micro-video advertising, heavily rely on the accurate, efficient, and reliable prediction of ads click-through or conversion rates using learned models.
Over the past decade, deep learning has achieved remarkable success in diverse domains, including computer vision (CV) [23, 34], natural language processing (NLP) [1, 16, 62], and recommendation systems [41, 77]. The effectiveness of deep learning critically depends on the selection of appropriate data representations [3, 67, 68]. Researchers have extensively explored various aspects of representation learning in CV and NLP. These investigations have focused on topics such as priors [62], smoothness and the curse of dimensionality [5], depth and abstraction [4], disentangling factors of variations [68], and the uniformity of representations [28, 30].
In the field of recommendation systems, numerous works have focused on representation learning techniques to handle various types of features [9, 19, 32, 79, 81], capture feature correlations through explicit or implicit feature interactions [12, 20, 37, 45, 51, 60, 66], address the entangled interest within users' complex behaviors [69], particularly in multi-task [43, 59] or multi-scenario [7, 54, 83] settings, and enhance data representation through self-supervised learning [63, 84]. Despite the progress made in these representation-oriented works, several fundamental questions regarding representation learning in large-scale real-world ads recommenders remain unanswered.

Figure 1: Architecture of our Heterogeneous Mixture-of-Experts with Multi-Embedding for single-task learning, which consists of four key modules: feature encoding, multi-embedding lookup, experts (feature interactions and MLPs), and classification towers.
• Priors for Representation: Real-world systems encompass various types of features from diverse sources, including sequence features (e.g., user click/conversion history), numeric features (e.g., semantic-preserving ad IDs), and embedding features from pre-trained external models (e.g., GNN or LLM). Preserving the inherent priors of these features when encoding them in recommendation systems is crucial.
• Dimensional Collapse: The encoding process maps all features into embeddings, typically represented as K-dimensional vectors, which are learned during model training. However, we observe that the embeddings of many fields tend to occupy a lower-dimensional subspace instead of fully utilizing the available K-dimensional space. Such dimensional collapse not only leads to parameter wastage but also limits the scalability of recommendation models.
• Interest Entanglement: User responses in ads recommender systems are determined by complex underlying factors, particularly when multiple tasks or scenarios are learned simultaneously. Existing shared-embedding approaches [6, 43, 59] may fail to disentangle these factors adequately, as they rely on a single entangled embedding for each feature.
This paper presents our practices for addressing these challenges. The remaining sections of the paper are organized as follows: Section 2 provides an overview of our model architecture, giving a high-level understanding of the system. Section 3 focuses on the encoding techniques used to integrate temporal, ordinal, and distance priors of different feature types into the representation. Section 4 delves into the root causes of the embedding dimensional collapse and proposes several solutions to mitigate this issue. Section 5 explores the challenge of interest entanglement across various tasks and scenarios and our solutions. Section 6 presents various model training techniques. Finally, Section 7 introduces a set of off-the-shelf tools designed to facilitate the analysis of feature correlations, dimensional collapse, and interest entanglement. Due to space limitations, this paper cannot provide a detailed description of each approach. For more in-depth information, please refer to the corresponding paper cited in each section.
2 BRIEF SYSTEM OVERVIEW

The overall architecture of our ads recommendation model for single-task learning is illustrated in Fig. 1. For the multi-task learning model architecture, please refer to Fig. 4. Our model follows the widely adopted Embedding & Explicit Interaction framework [41, 77], which consists of four key modules: feature encoding, multi-embedding lookup, experts (feature interactions and MLPs), and classification towers. In the feature encoding module, we apply specific encoding methods tailored to various feature types in our system. Next, based on the encoded IDs obtained from the feature encoding module, multiple embeddings are looked up from individual embedding tables for each feature. Within the expert module, embeddings from the same table are explicitly interacted with one another. The outputs of the expert module are then passed through Multi-Layer Perceptrons (MLPs) with non-linear transformations.
The classification towers receive the gate-weighted sum of the outputs from the experts. Finally, the sigmoid activation function is applied to generate the final prediction.

In the case of single-task learning, such as Click-Through Rate (CTR) prediction, our model employs a single tower, as depicted in Fig. 1. However, in the context of multi-task learning (MTL), such as Conversion Rate (CVR) prediction, where each conversion type is treated as an individual task [48], our model utilizes multiple towers and corresponding gates. Each tower is dedicated to a specific group of conversion types, allowing for task-specific predictions. To address the challenge of interest entanglement that arises in the MTL setting, further evolution of the model architecture to disentangle user interest is presented in Section 5.

Our team is responsible for ads recommendation across all modules, including retrieval and pre-ranking, CTR prediction (pCTR), (shallow) conversion prediction (pCVR) of various conversion types, deep conversion prediction (pDCVR), and Long-time Value prediction (pLTV). There are many commonalities regarding the model design principle among these modules, and we mainly discuss pCTR and pCVR as representative modules for single-task and multi-task learning, respectively. Our models serve various ads recommendation scenarios within Tencent, encompassing Moments (social stream), Channels (micro-video stream), Official Accounts (subscription), Tencent News, Tencent Video (long-video platform), and Demand Side Platform.
3 FEATURE ENCODING

In industrial ads recommendation systems, features are generated from many sources and belong to different types, such as sequence, numeric, and embedding features. When encoding these features, we'd like to preserve their inherent temporal, ordinal, or distance (similarity) priors as much as possible.
3.1 Sequence Features

A user's history behaviors reflect her interest, making them critical in recommendations. One key characteristic of such features is that there are strong semantic and temporal correlations between these behaviors and the target [82]. For example, given a target ad, those behaviors that are either semantically related (e.g., belonging to the same category as the target ad) or temporally close to the target are more informative for predicting the user's response to the target item.
We propose the Temporal Interest Module (TIM) [82] to learn the quadruple semantic-temporal correlation among (behavior semantic, target semantic, behavior temporal, target temporal). Specifically, in addition to the semantic encoding [17, 80, 81], TIM leverages Target-aware Temporal Encoding for each behavior, e.g., the relative position or time interval between each behavior and the target. Furthermore, to capture the quadruple correlation, TIM employs Target-aware Attention and Target-aware Representation to interact behaviors with the target in both attention and representation, resulting in an explicit 4-way interaction (shown in Fig. 2(a)). Mathematically, the encoding of user behavior sequence $H$ can be formulated as:

$$\mathrm{TIM}(H) = \sum_{i \in H} \alpha(i, t) \cdot (\hat{e}_i \odot \hat{u}_t), \qquad (1)$$

where $\alpha(i, t)$ denotes the target-aware attention between each behavior $i$ and target $t$, $(\hat{e}_i \odot \hat{u}_t)$ denotes the target-aware representation, and $\hat{e}_i = e_i \oplus p_{f(x_i)}$ denotes the temporally encoded embedding of the $i$-th behavior, which is an element-wise summation of the semantic embedding $e_i$ and the target-aware temporal encoding $p_{f(x_i)}$, i.e., the embedding of either the relative position of each behavior regarding the target, or the discretized time interval; $\hat{u}_t$ is the temporally encoded embedding of the target, defined analogously. Please note that the target-aware representation $\hat{e}_i \odot \hat{u}_t$ acts like a feature interaction layer to explicitly interact the behavior feature with the target, as done in other FM-based explicit feature interaction models [31, 37, 49, 52, 66]. The importance of such explicit behavior-target interaction in the representation was also emphasized in a recent work [74].
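To make the 4-way interaction concrete, the following NumPy sketch computes Eq. (1) for one target: behaviors and target are summed with their temporal encodings, attention comes from a softmax over dot products, and the output pools the target-aware representations. The function name, the dot-product attention form, and the random inputs are illustrative assumptions, not the production implementation.

```python
import numpy as np

def tim_encode(E_beh, P_beh, e_tgt, p_tgt):
    """Sketch of TIM's target-aware encoding (Eq. 1).
    E_beh: (L, K) semantic embeddings of L behaviors
    P_beh: (L, K) target-aware temporal encodings (relative position or time interval)
    e_tgt, p_tgt: (K,) target semantic / temporal embeddings
    """
    e_hat = E_beh + P_beh              # \hat{e}_i = e_i ⊕ p_{f(x_i)}
    u_hat = e_tgt + p_tgt              # temporally encoded target \hat{u}_t
    logits = e_hat @ u_hat             # target-aware attention scores, shape (L,)
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()               # softmax over behaviors
    rep = e_hat * u_hat                # target-aware representation \hat{e}_i ⊙ \hat{u}_t
    return (alpha[:, None] * rep).sum(axis=0)   # pooled encoding, shape (K,)

rng = np.random.default_rng(0)
L, K = 5, 8
out = tim_encode(rng.normal(size=(L, K)), rng.normal(size=(L, K)),
                 rng.normal(size=K), rng.normal(size=K))
assert out.shape == (K,)
```

Note how both the attention weight and the element-wise product involve the temporally encoded embeddings, which is what makes the interaction 4-way (behavior/target × semantic/temporal).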
Deployment Details. In practice, we adopt both relative position and time interval for temporal encoding. The output of TIM is concatenated with the output of the feature interaction module, e.g., DCN V2 [66] or GwPFM (to be discussed later). We apply TIM to the user's click/conversion category sequence features in various click and conversion prediction tasks across multiple scenarios. TIM brings a 1.93% Gross Merchandise Value (GMV) lift in WeChat pCTR and a 2.45% GMV lift in Game and e-Commerce pLTV. We observe that the model learns much stronger decaying patterns in the time-interval embeddings than in the relative-position embeddings. This is because users' clicks on ads are pretty sparse, making time intervals more informative than relative positions.
3.2NumericFeatures
UnlikeindependentIDfeatures,thereisinherentpartialorderbetweennumeric/ordinalfeatures,suchasAge_20?Age_30.Topreservetheseordinalpriors,weadoptasimplifiedvariantoftheNaryDisencoding
[9],namelytheMultipleNumeralSystemsEncod
-ing(MNSE).Itencodesnumericfeaturesbygettingcodesaccordingtomultiplenumeralsystems(i.e.,binary,ternary,decimal)andthenassignslearnableembeddingstothesecodes,asshowninFig.
2(b).
Forexample,afeaturevalue"51"istransformedintocode"{6_1,
5_1,4_0,3_0,2_1,1_1}"accordingtobinarysystem,and"{6_0,5_0,4_1,3_2,2_2,1_0}"accordingtoternarysystem.Allcodesareprojectedtoembeddingsandthensum-pooledtogetthefinalencodingresult.Toimprovecomputationefficiency,weremovetheinter-andintra-attentionintheoriginalNaryDis
[9]
.GivenacontinuousfeaturewithvalueU,theencodingresultis:
(2)
B=func_binary(U),C=func_ternary(U),...
whereX2(BkandX3(Ckaretheembeddingdictionariesforbi-
naryandternarysystemsrespectively,whoselengthsofencodingsareK2andK3.func_binaryandfunc_ternaryarethebinarizationandternarizationfunctionsthattransformthecontinuousfeatureUintotheircorrespondingencodings.
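The multi-base code generation and sum-pooling of Eq. (2) can be sketched as follows. The helper names (`to_digits`, `mnse_encode`) and the table layout are illustrative assumptions; the asserts reproduce the paper's "51" example.

```python
import numpy as np

def to_digits(value, base, length):
    """Fixed-length digit expansion of an integer in a given base (most significant first)."""
    digits = []
    for _ in range(length):
        digits.append(value % base)
        value //= base
    return digits[::-1]

def mnse_encode(value, tables, length=6):
    """Sum-pool learnable embeddings of multi-base codes (Eq. 2).
    tables: {base: array of shape (length, base, dim)} — one embedding table per numeral system."""
    dim = next(iter(tables.values())).shape[-1]
    enc = np.zeros(dim)
    for base, table in tables.items():
        for pos, digit in enumerate(to_digits(value, base, length)):
            enc += table[pos, digit]   # look up the code "(pos+1)_digit" and accumulate
    return enc

# the paper's example: 51 is "110011" in binary and "001220" in ternary
assert to_digits(51, 2, 6) == [1, 1, 0, 0, 1, 1]
assert to_digits(51, 3, 6) == [0, 0, 1, 2, 2, 0]
```

In training, the `tables` entries would be learnable parameters updated jointly with the rest of the model.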
Figure 2: Illustration of Temporal Interest Module (left) for sequence features and Multiple Numeral Systems Encoding (right) for numeric and pre-trained embedding features.

Deployment Details. In an advertising system, ads are often indexed by discrete identifiers (Ad IDs), which are self-incremental or random and contain little information. However, each ad is associated with a creative containing abundant visual semantics. We replace the self-incremental or random Ad IDs with novel Visual Semantic IDs to preserve the visual similarity between ads. We achieve this by obtaining visual embeddings from ad images using a vision model and applying hashing algorithms like Locality-Sensitive Hashing (LSH) [64] to preserve visual similarity. The Visual Semantic IDs serve as numeric features, and we apply Minimum Norm Scaling (MNS) to preserve their ordinal priors. This replacement leads to a 1.13% GMV lift in Moments pCVR, with a larger lift of 1.74% for new ads. Additionally, the coefficient of variation in prediction scores among similar ads exposed to the same user is significantly reduced from 2.44% to 0.30%, substantiating that our approach can preserve the visual similarity priors.
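A standard way to derive such similarity-preserving IDs is random-hyperplane (SimHash-style) LSH, sketched below. The function name, the 8-bit code length, and the random inputs are illustrative assumptions; the paper does not specify its exact hashing configuration.

```python
import numpy as np

def lsh_semantic_id(visual_emb, planes):
    """SimHash-style LSH: the sign pattern of random projections forms the ID.
    Embeddings with high cosine similarity tend to agree on most bits,
    so nearby creatives map to nearby (often identical) integer IDs."""
    bits = (planes @ visual_emb > 0).astype(int)
    return int("".join(map(str, bits)), 2)

rng = np.random.default_rng(0)
planes = rng.normal(size=(8, 128))   # 8 random hyperplanes -> 8-bit semantic IDs
v = rng.normal(size=128)             # a visual embedding from the vision model
sid = lsh_semantic_id(v, planes)     # integer in [0, 256)
```

Because similar images collide on most bits, the resulting IDs carry an ordinal/similarity structure that the numeric-feature encoding above can then preserve.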
3.3 Embedding Features

Besides the main recommendation model, we may train a separate model, such as an LLM or GNN, to learn embeddings for entities (users or items). Such embeddings capture the relationship between users and items from a different perspective, e.g., a Graph Model or a Self-Supervised Language Model, and can be trained on a larger or different dataset and hence should provide extra information to the recommendation models. The key challenge in leveraging such pre-trained embeddings directly in our recommendation system is the semantic gap between the embedding space of the external models and the recommenders. That is, these embeddings capture different semantics from the collaborative semantics of the ID embeddings in recommendation models [36, 79].

We propose a Similarity Encoding Embedding approach to mitigate such a semantic gap. Take GNN for example. Once we train a GNN model and get the pre-trained embeddings $e_u$, $e_i$ for each user-item pair $(u, i)$, we first calculate the similarity $w_{\mathrm{sim}}(u, i)$ based on their GNN embeddings using the corresponding similarity function, i.e., cosine in GraphSage [22]. Formally,

$$w_{\mathrm{sim}}(u, i) = \mathrm{sim}(e_u, e_i). \qquad (3)$$

Such a similarity score is an ordinal value. Hence, similar to numeric features, we can use the Multiple Numeral Systems Encoding mentioned before to transform it into a learnable embedding $e_{\mathrm{sim}}(u, i) = f_{\mathrm{MNS}}(w_{\mathrm{sim}}(u, i))$. After that, the encoded embedding is simultaneously co-trained with the other ID embeddings in recommenders. Thus, the similarity priors in the original space are retained via the similarity score and encoding. Then, such priors are transferred to the recommenders by aligning the similarity encoding embedding $e_{\mathrm{sim}}(u, i)$ with the recommendation ID embeddings.

Furthermore, such an embedding encoding strategy has also been developed to incorporate large language model (LLM) knowledge into our recommendation system. An LLM is first transformed into an encoder-only architecture and trained with base proxy tasks like next-sentence prediction. Through such general pre-training, the LLM encoder can encode semantic embeddings. After that, the LLM is fine-tuned with high-quality positive and negative user-item pairs from the ads domain. Such contrastive alignment enables the LLM to generate high-quality pre-trained user embeddings $e_u$ and ad embeddings $e_i$. With such LLM similarity priors, like GNN embeddings, we can then adopt Similarity Encoding Embedding for space alignment.
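The pipeline of Eq. (3) plus similarity encoding can be sketched as below. For brevity, a single equal-width binning stands in for the multi-system MNSE step, and all names (`cosine`, `similarity_encoding`, `n_bins`) are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    """w_sim(u, i) = sim(e_u, e_i), Eq. (3), with cosine similarity."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_encoding(e_u, e_i, table, n_bins=32):
    """Map a pre-trained (u, i) similarity to a learnable embedding.
    table: (n_bins, dim) embedding array, co-trained with the recommender.
    Equal-width binning of [-1, 1] discretizes the ordinal similarity score."""
    w = cosine(e_u, e_i)                              # similarity in [-1, 1]
    bin_idx = min(int((w + 1) / 2 * n_bins), n_bins - 1)
    return table[bin_idx]

rng = np.random.default_rng(1)
e_u, e_i = rng.normal(size=64), rng.normal(size=64)   # pre-trained GNN/LLM embeddings
table = rng.normal(size=(32, 16))                     # learnable in the real system
emb = similarity_encoding(e_u, e_i, table)
assert emb.shape == (16,)
```

Only the scalar similarity crosses the semantic gap; the looked-up embedding lives entirely in the recommender's space, which is why the two spaces never need to be directly aligned.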
DeploymentDetails.WetrainaGraphSage
[22]uponauser
-ad/contentbipartitegraph,withclicksinbothadandcontentrec-ommendationdomainsastheedges.WethenadopttheSimilarityEncodingEmbeddingontheGNNembeddingsandconcatenatetheresultingrepresentationwiththatofthefeatureinteractionlayer.GNNembeddingsaresuccessfullydeployedinmanyscenar-ios,leadingto+1.21%,+0.59%,and1.47%GMVliftonMoments,Channel,andAppletpCTR.Inaddition,incorporatingLLMalsoleadsto+2.55%GMVliftonChannelpCVRand+1.41%GMVliftonChannelpCTRduringonlineA/Btest.
4 TACKLING DIMENSIONAL COLLAPSE

After encoding, all features are transformed into embeddings and then interact with each other explicitly through FM-like models [20, 31, 33, 37, 49, 51, 52, 58, 66]. However, one key side effect of explicit feature interaction is that some dimensions of embeddings collapse [21]. In this section, we'll first explain the dimensional collapse phenomenon and then present two different multi-embedding approaches and a collapse-resilient feature interaction function to mitigate it.
4.1 Embedding Dimensional Collapse

Recent work [1, 18, 78] has demonstrated that large-scale models, especially transformer-based models with billions, even trillions, of parameters, can achieve remarkable performance (e.g., GPT-4 [1], LLaMA [61]). Inspired by these works, we explore how to scale up ads recommendation models. Usually, embeddings dominate the number of model parameters. For example, more than 99.99% of the parameters in our production model are from feature embeddings. Therefore, we start to scale up our model by enlarging the embedding size K, e.g., increasing K from 64 to 192. However, it doesn't bring a significant performance lift and sometimes even leads to performance deterioration.

We investigate the learned embedding matrix of each field by singular spectral analysis [30], and observe dimensional collapse. That is, many singular values are very small, indicating that the embeddings of many fields end up spanning a lower-dimensional subspace instead of the entire available embedding space [21, 28]. The dimensional collapse of embeddings results in a vast waste of model capacity since many embedding dimensions collapse and are meaningless. Furthermore, the fact that many embeddings have already collapsed makes it infeasible to scale up models by simply increasing the dimension length [2, 21].

We study the root cause of the dimensional collapse and find it's due to the explicit feature interaction module; namely, fields with collapsed dimensions make the embeddings of other fields collapse. For example, some fields such as Gender have a very low cardinality N_Gen, making their embeddings only able to span an N_Gen-dimensional space. As N_Gen is much smaller than the embedding size K, the interaction between these low-dimensional embeddings and the possibly high-dimensional (K-dimensional) embeddings of the remaining fields makes the latter collapse to an N_Gen-dimensional subspace.
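The singular spectral analysis above is easy to reproduce: compute the singular values of a field's (N × K) embedding matrix and check how many are non-negligible. The sketch below contrasts a full-rank table with a synthetically collapsed (rank-3) one; the function name and thresholds are illustrative assumptions.

```python
import numpy as np

def singular_spectrum(embedding_table):
    """Normalized singular values of a field's embedding matrix of shape (N, K).
    A spectrum where most values are near zero signals dimensional collapse."""
    s = np.linalg.svd(embedding_table, compute_uv=False)
    return s / s.max()

rng = np.random.default_rng(0)
K = 16
healthy = rng.normal(size=(1000, K))                              # spans all K dims
collapsed = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, K))  # rank-3 by construction
assert (singular_spectrum(healthy) > 0.1).sum() == K
assert (singular_spectrum(collapsed) > 1e-6).sum() <= 3           # only 3 effective dims
```

Running this per field on a trained model reveals which fields occupy only a low-dimensional subspace of the K-dimensional embedding space.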
4.2 Multi-Embedding Paradigm

We propose a multi-embedding paradigm [21] to mitigate embedding dimensional collapse when scaling up ads recommenders. Specifically, for each feature, instead of looking up only one embedding as in the existing single-embedding paradigm, we learn multiple embedding tables and look up several embeddings from these tables for each feature. Then, all feature embeddings from the same embedding table interact with each other in the corresponding expert $I$. Formally, a recommendation model with $T$ embedding tables is defined as:

$$e_i^{(t)} = \big(E_i^{(t)}\big)^{\top} x_i, \;\; \forall i \in \{1, 2, \ldots, N\}, \qquad h^{(t)} = I^{(t)}\big(e_1^{(t)}, \ldots, e_N^{(t)}\big), \qquad \hat{y} = F\Big(\sum_{t=1}^{T} g^{(t)} \cdot h^{(t)}\Big),$$

where $t$ stands for the index of the embedding table, $g$ denotes the gating function for each expert, and $F(\cdot)$ denotes the final classifier. One requirement is that there should be non-linearities such as ReLU within the interaction expert $I$; otherwise, the model is equivalent to the single-embedding paradigm [21]. The overall architecture is shown in Figure 1.

The multi-embedding paradigm offers an effective approach to scaling up recommendation models. Instead of simply increasing the length of a shared embedding for each feature, this paradigm involves learning multiple embeddings for each feature. By adopting the multi-embedding paradigm, we can achieve parameter scaling for recommendation models, which has traditionally been a challenging task [2]: the model's performance improves as the number of parameters increases.
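The forward pass of the paradigm can be sketched as follows: each field looks up one embedding per table, embeddings from table t interact only inside expert t (here a ReLU MLP layer, since the non-linearity requirement rules out purely linear experts), and the classifier sees the gate-weighted sum. All names, sizes, and the toy gating values are illustrative assumptions.

```python
import numpy as np

def multi_embedding_forward(x_ids, tables, experts, gates, classifier):
    """Sketch of the multi-embedding paradigm: per-table lookup, per-table
    expert interaction, gate-weighted sum, then a final classifier."""
    h = 0.0
    for table, expert, gate in zip(tables, experts, gates):
        # concatenate this table's embeddings for all N feature fields
        e_t = np.concatenate([table[i][x] for i, x in enumerate(x_ids)])
        h = h + gate * expert(e_t)
    return classifier(h)

rng = np.random.default_rng(0)
N, card, K = 3, 10, 4                 # 3 fields, cardinality 10, embedding size 4
tables = [[rng.normal(size=(card, K)) for _ in range(N)] for _ in range(2)]  # T = 2
W = [rng.normal(size=(8, N * K)) for _ in range(2)]
# ReLU experts: without such non-linearity the model reduces to single-embedding
experts = [lambda e, Wt=W[t]: np.maximum(Wt @ e, 0.0) for t in range(2)]
out = multi_embedding_forward([1, 2, 3], tables, experts, gates=[0.5, 0.5],
                              classifier=lambda h: 1 / (1 + np.exp(-h.sum())))
assert 0.0 < out < 1.0                # sigmoid output, a valid probability
```

Keeping each table's embeddings confined to its own expert is what lets different tables learn different (and differently collapsed) subspaces instead of averaging into one.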
Deployment Details. Almost all pCTR models on our platform adopt the Multi-Embedding paradigm. Specifically, we learn multiple different feature interaction experts, e.g., GwPFM (a variant of FFM, which will be described below), IPNN, DCN V2, or FlatDNN, and multiple embedding tables. One or several experts share one of these embedding tables. We name such an architecture Heterogeneous Mixture-of-Experts with Multi-Embedding, which differs from DHEN [75] in the sense that [75] employs one shared embedding table while we deploy multiple ones. For example, the Moments pCTR model consists of a GwPFM, IPNN [51], FlatDNN, and two embedding tables. GwPFM and FlatDNN share the first table, while IPNN uses the second one. Switching from a single embedding to the above architecture brings a 3.9% GMV lift in Moments pCTR, which is one of the largest performance lifts during the past decade.

4.3 GwPFM: Yet Another Simplified Approach to Multi-Embedding Paradigm