
Ads Recommendation in a Collapsed and Entangled World

Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, Jie Jiang

Tencent Inc.

jonaspan@, weixue@, messixmwang@, nathanhbyu@, reubenliu@, justinquan@, xuemingqiu@, rocliu@, shawnxiao@, zeus@

arXiv:2403.00793v2 [cs.IR] 5 Jul 2024

ABSTRACT

We present Tencent's ads recommendation system and examine the challenges and practices of learning appropriate recommendation representations. Our study begins by showcasing our approaches to preserving prior knowledge when encoding features of diverse types into embedding representations. We specifically address sequence features, numeric features, and pre-trained embedding features. Subsequently, we delve into two crucial challenges related to feature representation: the dimensional collapse of embeddings and the interest entanglement across different tasks or scenarios. We propose several practical approaches to address these challenges that result in robust and disentangled recommendation representations. We then explore several training techniques to facilitate model optimization, reduce bias, and enhance exploration. Additionally, we introduce three analysis tools that enable us to study feature correlation, dimensional collapse, and interest entanglement. This work builds upon the continuous efforts of Tencent's ads recommendation team over the past decade. It summarizes general design principles and presents a series of readily applicable solutions and analysis tools. The reported performance is based on our online advertising platform, which handles hundreds of billions of requests daily and serves millions of ads to billions of users.

CCS CONCEPTS

• Information systems → Display advertising; • Computing methodologies → Neural networks; Factorization methods.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@.

KDD '24, August 25–29, 2024, Barcelona, Spain

© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 979-8-4007-0490-1/24/08...$15.00

https://doi.org/10.1145/3637528.3671607

KEYWORDS

Recommendation Systems, Representation Learning, Dimensional Collapse, Disentangled Learning, User Interest Modeling

ACM Reference Format:

Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, and Jie Jiang. 2024. Ads Recommendation in a Collapsed and Entangled World. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25–29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3637528.3671607

1 INTRODUCTION

The online advertising industry, valued at billions of dollars, is a remarkable example of the successful application of machine learning. Various advertising formats, including sponsored search advertising, contextual advertising, display advertising, and micro-video advertising, heavily rely on the accurate, efficient, and reliable prediction of ads click-through or conversion rates using learned models.

Over the past decade, deep learning has achieved remarkable success in diverse domains, including computer vision (CV) [23, 34], natural language processing (NLP) [1, 16, 62], and recommendation systems [41, 77]. The effectiveness of deep learning critically depends on the selection of appropriate data representations [3, 67, 68]. Researchers have extensively explored various aspects of representation learning in CV and NLP. These investigations have focused on topics such as priors [62], smoothness and the curse of dimensionality [5], depth and abstraction [4], disentangling factors of variations [68], and the uniformity of representations [28, 30].

In the field of recommendation systems, numerous works have focused on representation learning techniques to handle various types of features [9, 19, 32, 79, 81], capture feature correlations through explicit or implicit feature interactions [12, 20, 37, 45, 51, 60, 66], address the entangled interest within users' complex behaviors [69], particularly in multi-task [43, 59] or multi-scenario [7, 54, 83] settings, and enhance data representation through self-supervised learning [63, 84]. Despite the progress made in these representation-oriented works, several fundamental questions regarding representation learning in large-scale real-world ads recommenders remain unanswered.

Figure 1: Architecture of our Heterogeneous Mixture-of-Experts with Multi-Embedding for single-task learning, which consists of four key modules: feature encoding, multi-embedding lookup, experts (feature interactions and MLPs), and classification towers. (Diagram omitted; it depicts the encoding of sequence, numeric, and pre-trained embedding features, multiple embedding tables, interaction experts such as GwPFM, DCN V2, and FlatDNN with MLPs and ReLU, and gate-weighted classification towers.)

• Priors for Representation: Real-world systems encompass various types of features from diverse sources, including sequence features (e.g., user click/conversion history), numeric features (e.g., semantic-preserving ad IDs), and embedding features from pre-trained external models (e.g., GNN or LLM). Preserving the inherent priors of these features when encoding them in recommendation systems is crucial.

• Dimensional Collapse: The encoding process maps all features into embeddings, typically represented as K-dimensional vectors, which are learned during model training. However, we observe that the embeddings of many fields tend to occupy a lower-dimensional subspace instead of fully utilizing the available K-dimensional space. Such dimensional collapse not only leads to parameter wastage but also limits the scalability of recommendation models.

• Interest Entanglement: User responses in ads recommender systems are determined by complex underlying factors, particularly when multiple tasks or scenarios are learned simultaneously. Existing shared-embedding approaches [6, 43, 59] may fail to disentangle these factors adequately, as they rely on a single entangled embedding for each feature.

This paper presents our practices for addressing these challenges. The remaining sections of the paper are organized as follows: Section 2 provides an overview of our model architecture, giving a high-level understanding of the system. Section 3 focuses on the encoding techniques used to integrate temporal, ordinal, and distance priors of different feature types into the representation. Section 4 delves into the root causes of the embedding dimensional collapse and proposes several solutions to mitigate this issue. Section 5 explores the challenge of interest entanglement across various tasks and scenarios and our solutions. Section 6 presents various model training techniques. Finally, Section 7 introduces a set of off-the-shelf tools designed to facilitate the analysis of feature correlations, dimensional collapse, and interest entanglement. Due to space limitations, this paper cannot provide a detailed description of each approach. For more in-depth information, please refer to the corresponding paper cited in each section.

2 BRIEF SYSTEM OVERVIEW

The overall architecture of our ads recommendation model for single-task learning is illustrated in Fig. 1. For the multi-task learning model architecture, please refer to Fig. 4. Our model follows the widely adopted Embedding & Explicit Interaction framework [41, 77], which consists of four key modules: feature encoding, multi-embedding lookup, experts (feature interactions and MLPs), and classification towers. In the feature encoding module, we apply specific encoding methods tailored to the various feature types in our system. Next, based on the encoded IDs obtained from the feature encoding module, multiple embeddings are looked up from individual embedding tables for each feature. Within the expert module, embeddings from the same table are explicitly interacted with one another. The outputs of the expert module are then passed through Multi-Layer Perceptrons (MLPs) with non-linear transformations.


The classification towers receive the gate-weighted sum of the outputs from the experts. Finally, the sigmoid activation function is applied to generate the final prediction.

In the case of single-task learning, such as Click-Through Rate (CTR) prediction, our model employs a single tower, as depicted in Fig. 1. However, in the context of multi-task learning (MTL), such as Conversion Rate (CVR) prediction, where each conversion type is treated as an individual task [48], our model utilizes multiple towers and corresponding gates. Each tower is dedicated to a specific group of conversion types, allowing for task-specific predictions. To address the challenge of interest entanglement that arises in the MTL setting, further evolution of the model architecture to disentangle user interest is presented in Section 5.

Our team is responsible for ads recommendation across all modules, including retrieval and pre-ranking, CTR prediction (pCTR), (shallow) conversion prediction (pCVR) of various conversion types, deep conversion prediction (pDCVR), and Long-time Value prediction (pLTV). There are many commonalities regarding the model design principle among these modules, and we mainly discuss pCTR and pCVR as representative modules for single-task and multi-task learning, respectively. Our models serve various ads recommendation scenarios within Tencent, encompassing Moments (social stream), Channels (micro-video stream), Official Accounts (subscription), Tencent News, Tencent Video (long-video platform), and Demand Side Platform.

3 FEATURE ENCODING

In industrial ads recommendation systems, features are generated from many sources and belong to different types, such as sequence, numeric, and embedding features. When encoding these features, we'd like to preserve their inherent temporal, ordinal, or distance (similarity) priors as much as possible.

3.1 Sequence Features

A user's history behaviors reflect her interest, making them critical in recommendations. One key characteristic of such features is that there are strong semantic and temporal correlations between these behaviors and the target [82]. For example, given a target ad, those behaviors that are either semantically related (e.g., belonging to the same category as the target ad) or temporally close to the target are more informative for predicting the user's response to the target item.

We propose the Temporal Interest Module (TIM) [82] to learn the quadruple semantic-temporal correlation between (behavior semantic, target semantic, behavior temporal, target temporal). Specifically, in addition to the semantic encoding [17, 80, 81], TIM leverages Target-aware Temporal Encoding for each behavior, e.g., the relative position or time interval between each behavior and the target. Furthermore, to capture the quadruple correlation, TIM employs Target-aware Attention and Target-aware Representation to interact behaviors with the target in both attention and representation, resulting in explicit 4-way interaction (shown in Fig. 2(a)). Mathematically, the encoding of a user behavior sequence H can be formulated as:

TIM(H) = Σ_{i∈H} α(i, t) · (ê_i ⊙ û_t), (1)

where α(i, t) denotes the target-aware attention between each behavior i and target t, (ê_i ⊙ û_t) denotes the target-aware representation, and ê_i = e_i ⊕ p_f(x_i) denotes the temporally encoded embedding of the i-th behavior, which is an element-wise summation of the semantic embedding e_i and the target-aware temporal encoding p_f(x_i), i.e., the embedding of either the relative position of each behavior regarding the target, or the discretized time interval. Please note that the target-aware representation ê_i ⊙ û_t acts like a feature interaction layer to explicitly interact the behavior feature with the target, as done in other FM-based explicit feature interaction models [31, 37, 49, 52, 66]. The importance of such explicit behavior-target interaction in the representation was also emphasized in a recent work [74].
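The attention-weighted 4-way interaction can be sketched numerically as follows; this is a minimal numpy sketch with toy shapes, random stand-in embeddings, and an illustrative softmax attention (names such as `temporal_emb` are assumptions, not the production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8            # embedding dimension
L = 5            # behavior sequence length
n_buckets = 10   # discretized time-interval buckets

sem_emb = rng.normal(size=(L, K))               # semantic embeddings e_i
temporal_emb = rng.normal(size=(n_buckets, K))  # p_f(.) lookup table
target_emb = rng.normal(size=(K,))              # target semantic embedding

intervals = np.array([0, 1, 1, 3, 7])  # bucketized time gaps to the target

# Temporal encoding: element-wise sum of semantic and temporal embeddings.
e_hat = sem_emb + temporal_emb[intervals]  # ê_i = e_i ⊕ p_f(x_i)
u_hat = target_emb + temporal_emb[0]       # û_t (target at interval 0)

# Target-aware attention α(i, t): softmax over scaled dot products.
logits = e_hat @ u_hat / np.sqrt(K)
alpha = np.exp(logits - logits.max())
alpha /= alpha.sum()

# Target-aware representation (ê_i ⊙ û_t), attention-weighted and summed.
tim_out = (alpha[:, None] * (e_hat * u_hat)).sum(axis=0)
print(tim_out.shape)  # (8,)
```

Note that the element-wise product `e_hat * u_hat` appears in the output itself, not only in the attention logits, which is what makes the behavior-target interaction explicit.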

Deployment Details. In practice, we adopt both relative position and time interval for temporal encoding. The output of TIM is concatenated with the output of the feature interaction module, e.g., DCN V2 [66] or GwPFM (to be discussed later). We apply TIM to the user's click/conversion category sequence features in various click and conversion prediction tasks across multiple scenarios. TIM brings a 1.93% Gross Merchandise Value (GMV) lift in WeChat pCTR and a 2.45% GMV lift in Game and e-Commerce pLTV. We observe that the model learns much stronger decaying patterns in the time-interval embeddings than in the relative-position embeddings. This is because users' clicks on ads are pretty sparse, making time intervals more informative than relative positions.

3.2 Numeric Features

Unlike independent ID features, there is an inherent partial order between numeric/ordinal features, such as Age_20 ≺ Age_30. To preserve these ordinal priors, we adopt a simplified variant of the NaryDis encoding [9], namely the Multiple Numeral Systems Encoding (MNSE). It encodes numeric features by getting codes according to multiple numeral systems (i.e., binary, ternary, decimal) and then assigns learnable embeddings to these codes, as shown in Fig. 2(b). For example, a feature value "51" is transformed into code "{6_1, 5_1, 4_0, 3_0, 2_1, 1_1}" according to the binary system, and "{6_0, 5_0, 4_1, 3_2, 2_2, 1_0}" according to the ternary system. All codes are projected to embeddings and then sum-pooled to get the final encoding result. To improve computation efficiency, we remove the inter- and intra-attention in the original NaryDis [9]. Given a continuous feature with value U, the encoding result is:

e(U) = Σ_{k=1}^{K_2} X_2(B_k) + Σ_{k=1}^{K_3} X_3(C_k) + ..., where B = func_binary(U), C = func_ternary(U), ..., (2)

where X_2(B_k) and X_3(C_k) denote lookups in the embedding dictionaries for the binary and ternary systems respectively, whose encoding lengths are K_2 and K_3. func_binary and func_ternary are the binarization and ternarization functions that transform the continuous feature U into its corresponding encodings.
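The binary/ternary coding and sum pooling above can be reproduced in a few lines; the embedding tables below are random stand-ins for the learnable ones, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8  # embedding dimension

def to_codes(value, base, n_digits):
    """Return '(position)_(digit)' codes, most significant position first."""
    codes = []
    for pos in range(n_digits, 0, -1):
        digit = (value // base ** (pos - 1)) % base
        codes.append(f"{pos}_{digit}")
    return codes

binary_codes = to_codes(51, base=2, n_digits=6)
ternary_codes = to_codes(51, base=3, n_digits=6)
print(binary_codes)   # ['6_1', '5_1', '4_0', '3_0', '2_1', '1_1']
print(ternary_codes)  # ['6_0', '5_0', '4_1', '3_2', '2_2', '1_0']

# One learnable vector per possible (position, digit) code in each system.
emb_bin = {f"{p}_{d}": rng.normal(size=K) for p in range(1, 7) for d in range(2)}
emb_ter = {f"{p}_{d}": rng.normal(size=K) for p in range(1, 7) for d in range(3)}

# Sum-pool all code embeddings to get the final encoding of the value 51.
encoding = sum(emb_bin[c] for c in binary_codes) + \
           sum(emb_ter[c] for c in ternary_codes)
print(encoding.shape)  # (8,)
```

Because neighboring values share most of their high-order digits, they share most code embeddings, which is how the ordinal prior survives the encoding.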

Deployment Details. In an advertising system, ads are often indexed by discrete identifiers (Ad IDs), which are self-incremental or random and contain little information. However, each ad is associated with a creative containing abundant visual semantics. We replace the self-incremental or random Ad IDs with novel Visual Semantic IDs to preserve the visual similarity between ads. We achieve this by obtaining visual embeddings from ad images using a vision model and applying hashing algorithms like Locality-Sensitive Hashing (LSH) [64] to preserve visual similarity. The Visual Semantic IDs serve as numeric features, and we apply the Multiple Numeral Systems Encoding (MNSE) to preserve their ordinal priors. This replacement leads to a 1.13% GMV lift in Moments pCVR, with a larger lift of 1.74% for new ads. Additionally, the coefficient of variation in prediction scores among similar ads exposed to the same user is significantly reduced from 2.44% to 0.30%, substantiating that our approach can preserve the visual similarity priors.

Figure 2: Illustration of the Temporal Interest Module (left) for sequence features and Multiple Numeral Systems Encoding (right) for numeric and pre-trained embedding features. (Diagram omitted; the left panel shows target-aware attention α(i, t), the target-aware representation, and the temporal encoding of behaviors and target; the right panel shows the binary encoding "110011" and the ternary encoding of the ordinal value 51 with sum pooling.)
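The idea of hashing visual embeddings into similarity-preserving IDs can be illustrated with a random-projection LSH sketch; the dimensionality, bit width, and all names below are assumptions for illustration, not the production vision model or hashing setup:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_bits = 16, 8
planes = rng.normal(size=(n_bits, D))  # random hyperplanes

def visual_semantic_id(vis_emb):
    # Sign pattern of the projections onto the hyperplanes -> n_bits-bit ID.
    bits = (planes @ vis_emb > 0).astype(int)
    return int("".join(map(str, bits)), 2)

base = rng.normal(size=D)                # visual embedding of a creative
near = base + 0.01 * rng.normal(size=D)  # a visually similar creative
far = -base                              # a maximally dissimilar direction

id_base, id_near, id_far = map(visual_semantic_id, (base, near, far))
print(id_base, id_near, id_far)
```

Nearby embeddings flip few (often zero) sign bits and therefore typically receive identical or close IDs, while opposite embeddings flip every bit; this is what makes the resulting IDs usable as ordinal numeric features.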

3.3 Embedding Features

Besides the main recommendation model, we may train a separate model, such as an LLM or GNN, to learn embeddings for entities (users or items). Such embeddings capture the relationship between users and items from a different perspective, e.g., a Graph Model or a Self-Supervised Language Model, and can be trained on a larger or different dataset and hence should provide extra information to the recommendation models. The key challenge in leveraging such pre-trained embeddings directly in our recommendation system is the semantic gap between the embedding space of the external models and the recommenders. That is, these embeddings capture different semantics from the collaborative semantics of the ID embeddings in recommendation models [36, 79].

We propose a Similarity Encoding Embedding approach to mitigate such a semantic gap. Take GNN for example. Once we train a GNN model and get the pre-trained embeddings ē_u, ē_i for each user-item pair (u, i), we first calculate the similarity w_sim(u, i) based on their GNN embeddings using the corresponding similarity function, i.e., cosine in GraphSage [22]. Formally,

w_sim(u, i) = sim(ē_u, ē_i). (3)

Such a similarity score is an ordinal value. Hence, similar to numeric features, we can use the Multiple Numeral Systems Encoding mentioned before to transform it into a learnable embedding e_sim(u, i) = f_MNS(w_sim(u, i)). After that, the encoded embedding is simultaneously co-trained with the other ID embeddings in the recommenders. Thus, the similarity priors in the original space are retained via the similarity score and its encoding. Then, such priors are transferred to the recommenders by aligning the similarity encoding embedding e_sim(u, i) with the recommendation ID embeddings.

Furthermore, such an embedding encoding strategy has also been developed to incorporate large language model (LLM) knowledge into our recommendation system. An LLM is first transformed into an encoder-only architecture and trained with base proxy tasks like next-sentence prediction. Through such general pre-training, the LLM encoder can encode semantic embeddings. After that, the LLM is fine-tuned with high-quality positive and negative user-item pairs from the ads domain. Such contrastive alignment enables the LLM to generate high-quality pre-trained user embeddings ē_u and ad embeddings ē_i. With such LLM similarity priors, like GNN embeddings, we can then adopt the Similarity Encoding Embedding for space alignment.
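A minimal sketch of the similarity-encoding pipeline, using cosine similarity and a simple bucketing stand-in for the numeral-system encoding (the bucket count, table shapes, and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8

e_u = rng.normal(size=16)  # pre-trained (e.g., GNN) user embedding
e_i = rng.normal(size=16)  # pre-trained (e.g., GNN) item embedding

# w_sim(u, i) = sim(ē_u, ē_i), cosine as in GraphSage.
w = float(e_u @ e_i / (np.linalg.norm(e_u) * np.linalg.norm(e_i)))

# Stand-in for the numeral-system encoding: discretize the ordinal
# similarity into buckets, then look up a learnable embedding that is
# co-trained with the recommender's ID embeddings.
n_buckets = 100
bucket = min(int((w + 1.0) / 2.0 * n_buckets), n_buckets - 1)
sim_emb_table = rng.normal(size=(n_buckets, K))  # learnable in practice
e_sim = sim_emb_table[bucket]
print(e_sim.shape)  # (8,)
```

Only the scalar similarity crosses the boundary between the two embedding spaces, so the external model's geometry is transferred without forcing its raw vectors into the recommender's space.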

Deployment Details. We train a GraphSage [22] upon a user-ad/content bipartite graph, with clicks in both the ad and content recommendation domains as the edges. We then adopt the Similarity Encoding Embedding on the GNN embeddings and concatenate the resulting representation with that of the feature interaction layer. GNN embeddings are successfully deployed in many scenarios, leading to +1.21%, +0.59%, and +1.47% GMV lifts on Moments, Channels, and Applet pCTR. In addition, incorporating the LLM also leads to a +2.55% GMV lift on Channels pCVR and a +1.41% GMV lift on Channels pCTR during online A/B tests.

4 TACKLING DIMENSIONAL COLLAPSE

After encoding, all features are transformed into embeddings and then interact with each other explicitly through FM-like models [20, 31, 33, 37, 49, 51, 52, 58, 66]. However, one key side effect of explicit feature interaction is that some dimensions of the embeddings collapse [21]. In this section, we'll first explain the dimensional collapse phenomenon and then present two different multi-embedding approaches and a collapse-resilient feature interaction function to mitigate it.

4.1 Embedding Dimensional Collapse

Recent work [1, 18, 78] has demonstrated that large-scale models, especially transformer-based models with billions or even trillions of parameters, can achieve remarkable performance (e.g., GPT-4 [1], LLaMA [61]). Inspired by these works, we explore how to scale up ads recommendation models. Usually, embeddings dominate the number of model parameters. For example, more than 99.99% of the parameters in our production model are from feature embeddings. Therefore, we start to scale up our model by enlarging the embedding size K, e.g., increasing K from 64 to 192. However, it doesn't bring a significant performance lift and sometimes even leads to performance deterioration.

We investigate the learned embedding matrix of each field by singular spectral analysis [30] and observe dimensional collapse. That is, many singular values are very small, indicating that the embeddings of many fields end up spanning a lower-dimensional subspace instead of the entire available embedding space [21, 28]. The dimensional collapse of embeddings results in a vast waste of model capacity since many embedding dimensions collapse and are meaningless. Furthermore, the fact that many embeddings have already collapsed makes it infeasible to scale up models by simply increasing the dimension length [2, 21].

We study the root cause of the dimensional collapse and find it's due to the explicit feature interaction module; namely, fields with collapsed dimensions make the embeddings of other fields collapse. For example, some fields such as Gender have very low cardinality N_Gen, making their embeddings only able to span an N_Gen-dimensional space. As N_Gen is much smaller than the embedding size K, the interaction between these low-dimensional embeddings and the possibly high-dimensional (K-dimensional) embeddings of the remaining fields makes the latter collapse to an N_Gen-dimensional subspace.
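The singular-spectrum diagnostic can be sketched on synthetic collapsed embeddings; the construction below (a 64-dimensional embedding table confined to a 4-dimensional subspace) is an illustrative stand-in for a learned field embedding matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 64

# Synthetic collapsed embeddings: 64-dim vectors confined to a 4-dim subspace.
basis = rng.normal(size=(4, K))
emb = rng.normal(size=(N, 4)) @ basis

# Singular spectrum of the centered embedding matrix: many near-zero
# singular values indicate the field spans a low-dimensional subspace.
s = np.linalg.svd(emb - emb.mean(axis=0), compute_uv=False)
effective_rank = int((s > 1e-6 * s[0]).sum())
print(effective_rank)  # 4: only 4 of the 64 dimensions are actually used
```

On a healthy field the spectrum decays gradually and the effective rank stays close to K; a sharp cliff like the one above is the collapse signature.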

4.2 Multi-Embedding Paradigm

We propose a multi-embedding paradigm [21] to mitigate embedding dimensional collapse when scaling up ads recommenders. Specifically, for each feature, instead of looking up only one embedding as in the existing single-embedding paradigm, we learn multiple embedding tables and look up several embeddings from these tables for each feature. Then, all feature embeddings from the same embedding table interact with each other in the corresponding expert I. Formally, a recommendation model with T embedding tables is defined as:

e_i^(t) = (E_i^(t))^⊤ x_i, ∀i ∈ {1, 2, ..., N},
h^(t) = I^(t)(e_1^(t), e_2^(t), ..., e_N^(t)), ∀t ∈ {1, 2, ..., T},
h = Σ_t g^(t) · h^(t), ŷ = F(h),

where t stands for the index of the embedding table, g denotes the gating function for each expert, and F(·) denotes the final classifier. One requirement is that there should be non-linearities such as ReLU within the interaction expert I; otherwise, the model is equivalent to the single-embedding paradigm [21]. The overall architecture is shown in Figure 1.
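A toy sketch of this paradigm with two tables, per-table ReLU experts, a softmax gate, and a sigmoid tower; all shapes and weights are illustrative stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N_fields, V, K, T = 4, 100, 8, 2  # fields, vocab size, emb dim, num tables

tables = [rng.normal(size=(V, K)) for _ in range(T)]            # E^(t)
W_exp = [rng.normal(size=(N_fields * K, 16)) for _ in range(T)]  # experts
w_gate = rng.normal(size=T)
w_clf = rng.normal(size=16)

def forward(feature_ids):
    outs = []
    for t in range(T):
        # Each feature looks up its embedding in table t; embeddings from
        # the same table interact only within expert t.
        e = np.concatenate([tables[t][f] for f in feature_ids])
        # Non-linearity inside each expert is required; without it the
        # model reduces to the single-embedding paradigm.
        outs.append(np.maximum(e @ W_exp[t], 0.0))  # ReLU expert
    gate = np.exp(w_gate) / np.exp(w_gate).sum()    # softmax gate
    h = sum(g * o for g, o in zip(gate, outs))      # gate-weighted sum
    return 1.0 / (1.0 + np.exp(-(h @ w_clf)))       # sigmoid tower

p = forward([3, 17, 42, 99])
print(0.0 < p < 1.0)  # True
```

Because each table feeds a separate interaction pattern, a low-rank table cannot drag every other representation into its subspace, which is the point of the paradigm.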

The multi-embedding paradigm offers an effective approach to scaling up recommendation models. Instead of simply increasing the length of a shared embedding for each feature, this paradigm involves learning multiple embeddings for each feature. By adopting the multi-embedding paradigm, we can achieve parameter scaling for recommendation models, which has traditionally been a challenging task [2]: the model's performance improves as the number of parameters increases.

Deployment Details. Almost all pCTR models on our platform adopt the Multi-Embedding paradigm. Specifically, we learn multiple different feature interaction experts, e.g., GwPFM (a variant of FFM, which will be described below), IPNN, DCN V2, or FlatDNN, and multiple embedding tables. One or several experts share one of these embedding tables. We name such an architecture Heterogeneous Mixture-of-Experts with Multi-Embedding, which differs from DHEN [75] in the sense that [75] employs one shared embedding table while we deploy multiple ones. For example, the Moments pCTR model consists of a GwPFM, IPNN [51], FlatDNN, and two embedding tables. GwPFM and FlatDNN share the first table, while IPNN uses the second one. Switching from a single embedding to the above architecture brings a 3.9% GMV lift in Moments pCTR, which is one of the largest performance lifts during the past decade.

4.3 GwPFM: Yet Another Simplified Approach to the Multi-Embedding Paradigm
