Social Network Platforms and Technologies in the Cloud Computing Era (Google-NanChang-Talk)
Social Networks in the Cloud Computing Era: Platforms and Technologies

Ed Chang (張智威)
Deputy Director, Research Institute, Google China
Professor, Department of Electrical Engineering, University of California
10/30/2023

China Opportunity: China & U.S. in 2006-07

                        China                 U.S.
Internet population     180 million (↑25%)    208 million (↑3%)
Broadband users         60 million (↑90%)     60 million (↑29%)
Mobile phones           500 million           180 million
Engineering graduates   600k                  72k

Google China
- Size (~700): 200 engineers, 400 other employees, almost 100 interns
- Locations: Beijing (2005), Taipei (2006), Shanghai (2007)

Organizing the World's Information, Socially
- Social Platform
- Cloud Computing
- Concluding Remarks

Web 1.0
(Figure: a collection of documents: .htm, .jpg, .doc, and .msg files)

Web with People (2.0)
(Figure: documents (.htm, .jpg, .doc, .xls, .msg) together with people)

+ Social Platforms
(Figure: documents and people, plus App (Gadget) components)

Open Social Platform
Social platform:
1. Who I am
2. My friends
3. Their activities
4. Their things

Social Graph

What Users Want?
- People care about other people
  - care about people they know
  - connect to people they do not know
- Discover interesting information
  - based on other people
  - about who other people are
  - about what other people are doing

Information Overflow Challenge
- Too many people, too many choices of forums and apps
- "I soon need to hire a full-time to manage my online social networks"
- Desiring a Social Network Recommendation System

Recommendation System
- Friend recommendation
- Community/forum recommendation
- Application suggestion
- Ads matching

Picture source: http://

Industry trend: the arrival of the cloud computing era
(1) Data in the cloud: no fear of loss, no need to back up
(2) Software in the cloud: no need to download, upgraded automatically
(3) Ubiquitous cloud computing: any device becomes yours once you log in
(4) Infinitely powerful cloud computing: unlimited space, unlimited speed

Internet search: an example of cloud computing
1. The user enters query keywords, for example "Cloud Computing"
2. Data is preprocessed in a distributed fashion so that searches can be served:
   - Google Infrastructure (thousands of commodity servers around the world)
   - MapReduce for mass data processing
   - Google File System
3. Search results are returned

Collaborative Filtering
Given a matrix that "encodes" data
(Figure: a Users x Communities matrix)
Many applications (collaborative filtering):
- User-Community
- User-User
- Ads-User
- Ads-Community
- etc.

Collaborative Filtering (CF)
[Breese, Heckerman and Kadie 1998]
- Memory-based
  - Given user u, find "similar" users (k nearest neighbors)
  - Bought similar items, saw similar movies, similar profiles, etc.
  - Different similarity measures yield different techniques
  - Make predictions based on the preferences of these "similar" users
- Model-based
  - Build a model of the relationship between subject matters
  - Make predictions based on the constructed model

Memory-Based Model
[Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997]
Pros
- Simplicity: avoids the model-building stage
Cons
- Memory- and time-consuming: uses the entire database every time to make a prediction
- Cannot make a prediction if the user has no items in common with other users

Model-Based Model
[Breese et al. 1998; Hofmann 1999; Blei et al. 2004]
Pros
- Scalability: the model is much smaller than the actual dataset
- Faster prediction: query the model instead of the entire dataset
Cons
- Model building takes time

Algorithm Selection Criteria
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable
- Can deal with data scarcity
- Cloud Computing!

Model-based Prior Work
- Latent Semantic Analysis (LSA)
- Probabilistic LSA (PLSA)
- Latent Dirichlet Allocation (LDA)

Latent Semantic Analysis (LSA)

[Deerwester et al. 1990]
- Map high-dimensional count vectors to a lower-dimensional representation called the latent semantic space
- By SVD decomposition: $A = U \Sigma V^{T}$
  - $A$ = word-document co-occurrence matrix (Words x Docs, $W \times D$)
  - $U_{ij}$ = how likely word $i$ belongs to topic $j$ (Words x Topics, $W \times T$)
  - $\Sigma_{jj}$ = how significant topic $j$ is (Topics x Topics, $T \times T$)
  - $V^{T}_{ij}$ = how likely topic $i$ belongs to doc $j$ (Topics x Docs, $T \times D$)

Latent Semantic Analysis (cont.)
- LSA keeps the $k$ largest singular values
  - Low-rank approximation to the original matrix
  - Saves space, removes noise, and reduces sparsity
- Make recommendations using the rank-$k$ approximation $\hat{A}$ ($W \times D \approx W \times K$ times $K \times K$ times $K \times D$)
  - Word-word similarity: $\hat{A} \hat{A}^{T}$
  - Doc-doc similarity: $\hat{A}^{T} \hat{A}$
  - Word-doc relationship: $\hat{A}$

Probabilistic Latent Semantic Analysis (PLSA)
[Hofmann 1999; Hofmann 2004]
- A document is viewed as a bag of words
- A latent semantic layer is constructed between documents and words
- $P(w, d) = P(d)\,P(w \mid d) = P(d) \sum_{z} P(w \mid z)\,P(z \mid d)$
- Probability delivers explicit meaning: $P(w \mid w)$, $P(d \mid d)$, $P(d, w)$
- Model learning via the EM algorithm
(Figure: graphical model d -> z -> w with P(d), P(z|d), and P(w|z))

PLSA extensions
- PHITS [Cohn & Chang 2000]
  - Models document-citation co-occurrence
- A linear combination of PLSA and PHITS [Cohn & Hofmann 2001]
  - Models contents (words) and inter-connectivity of documents
- LDA [Blei et al. 2003]
  - Provides a complete generative model with a Dirichlet prior
- AT [Griffiths & Steyvers 2004]
  - Includes authorship information
  - A document is categorized by authors and topics
- ART [McCallum 2004]
  - Includes the email recipient as additional information
  - An email is categorized by author, recipients, and topics

Combinational Collaborative Filtering (CCF)
- Fuse multiple sources of information
  - Alleviates the information sparsity problem
- Hybrid training scheme
  - Gibbs sampling provides initializations for the EM algorithm
- Parallelization
  - Achieves linear speedup with the number of machines

Notations
- Given a collection of co-occurrence data
  - Community: $C = \{c_1, c_2, \ldots, c_N\}$
  - User: $U = \{u_1, u_2, \ldots, u_M\}$
  - Description: $D = \{d_1, d_2, \ldots, d_V\}$
  - Latent aspect: $Z = \{z_1, z_2, \ldots, z_K\}$
- Models
  - Baseline models
    - Community-User (C-U) model
    - Community-Description (C-D) model
  - CCF: Combinational Collaborative Filtering
    - Combines both baseline models

Baseline Models

Community-User (C-U) model
- A community is viewed as a bag of users
- c and u are rendered conditionally independent by introducing z
- Generative process, for each user u:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A user u is generated from P(u|z)

Community-Description (C-D) model
- A community is viewed as a bag of words
- c and d are rendered conditionally independent by introducing z
- Generative process, for each word d:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A word d is generated from P(d|z)

Baseline Models (cont.)
Community-User (C-U) model and Community-Description (C-D) model

C-U model
Pros
1. Personalized community suggestion
Cons
1. The C-U matrix is sparse and may suffer from the information sparsity problem
2. Cannot take advantage of content similarity between communities
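
The C-U generative process described in the Baseline Models slide (choose c uniformly, draw z from P(z|c), draw u from P(u|z)) has the same form as PLSA, so its parameters can in principle be fit with EM. The block below is a minimal NumPy sketch under that assumption. The function name `fit_cu_model`, the toy count matrix, and the random initialization are illustrative choices and do not come from the talk, which initializes EM with Gibbs sampling.

```python
import numpy as np

def fit_cu_model(counts, n_topics, n_iters=50, seed=0):
    """EM for an aspect model over a community-by-user count matrix.

    counts[c, u] = number of co-occurrences of community c and user u.
    Returns P(z|c) with shape (C, K) and P(u|z) with shape (K, U).
    """
    rng = np.random.default_rng(seed)
    n_comm, n_users = counts.shape
    # Random initialization (an assumption; the talk uses Gibbs sampling here).
    p_z_c = rng.random((n_comm, n_topics))
    p_z_c /= p_z_c.sum(axis=1, keepdims=True)
    p_u_z = rng.random((n_topics, n_users))
    p_u_z /= p_u_z.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: posterior P(z|c,u) is proportional to P(z|c) P(u|z).
        joint = p_z_c[:, :, None] * p_u_z[None, :, :]        # (C, K, U)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post                 # (C, K, U)
        p_u_z = expected.sum(axis=0)                         # (K, U)
        p_u_z /= p_u_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_c = expected.sum(axis=2)                         # (C, K)
        p_z_c /= p_z_c.sum(axis=1, keepdims=True) + 1e-12
    return p_z_c, p_u_z

# Toy data (hypothetical): 4 communities x 5 users.
counts = np.array([
    [3, 2, 0, 0, 0],
    [2, 3, 1, 0, 0],
    [0, 0, 0, 4, 2],
    [0, 0, 1, 2, 3],
], dtype=float)
p_z_c, p_u_z = fit_cu_model(counts, n_topics=2)
# Score every (community, user) pair: P(u|c) = sum_z P(z|c) P(u|z).
p_u_c = p_z_c @ p_u_z
```

The dense (C, K, U) posterior array keeps the sketch short but would not fit in memory at Orkut scale; a practical version would iterate over the nonzero entries of the sparse count matrix only.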

C-D model
Pros
1. Clusters communities based on community content (description words)
Cons
1. No personalized recommendation
2. Does not consider the overlapped users between communities

CCF Model
Combinational Collaborative Filtering (CCF) model
- CCF combines both baseline models
- A community is viewed as a bag of users AND a bag of words
- By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
- By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
- Things CCF can do that C-U and C-D cannot:
  - P(d|u), relate user to word
  - Useful for user-targeting ads

Algorithm Requirements
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable

Parallelizing CCF
Details omitted
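
The CCF Model slide notes that sharing the latent aspect z lets CCF relate users to words through P(d|u). One plausible way to obtain that quantity from fitted parameters is to apply Bayes' rule to get P(z|u) and then marginalize over z. The sketch below assumes this derivation and assumes a uniform prior over communities, as in the baseline generative process; the function name `user_to_word` and the hand-made parameter values are hypothetical.

```python
import numpy as np

def user_to_word(p_z_c, p_u_z, p_d_z):
    """Return P(d|u) with shape (U, V) from shared-topic parameters.

    p_z_c: P(z|c), shape (C, K); p_u_z: P(u|z), shape (K, U);
    p_d_z: P(d|z), shape (K, V). Each row is a probability distribution.
    """
    # P(z) under a uniform prior over communities: mean of P(z|c) over c.
    p_z = p_z_c.mean(axis=0)                      # (K,)
    # Bayes' rule: P(z|u) is proportional to P(u|z) P(z).
    p_z_u = (p_u_z * p_z[:, None]).T              # (U, K)
    p_z_u /= p_z_u.sum(axis=1, keepdims=True)
    # Marginalize the topic: P(d|u) = sum_z P(z|u) P(d|z).
    return p_z_u @ p_d_z                          # (U, V)

# Hand-made parameters (hypothetical): 2 communities, 2 topics, 3 users, 4 words.
p_z_c = np.array([[0.9, 0.1], [0.2, 0.8]])
p_u_z = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
p_d_z = np.array([[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.4, 0.5]])
p_d_u = user_to_word(p_z_c, p_u_z, p_d_z)
top_word_per_user = p_d_u.argmax(axis=1)
```

Each row of `p_d_u` is a distribution over description words for one user, which is the kind of signal the slide suggests for user-targeted ads.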

Experiments on Orkut Dataset
- Data description
  - Collected on July 26, 2007
  - Two types of data were extracted: community-user and community-description
  - 312,385 users
  - 109,987 communities
  - 191,034 unique English words
- Community recommendation
- Community similarity/clustering
- User similarity
- Speedup

Community Recommendation
- Evaluation method
  - No ground truth, no user clicks available
  - Leave-one-out: randomly delete one community for each user
  - Check whether the deleted community can be recovered
- Evaluation metric
  - Precision and recall

Results
Observations:
- CCF outperforms C-U
- For the top 20, the precision and recall of CCF are twice as high as those of C-U
- The more communities a user has joined, the better CCF and C-U can predict

Runtime Speedup
- The Orkut dataset enjoys a linear speedup when the number of machines is up to 100
- Reduces the training time from one day to less than 14 minutes
- But what makes the speedup slow down after 100 machines?

Runtime Speedup (cont.)
- Training time consists of two parts:
  - Computation time (Comp)
  - Communication time (Comm)

CCF Summary
- Combinational Collaborative Filtering
  - Fuses bags-of-words and bags-of-users information
  - Hybrid training provides better initializations for EM than random seeding
  - Parallelized to handle large-scale datasets

China's Contributions on/to Cloud Computing
- Parallel CCF
- Parallel SVMs (Kernel Machines)
- Parallel SVD
- Parallel Spectral Clustering
- Parallel Expectation Maximization
- Parallel Association Mining
- Parallel LDA
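
The leave-one-out protocol from the Community Recommendation slide (hide one joined community per user, then check whether it is recovered) can be sketched as follows. The slides name precision and recall but do not define them; the top-k definitions used here are an assumption. `leave_one_out_eval`, `co_membership_scores`, and the toy membership matrix are hypothetical names and data, and the toy scorer stands in for CCF or C-U.

```python
import numpy as np

def leave_one_out_eval(membership, score_fn, k=20, seed=0):
    """Hide one joined community per user and test whether it ranks in the top k.

    membership: binary (users x communities) array.
    score_fn: hypothetical callable mapping a training matrix to a
              (users x communities) array of recommendation scores.
    Returns (precision_at_k, recall_at_k).
    """
    rng = np.random.default_rng(seed)
    train = membership.copy()
    held_out = {}
    for u in range(membership.shape[0]):
        joined = np.flatnonzero(membership[u])
        if len(joined) < 2:           # keep at least one community for training
            continue
        c = rng.choice(joined)
        train[u, c] = 0
        held_out[u] = c

    n = len(held_out)
    if n == 0:
        return 0.0, 0.0
    scores = score_fn(train).astype(float)
    scores[train > 0] = -np.inf       # never recommend communities already joined
    hits = 0
    for u, c in held_out.items():
        top_k = np.argsort(-scores[u])[:k]
        hits += int(c in top_k)
    # One relevant item per user: precision = hits / (n * k), recall = hits / n.
    return hits / (n * k), hits / n

def co_membership_scores(train):
    """Toy scorer: favor communities sharing members with the user's communities."""
    overlap = train.T @ train         # (communities x communities) shared-user counts
    return train @ overlap

# Toy data (hypothetical): two groups of three users, three communities each.
membership = np.array([[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3)
precision, recall = leave_one_out_eval(membership, co_membership_scores, k=2)
```

Because each user has exactly one hidden community, precision at k is always recall divided by k in this setup.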

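
The Runtime Speedup slides split training time into computation and communication and ask why speedup tails off beyond 100 machines. A toy cost model makes the trade-off concrete. Everything in it is an assumption made for illustration: computation is taken to divide evenly across machines, communication is taken to grow linearly with the machine count, and the two constants are invented rather than measured on Orkut.

```python
def training_time(machines, comp_minutes, comm_minutes_per_machine):
    """Toy cost model: computation shrinks with m, communication grows with m."""
    comp = comp_minutes / machines
    comm = comm_minutes_per_machine * (machines - 1)   # assumed linear growth
    return comp + comm

def speedup(machines, comp_minutes, comm_minutes_per_machine):
    """Speedup relative to a single machine under the same toy model."""
    base = training_time(1, comp_minutes, comm_minutes_per_machine)
    return base / training_time(machines, comp_minutes, comm_minutes_per_machine)

COMP = 1440.0   # illustrative: one day of single-machine computation, in minutes
COMM = 0.01     # illustrative per-machine communication overhead, in minutes

# Machine count with the lowest modeled training time.
best_m = min(range(1, 1001), key=lambda m: training_time(m, COMP, COMM))
for m in (1, 10, 100, best_m, 1000):
    print(m, round(speedup(m, COMP, COMM), 1))
```

Under these assumptions speedup is close to linear while computation dominates, peaks where the two terms balance, and then declines as communication takes over.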
Speeding up SVMs
[NIPS 2007]
- Approximate matrix factorization
- Parallelization
- Open source @ /p/psvm
- 350+ downloads since December 07
- A task that takes 7 days on 1 machine takes 1 hour on 500 machines

Incomplete Cholesky Factorization (ICF)
(Figure: an n x n matrix approximated by the product of an n x p matrix and a p x n matrix, with p << n, to conserve storage)

Matrix Product
(Figure: a p x n matrix times an n x p matrix gives a p x p matrix)

Web With People
(Figure: documents (.htm, .jpg, .doc, .xls, .msg) together with people)

What Next for Web Search?

Personalization
- Return query results considering personal preferences
- Example: disambiguate an ambiguous query such as "fuji"
- Oops: several have tried, and the problem is hard
  - Enough training data is difficult to collect (for collaborative filtering)
  - Personalization is computationally intensive to support (e.g., for personalizing PageRank)
  - The user profile may be incomplete or erroneous

Personal search, intelligent search
A search for "Fuji" can return:
- Mount Fuji
- Fuji apples
- Fuji cameras

Organizing World's Information, Socially
- The Web is a collection of documents and people
- Recommendation is a personalized, push model of search
- Collaborative filtering requires dense information to be effective
- Cloud computing is essential

References
[1] Alexa internet. http:///.
[2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the Seventeenth International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
[20] L. Adamic and E. Adar. How to search a social network. 2004.
[21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. Referral Web: Combining social networks and collaborative filtering. Communications of the ACM, 40(3):63-65, 1997.
[23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Rec., 22:207-216, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998.
[25] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[26] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
