




Social Networks in the Cloud Computing Era
Platforms and Technologies
Ed Chang (張智威)
Deputy Director, Research Institute, Google China
Professor, Department of Electrical Engineering, University of California
10/30/2023

China Opportunity
China & US in 2006-07

                        China                U.S.
Internet Population     180 million (↑25%)   208 million (↑3%)
Broadband Users         60 million (↑90%)    60 million (↑29%)
Mobile Phones           500 million          180 million
Engineering Graduates   600k                 72k

Google China
- Size (~700)
  - 200 engineers
  - 400 other employees
  - Almost 100 interns
- Locations
  - Beijing (2005)
  - Taipei (2006)
  - Shanghai (2007)

Organizing the World's Information, Socially
- Social Platform
- Cloud Computing
- Concluding Remarks

Web 1.0
[Figure: a collection of documents (.htm, .jpg, .doc, .msg)]

Web with People (2.0)
[Figure: documents (.htm, .jpg, .doc, .xls, .msg)]

+ Social Platforms
[Figure: documents (.htm, .jpg, .doc, .xls, .msg) plus App (Gadget)]

Open Social Platform

Open Social Platform
Social platform:
1. Who I am
2. My friends
3. Their activities

Open Social Platform
Social platform:
1. Who I am
2. My friends
3. Their activities
4. Their stuff

Social Graph

What Users Want?
- People care about other people
  - care about people they know
  - connect to people they do not know
- Discover interesting information
  - based on other people
  - about who other people are
  - about what other people are doing

Information Overflow Challenge
- Too many people, too many choices of forums and apps
- "I soon need to hire a full-time to manage my online social networks"
- Desiring a Social Network Recommendation System

Recommendation System
- Friend Recommendation
- Community/Forum Recommendation
- Application Suggestion
- Ads Matching

Organizing the World's Information, Socially
- Social Platform
- Cloud Computing
- Concluding Remarks

picture source: http://
(1) Data in the cloud: no fear of loss, no need for backups
(2) Software in the cloud: no downloads, automatic upgrades
(3) Ubiquitous cloud computing: any device is yours once you log in
(4) Infinitely powerful cloud computing: unlimited space, unlimited speed
Industry trend: the arrival of the cloud computing era

Internet Search: An Example of Cloud Computing
1. The user enters query keywords (for example, "Cloud Computing")
2. Data are preprocessed in a distributed fashion to serve searches:
  - Google Infrastructure (thousands of commodity servers around the world)
  - MapReduce for mass data processing
  - Google File System
3. Search results are returned

Collaborative Filtering
- Given a matrix that "encodes" data

Given a matrix that "encodes" data
- Many applications (collaborative filtering):
  - User-Community
  - User-User
  - Ads-User
  - Ads-Community
  - etc.
[Figure: matrix with axes Users and Communities]

Collaborative Filtering (CF)
[Breese, Heckerman and Kadie 1998]
- Memory-based
  - Given user u, find "similar" users (k nearest neighbors)
  - Bought similar items, saw similar movies, similar profiles, etc.
  - Different similarity measures yield different techniques
  - Make predictions based on the preferences of these "similar" users
- Model-based
  - Build a model of the relationship between subject matters
  - Make predictions based on the constructed model

Memory-Based Model
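A minimal sketch of the memory-based approach described on the Collaborative Filtering slide, assuming cosine similarity over binary community memberships. The toy data and the names `cosine` and `recommend` are hypothetical and do not come from the talk; other similarity measures would give other techniques.

```python
import math

# Hypothetical toy data: user -> {community: weight}; 1.0 means "joined".
MEMBERSHIPS = {
    "alice": {"hiking": 1.0, "photo": 1.0},
    "bob": {"hiking": 1.0, "photo": 1.0, "jazz": 1.0},
    "carol": {"jazz": 1.0, "chess": 1.0},
    "dave": {"hiking": 1.0, "cooking": 1.0},
    "erin": {"opera": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, data, k=2, top_n=3):
    """Score unseen communities using the k most similar users."""
    neighbors = sorted(
        ((cosine(data[user], vec), other)
         for other, vec in data.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbors:
        if sim <= 0.0:
            continue  # no overlap with this neighbor, so no evidence
        for item, weight in data[other].items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

Note that the whole dataset is scanned on every call, and a user with no overlap with anyone else (here `erin`) gets no recommendation at all.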
[Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997]
- Pros
  - Simplicity: avoids the model-building stage
- Cons
  - Memory- and time-consuming: uses the entire database every time to make a prediction
  - Cannot make a prediction if the user has no items in common with other users

Model-Based Model
[Breese et al. 1998; Hofmann 1999; Blei et al. 2004]
- Pros
  - Scalability: the model is much smaller than the actual dataset
  - Faster prediction: query the model instead of the entire dataset
- Cons
  - Model building takes time

Algorithm Selection Criteria
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable
- Can deal with data scarcity
- Cloud Computing!

Model-based Prior Work
- Latent Semantic Analysis (LSA)
- Probabilistic LSA (PLSA)
- Latent Dirichlet Allocation (LDA)

Latent Semantic Analysis (LSA)
[Deerwester et al. 1990]
- Map high-dimensional count vectors to a lower-dimensional representation called the latent semantic space
- By SVD decomposition: $A = U \Sigma V^T$
  - $A$ = word-document co-occurrence matrix
  - $U_{ij}$ = how likely word $i$ belongs to topic $j$
  - $\Sigma_{jj}$ = how significant topic $j$ is
  - $V^T_{ij}$ = how likely topic $i$ belongs to doc $j$
[Figure: $A$ (Words x Docs, W x D) = $U$ (Words x Topics, W x T) · $\Sigma$ (Topics x Topics, T x T) · $V^T$ (Topics x Docs, T x D)]

Latent Semantic Analysis (cont.)
- LSA keeps the k largest singular values
  - Low-rank approximation to the original matrix
  - Saves space, removes noise, and reduces sparsity
- Make recommendations using the rank-k approximation $\hat{A}$
  - Word-word similarity: $\hat{A}\hat{A}^T$
  - Doc-doc similarity: $\hat{A}^T\hat{A}$
  - Word-doc relationship: $\hat{A}$
[Figure: $\hat{A}$ (Words x Docs, W x D) = (W x K) · (K x K) · (K x D)]

Probabilistic Latent Semantic Analysis (PLSA)
[Hofmann 1999; Hofmann 2004]
- A document is viewed as a bag of words
- A latent semantic layer is constructed between documents and words
- $P(w, d) = P(d)\,P(w \mid d) = P(d) \sum_z P(w \mid z)\,P(z \mid d)$
- Probability delivers explicit meaning
  - P(w|w), P(d|d), P(d, w)
- Model learning via the EM algorithm
[Figure: graphical model d -> z -> w, labeled P(d), P(z|d), P(w|z)]

PLSA extensions
- PHITS [Cohn & Chang 2000]
  - Models document-citation co-occurrence
- A linear combination of PLSA and PHITS [Cohn & Hofmann 2001]
  - Models contents (words) and inter-connectivity of documents
- LDA [Blei et al. 2003]
  - Provides a complete generative model with a Dirichlet prior
- AT [Griffiths & Steyvers 2004]
  - Includes authorship information
  - A document is categorized by authors and topics
- ART [McCallum 2004]
  - Includes the email recipient as additional information
  - An email is categorized by author, recipients, and topics

Combinational Collaborative Filtering (CCF)
- Fuse multiple sources of information
  - Alleviates the information sparsity problem
- Hybrid training scheme
  - Gibbs sampling as initialization for the EM algorithm
- Parallelization
  - Achieves linear speedup with the number of machines

Notations
- Given a collection of co-occurrence data
  - Community: $C = \{c_1, c_2, \ldots, c_N\}$
  - User: $U = \{u_1, u_2, \ldots, u_M\}$
  - Description: $D = \{d_1, d_2, \ldots, d_V\}$
  - Latent aspect: $Z = \{z_1, z_2, \ldots, z_K\}$
- Models
  - Baseline models
    - Community-User (C-U) model
    - Community-Description (C-D) model
  - CCF: Combinational Collaborative Filtering
    - Combines both baseline models

Baseline Models
Community-User (C-U) model
- A community is viewed as a bag of users
- c and u are rendered conditionally independent by introducing z
- Generative process, for each user u:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A user u is generated from P(u|z)
Community-Description (C-D) model
- A community is viewed as a bag of words
- c and d are rendered conditionally independent by introducing z
- Generative process, for each word d:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A word d is generated from P(d|z)

Baseline Models (cont.)
Community-User (C-U) model and Community-Description (C-D) model
C-U model
- Pros
  1. Personalized community suggestion
- Cons
  1. The C-U matrix is sparse and may suffer from the information sparsity problem
  2. Cannot take advantage of content similarity between communities
C-D model
- Pros
  1. Clusters communities based on community content (description words)
- Cons
  1. No personalized recommendation
  2. Does not consider the overlapped users between communities

CCF Model
Combinational Collaborative Filtering (CCF) model
- CCF combines both baseline models
- A community is viewed as a bag of users AND a bag of words
- By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
- By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
- Things CCF can do that C-U and C-D cannot
  - P(d|u), relating a user to a word
  - Useful for user-targeted ads

Algorithm Requirements
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable

Parallelizing CCF
- Details omitted
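The CCF Model slide mentions P(d|u), which relates a user to a description word. A minimal sketch of how such a quantity could be computed once P(z|c), P(u|z) and P(d|z) are known, assuming uniform P(c) as in the baseline generative processes and assuming d and u are conditionally independent given z. All probability tables and function names below are hypothetical toy values, not results or code from the talk.

```python
# Hypothetical toy parameters: 2 communities, 2 latent topics.
P_Z_GIVEN_C = {"c1": {"z1": 0.9, "z2": 0.1}, "c2": {"z1": 0.2, "z2": 0.8}}
P_U_GIVEN_Z = {"z1": {"alice": 0.7, "bob": 0.3}, "z2": {"alice": 0.1, "bob": 0.9}}
P_D_GIVEN_Z = {"z1": {"hiking": 0.8, "jazz": 0.2}, "z2": {"hiking": 0.1, "jazz": 0.9}}


def topic_prior(p_z_given_c):
    """P(z) = sum_c P(c) P(z|c), with communities chosen uniformly."""
    n = len(p_z_given_c)
    prior = {}
    for dist in p_z_given_c.values():
        for z, p in dist.items():
            prior[z] = prior.get(z, 0.0) + p / n
    return prior


def topic_posterior(user, p_u_given_z, prior):
    """P(z|u) by Bayes' rule: proportional to P(u|z) P(z)."""
    joint = {z: p_u_given_z[z].get(user, 0.0) * prior[z] for z in prior}
    total = sum(joint.values())
    return {z: v / total for z, v in joint.items()}


def word_given_user(user):
    """P(d|u) = sum_z P(d|z) P(z|u) for every description word d."""
    post = topic_posterior(user, P_U_GIVEN_Z, topic_prior(P_Z_GIVEN_C))
    words = {d for dist in P_D_GIVEN_Z.values() for d in dist}
    return {
        d: sum(P_D_GIVEN_Z[z].get(d, 0.0) * post[z] for z in post)
        for d in words
    }
```

With these toy tables, `word_given_user("alice")` ranks "hiking" above "jazz", and the ranking is reversed for "bob". A ranking of this kind is what would drive user-targeted ads.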
Experiments on Orkut Dataset
- Data description
  - Collected on July 26, 2007
  - Two types of data were extracted: community-user and community-description
  - 312,385 users
  - 109,987 communities
  - 191,034 unique English words
- Community recommendation
- Community similarity/clustering
- User similarity
- Speedup

Community Recommendation
- Evaluation method
  - No ground truth, no user clicks available
  - Leave-one-out: randomly delete one community for each user
  - Check whether the deleted community can be recovered
- Evaluation metric
  - Precision and recall

Results
Observations:
- CCF outperforms C-U
- For the top 20, precision/recall of CCF are twice as high as those of C-U
- The more communities a user has joined, the better CCF/C-U can predict

Runtime Speedup
- The Orkut dataset enjoys a linear speedup when the number of machines is up to 100
- Reduces the training time from one day to less than 14 minutes
- But what makes the speedup slow down after 100 machines?

Runtime Speedup (cont.)
- Training time consists of two parts:
  - Computation time (Comp)
  - Communication time (Comm)

CCF Summary
- Combinational Collaborative Filtering
  - Fuses bags-of-words and bags-of-users information
  - Hybrid training provides better initializations for EM than random seeding
  - Parallelized to handle large-scale datasets

China's Contributions on/to Cloud Computing
- Parallel CCF
- Parallel SVMs (Kernel Machines)
- Parallel SVD
- Parallel Spectral Clustering
- Parallel Expectation Maximization
- Parallel Association Mining
- Parallel LDA
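The Runtime Speedup slides split training time into computation and communication. A toy model can illustrate why speedup might flatten as machines are added. It assumes computation divides evenly across machines while communication cost grows linearly with their number. The one-day (1440-minute) single-machine figure is from the slides; the communication constant and the function names are hypothetical, and the linear communication cost is an assumption, not the talk's measured behavior.

```python
def training_time(machines: int, comp_single: float = 1440.0,
                  comm_per_machine: float = 0.02) -> float:
    """Estimated training time in minutes: Comp / m + Comm(m)."""
    computation = comp_single / machines          # shrinks as 1/m
    communication = comm_per_machine * machines   # assumed to grow with m
    return computation + communication


def speedup(machines: int) -> float:
    """Speedup relative to one machine under the same toy model."""
    return training_time(1) / training_time(machines)


if __name__ == "__main__":
    for m in (10, 100, 200, 400, 800):
        print(m, round(speedup(m), 1))
```

Under these assumptions the speedup is close to linear for small machine counts and then degrades once the communication term dominates.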
Speeding up SVMs
[NIPS 2007]
- Approximate matrix factorization
- Parallelization
- Open source @ /p/psvm
  - 350+ downloads since December 07
- A task that takes 7 days on 1 machine takes 1 hour on 500 machines

Incomplete Cholesky Factorization (ICF)
[Figure: n x n matrix approximated by (n x p)(p x n), p << n]
- Conserves storage

Matrix Product
[Figure: (p x n)(n x p) = p x p]

Organizing the World's Information, Socially
- Social Platform
- Cloud Computing
- Concluding Remarks

Web With People
[Figure: documents (.htm, .jpg, .doc, .xls, .msg)]

What Next for Web Search?
- Personalization
  - Return query results considering personal preferences
  - Example: disambiguate an ambiguous query such as "fuji"
- Oops: several tried, the problem is hard
  - Enough training data is difficult to collect (for collaborative filtering)
  - Computationally intensive to support personalization (e.g., for personalizing PageRank)
  - User profile may be incomplete or erroneous

Personal Search, Intelligent Search
- A search for "富士" (Fuji) may return:
  - Mount Fuji
  - Fuji apples
  - Fuji cameras

Organizing World's Information, Socially
- The Web is a collection of documents and people
- Recommendation is a personalized, push model of search
- Collaborative filtering requires dense information to be effective
- Cloud computing is essential

References
[1] Alexa internet. http:///.
[2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the Seventeenth International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
[20] L. Adamic and E. Adar. How to search a social network. 2004.
[21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. Referral Web: Combining social networks and collaborative filtering. Communications of the ACM, 3:63-65, 1997.
[23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Rec., 22:207-216, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998.
[25] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[26] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
[27]