National Conference on Knowledge Graph and Semantic Computing (CCKS 2016), Industry Forum: t4wei

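Knowledge Graph Integration
Computer Science and … / Novel Software Technology …
CCKS2016 Tutorial

Outline
- Introduction to Semantic Web and knowledge graphs
- Part I: ontology matching
- Part II: entity linkage
- Part III: an application to data integration

Semantic Web
- The Semantic Web was a thought from Tim Berners-Lee: give formal meanings to Web information.
- Web 1.0 (pages) → Web 2.0 (social) → Web 3.0 (a web of data).
- The Semantic Web is about common formats for the integration and combination of data drawn from diverse sources, and about languages for recording how the data relates to real-world objects.

RDF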

[Figure: an RDF triple: subject, predicate, object.]

[Figure: Semantic Web layers.] "The world is not made of strings, but is made of things."

Linked data
- A realization of the Semantic Web.
- Linked Data refers to a collection of interrelated datasets on the Web.
- Used for large-scale integration of, and reasoning on, data on the Web.
- Linked data principles: use URIs to name things; use HTTP URIs (so that they can be looked up); provide useful information using open Web standards (e.g., RDF, SPARQL); include links to other related URIs.

Linked open data (LOD): 1,000+ datasets

[Figure: the LOD cloud, with domains such as life sciences and social networking.]

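Knowledge graph
- "Knowledge Graph is a knowledge base used by … to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources."
- It can also be viewed as a huge graph whose nodes represent entities or concepts and whose edges are formed by attributes or relations.

Ontology
- Besides relations, …
- A model of (part of) the real world.
- Introduces domain-related vocabulary.
- Specifies the meaning (semantics) of terms.
- Uses a suitable logic for formal description.
- Example: "Heart is a muscular organ that is part of the circulatory system." (I. Horrocks. Ontologies and the semantic web: the story so far.)

Scale of large knowledge bases/graphs
- English: 4 million entities, 500 million RDF triples, 125 kinds; 10 million entities, 120 million RDF triples; 40 million entities, 1 billion RDF triples; Knowledge Graph: 600 million entities, 3.5 billion RDF triples.
- Others: WolframAlpha computational knowledge engine, CMU NELL, Zhixin (知心), Sogou Zhilifang (知立方).

[Figure: the technology family of knowledge graphs (labels: knowledge system, existing knowledge).]

Part I: Ontology matching

Heterogeneity has existed for a long, long time:
- Syntactic
- Schema level, e.g., "Wei Hu" vs. …
- Schema level, terminological, e.g., "notebook" vs. …
- Data/entity level
- Pragmatic

On the Semantic Web, data has explicit semantics, rich links, …

Ontology matching
- The popularity of ontologies is rapidly growing, and the number of ontologies continues to increase.
- Ontology matching is the process of determining correspondences between ontologies.
- Formally, ontology matching discovers a triple $\langle O, O', M \rangle$ consisting of a source ontology $O$, a target ontology $O'$ and a set of mapping units $M = \{m_1, m_2, \ldots, m_n\}$. Each $m_i$ is a basic mapping element written as a quadruple $m_i = \langle id, e, e', s \rangle$, where $id$ is the identifier that uniquely identifies the quadruple, $e$ and $e'$ are terms in $O$ and $O'$ respectively, and $s$ is the similarity between $e$ and $e'$. In addition, a relation $r$ between $e$ and $e'$ may be given; common relations include equivalence, …
- Ontology matching eliminates schema-level heterogeneity (…-driven).

State of the art: linguistic features
- Linguistic descriptions of terms in an ontology: local names, annotations, and others such as foaf:name and dc:title.
- Local name (Namespaces in XML): "For a name N in a namespace identified by a URI I, the namespace name is I. For a name N that is not in a namespace, the namespace name has no value. In either case the local name is N."
- Survey of how ontologies currently use linguistic features: local names are widely used, …; annotations …; neighbors are not fully exploited; dictionary lookup is time-consuming.
[Table: linguistic techniques used by existing matching systems, e.g., machine learning, ranking.]

Edit distance
- The minimum number of edit operations needed to transform one string into the other.
- Edit operations include substitution, insertion and deletion.
- In general, the smaller the edit distance, the more similar the two strings.

I-Sub
$\mathrm{Sim}(s_1, s_2) = \mathrm{Comm}(s_1, s_2) - \mathrm{Diff}(s_1, s_2) + \mathrm{winkler}(s_1, s_2)$
- Comm: based on the biggest common substrings of the two strings.
- Diff: based on the length of the unmatched substrings resulting from the initial matching step.
- winkler: Winkler's common-prefix improvement.

Virtual documents
- Linguistic description of a term: local name, annotations, …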
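The two string measures above can be sketched in Python as follows. `levenshtein` is the standard dynamic-programming edit distance. `isub` follows the structure Sim = Comm − Diff + winkler; the normalization step (lower-casing, dropping `.`, `_` and spaces), the minimum common-substring length of 3, the parameter p = 0.6 and the Winkler constants (prefix of at most 4 characters, factor 0.1) are assumptions taken from the usual published description of I-Sub rather than from these slides, so treat this as an illustrative sketch, not Falcon-AO's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def _normalize(s: str) -> str:
    # Assumption: compare lower-cased strings without '.', '_' and spaces.
    return "".join(c for c in s.lower() if c not in "._ ")


def _longest_common_substring(a: str, b: str):
    """Return (length, start in a, start in b) of a longest common substring."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


def isub(s1: str, s2: str, p: float = 0.6, min_len: int = 3) -> float:
    """I-Sub style score: Comm - Diff + winkler."""
    a, b = _normalize(s1), _normalize(s2)
    l1, l2 = len(a), len(b)
    if l1 == 0 or l2 == 0:
        return 1.0 if l1 == l2 else 0.0
    # Comm: repeatedly cut out the biggest common substring (length >= min_len).
    x, y, common = a, b, 0
    while x and y:
        k, i, j = _longest_common_substring(x, y)
        if k < min_len:
            break
        x, y = x[:i] + x[i + k:], y[:j] + y[j + k:]
        common += k
    comm = 2 * common / (l1 + l2)
    # Diff: Hamacher-product penalty on the unmatched fraction of each string.
    u1, u2 = (l1 - common) / l1, (l2 - common) / l2
    diff = u1 * u2 / (p + (1 - p) * (u1 + u2 - u1 * u2))
    # winkler: bonus for a common prefix of up to 4 characters.
    prefix = 0
    for c1, c2 in zip(a[:4], b[:4]):
        if c1 != c2:
            break
        prefix += 1
    winkler = prefix * 0.1 * (1 - comm)
    return comm - diff + winkler
```

For example, `levenshtein("kitten", "sitting")` is 3, and `isub("ProductName", "product_name")` is 1.0 because normalization makes the two strings identical.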
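- Linguistic description of a node: the linguistic descriptions of its forward neighbors.
- Neighbors of a term: subject neighbors, predicate neighbors, object neighbors.
- Virtual document of a term: the term itself plus its neighbors,
  $VD(e) = Des(e) + \gamma_1 \sum_{e' \in SN(e)} Des(e') + \gamma_2 \sum_{e' \in PN(e)} Des(e') + \gamma_3 \sum_{e' \in ON(e)} Des(e')$,
  where $Des(\cdot)$ is a linguistic description and $SN(e)$, $PN(e)$, $ON(e)$ are the subject, predicate and object neighbors of $e$.
- Vector space model: TF-IDF.

String similarity metrics
- Fewer than two words per label: Jaro-Winkler
- Two or more words per label:
  - Synonyms: Soft Jaccard, with Levenshtein base
  - No synonyms: Soft Jaccard, with Levenshtein base
- Fewer than two words per label: TF-IDF
- Two or more words per label:
  - Synonyms: Soft TF-IDF, with Jaro-Winkler base
  - Different languages: Soft TF-IDF, with Jaro-Winkler base
  - Other: Soft TF-IDF, with Jaro-Winkler base

Structural features
- Intuition: terms of two distinct ontologies are similar when their adjacent terms are similar.
- Similarity propagation:
  $\sigma^{k+1}(x, y) = \sigma^{k}(x, y) + \sum_{(a_i, b_i)} \sigma^{k}(a_i, b_i) \cdot w\big((a_i, b_i), (x, y)\big) + \sum_{(a_j, b_j)} \sigma^{k}(a_j, b_j) \cdot w\big((a_j, b_j), (x, y)\big)$,
  where $(a_i, b_i)$ ranges over pairs of terms linked to $(x, y)$ as subjects, $(a_j, b_j)$ over pairs linked to it as objects, and $w$ is a propagation weight.

Instance data
- Instance-based matching with machine learning: joint probability distribution; content learner, name learner, meta-learner; relaxation labeling.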
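A minimal sketch of the similarity-propagation idea behind the structural features: a pair of terms passes its similarity to the neighboring pairs that it reaches through the same predicate (in both directions), and scores are renormalized after each iteration. The triple representation, the even split of a pair's similarity among its outgoing edges (standing in for the weight w), the max-normalization and the fixed iteration count are assumptions in the spirit of similarity flooding, not the exact formula used by Falcon-AO's GMO.

```python
from collections import defaultdict
from itertools import product


def propagation_graph(triples1, triples2):
    """Edges (a, b) -> (x, y) whenever the two ontologies use the same predicate."""
    edges = defaultdict(list)
    for (s1, p1, o1), (s2, p2, o2) in product(triples1, triples2):
        if p1 == p2:
            edges[(s1, s2)].append((o1, o2))  # subjects pass similarity to objects
            edges[(o1, o2)].append((s1, s2))  # and objects pass it back to subjects
    return edges


def propagate(triples1, triples2, initial, iterations=10):
    """sigma^{k+1}(x, y) = sigma^k(x, y) + sum of sigma^k(a, b) * w((a, b), (x, y))."""
    edges = propagation_graph(triples1, triples2)
    sigma = dict(initial)
    for _ in range(iterations):
        nxt = dict(sigma)
        for src, targets in edges.items():
            w = 1.0 / len(targets)  # assumption: split a pair's similarity evenly
            for tgt in targets:
                nxt[tgt] = nxt.get(tgt, 0.0) + sigma.get(src, 0.0) * w
        top = max(nxt.values(), default=0.0)
        if top > 0:
            nxt = {pair: value / top for pair, value in nxt.items()}  # keep in [0, 1]
        sigma = nxt
    return sigma


# Hypothetical toy ontologies and initial (e.g., linguistic) similarities.
t1 = [("Person", "hasName", "Name"), ("Car", "hasColor", "Color")]
t2 = [("Human", "hasName", "Label"), ("Auto", "hasColor", "Colour")]
sim = propagate(t1, t2, {("Name", "Label"): 0.5, ("Person", "Auto"): 0.5})
print(sim[("Person", "Human")], round(sim[("Person", "Auto")], 4))  # 1.0 0.002
```

In this toy run, ("Person", "Human") gains similarity from its neighbor pair ("Name", "Label"), while the seeded pair ("Person", "Auto") has no structural support and decays.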

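Search engine-based similarity: a normalized distance between terms, computed from hit counts
$\mathrm{NGD}(x, y) = \dfrac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$
- $f(x)$ is the number of hits for the search term $x$;
- $f(y)$ is the number of hits for the search term $y$;
- $f(x, y)$ is the number of hits for the tuple of search terms $x$ and $y$;
- $N$ is the number of web pages indexed ($N \approx 10^{10}$).

Ontology matching tool: Falcon-AO

New challenges
- A lot of (semi-)automatic algorithms and tools exist.
- Most are only applicable to small ontologies.
- Many applications require matching big ontologies:
  - Medicine and biology: GALEN, FMA, …
  - Agriculture and food: AGROVOC, …
  - Library collections: Brinkman, …
  - Common knowledge: DBpedia, …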
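The distance above is the normalized Google distance of Cilibrasi and Vitányi. A minimal sketch computing it from hit counts follows; the counts in the example are hypothetical placeholders, and fetching them from a search engine is not shown (no particular search API is assumed). The logarithm base cancels out in the ratio, so the natural log is used.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized distance between two terms from search-engine hit counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: number of indexed web pages. 0 means the terms always co-occur.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs at least one hit")
    if fxy <= 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical counts: in base-10 logs this is (3 - 1) / (10 - 2).
print(round(ngd(100, 1000, 10, 1e10), 6))  # 0.25
```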

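Sizes: ≥10K.

A divide-and-conquer approach: 1. ontology partitioning → 2. block matching → 3. term matching.
[Figure: running example.]

New directions
- Holistic ontology matching: with an increasing amount of data, match multiple ontologies simultaneously.
  - Input: a set $\Omega = \{O_1, \ldots, O_n\}$ of ontologies with $n > 2$.
  - Output: $A = A_{12} \cup A_{13} \cup A_{23} \cup \cdots$
  - Guarantee to always find the same alignment: a global optimal solution.
- Limitation of pairwise matching: $A$ is only a local solution that depends on the order in which the ontologies are matched, e.g., $A_{12} \cup A_{123} \neq A_{13} \cup A_{132} \neq A_{23} \cup A_{231}$, where the subscript order gives the matching order.

Holistic ontology matching
- Extends the maximum-weighted graph matching problem with constraints (cardinality, structural and coherence constraints).
- Three types of nodes: classes, object properties, data properties.
- Edges represent virtual connections between nodes of the same type and have weights that represent the similarities between them.
- Find 1:1 correspondences with the maximum sum of weights, subject to linear constraints: binary variables, class decisions, disjoint classes.

Part II: Entity linkage

Entity linkage
- Semantic Web data have reached a scale of billions of triples.
- Many different entities refer to the same real-world object; they are typically denoted by URIs from distributed data sources, e.g., Wei …?
- Entity linkage links different entities that refer to the same real-world object, a.k.a. coreference resolution, entity matching, record linkage, …
- The entity matching problem was originally defined in 1959 by Newcombe et al. and was formalized by Fellegi and Sunter ten years later.
- Out of 31B RDF statements, fewer than 500M are links across datasets.
- Two sub-tasks: identifying coreferent entities, and reconciling their data, i.e., eliminating the differences among the RDF data that describe these identifiers.

State of the art
- In LOD, millions of entities have already been linked.
- However, potential candidates are still …
- Current approaches:
  - Equivalence reasoning with owl:sameAs, inverse functional properties, …
  - Similarity computation (also studied in the database community): compare the properties and values of entities.

Equivalence reasoning
- An RDF triple: $\langle s, p, o \rangle \in (U \cup B) \times U \times (U \cup B \cup L)$.
- Same-as relation: $\langle u_1, \texttt{owl:sameAs}, u_2 \rangle \Rightarrow \langle u_1, u_2 \rangle \in E_{sameAs}$ and $\langle u_2, u_1 \rangle \in E_{sameAs}$.
- Inverse functional property (IFP) relation: a value of an IFP can belong to only a single entity, e.g., $\langle u_1, \texttt{foaf:mbox}, v \rangle, \langle u_2, \texttt{foaf:mbox}, v \rangle \Rightarrow \langle u_1, u_2 \rangle \in E_{IFP}$ and $\langle u_2, u_1 \rangle \in E_{IFP}$.
- Functional property (FP) relation.
- Cardinality relation: owl:cardinality / owl:maxCardinality = 1.
- $E = (E_{sameAs} \cup E_{IFP} \cup E_{FP} \cup E_{card})^{+}$ is an equivalence relation.

Similarity join
- Similarity-based linkage can be cast as a similarity join, which generally takes the form $\{(r, s) \mid \mathrm{sim}(r, s) > \tau,\ r \in R,\ s \in S\}$,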
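A minimal sketch of the equivalence reasoning above: owl:sameAs triples and shared values of inverse functional properties are merged with a union-find structure, which yields the symmetric and transitive closure as equivalence classes. Functional-property and cardinality rules are omitted, triples are plain string tuples, and the function names are hypothetical.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


class UnionFind:
    """Disjoint sets over entity URIs, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def coreference_classes(triples, ifps):
    """Group subjects linked by owl:sameAs or sharing a value of an IFP."""
    uf = UnionFind()
    subjects_by_value = defaultdict(list)  # (IFP, value) -> subjects using it
    for s, p, o in triples:
        uf.find(s)
        if p == SAME_AS:
            uf.union(s, o)
        elif p in ifps:
            subjects_by_value[(p, o)].append(s)
    for subjects in subjects_by_value.values():
        for other in subjects[1:]:
            uf.union(subjects[0], other)
    classes = defaultdict(set)
    for x in list(uf.parent):
        classes[uf.find(x)].add(x)
    return [c for c in classes.values() if len(c) > 1]


triples = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "foaf:mbox", "mailto:wei@example.org"),
    ("ex:c", "foaf:mbox", "mailto:wei@example.org"),
    ("ex:d", "foaf:name", "Wei"),
]
print(coreference_classes(triples, {"foaf:mbox"}))  # one class: ex:a, ex:b, ex:c
```

Here ex:a and ex:b are merged by owl:sameAs, ex:b and ex:c by the shared foaf:mbox value, and transitivity puts all three in one class.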
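  where $R$ and $S$ are two sets of strings, $\mathrm{sim}$ is a similarity function and $\tau$ is a threshold.
- Naive time complexity: $O(n^2 l^2)$ for $n$ strings of length about $l$.
- The conventional approach is the "filter-verify" framework:
  - Filter stage: use various filtering methods to shrink the candidate set.
  - Verify stage: compute the exact similarity of the remaining candidates.
  - Common methods: All-Pairs, ED-Join, PPJoin, PassJoin.

Blocking
- Naive pairwise: $N^2$ pairwise comparisons.
  - Example: 1,000 business listings from each of 1,000 different cities across the world: 1 trillion comparisons, 11.6 days (if each comparison takes 1 μs).
  - Mentions from different cities are unlikely to be coreferent.
  - Blocking criterion: city. 1 billion comparisons, 16 minutes (if each comparison takes 1 μs).
- Hash-based blocking; pairwise similarity/neighborhood-based blocking; simple blocking: inverted index.

Machine learning
- Learning a linkage rule with genetic programming and active learning:
  - Select link candidates to be labeled by a human expert.
  - A human expert labels the selected links as correct or incorrect.
  - The genetic programming algorithm evolves the population of linkage rules.

Query-driven entity linkage
- In LOD, millions of entities have already been linked; however, potential candidates are still …
- Current approaches:
  - Reasoning: at present, we probably miss many potential links.
  - Similarity computation: to improve it, machine learning is used, but building a large-scale training set is time-consuming and labor-intensive.
- Definition 1. Let $U$ be the set of entities in a set $D$ of data sources. Given an entity $u \in U$, the entity linkage for $u$ is to query a set of entities for which a relation $\varepsilon$ holds with $u$, where $\varepsilon$ links all the entities in $U$ that refer to the same object as $u$ does, i.e., that are coreferent with $u$.
- How to combine them? Our solution: query-driven entity linkage.
  - Use case: search/browsing; a system knows "what to link" only at query time.
  - Analyze small portions of a very large dataset to answer on-demand queries.
- Our approach: automatically infer semantically equivalent entities based on OWL/SKOS …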
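A minimal sketch of the filter-verify framework using prefix filtering, in the style of All-Pairs/PPJoin. Similarity is Jaccard over lower-cased whitespace tokens, the threshold is assumed to satisfy 0 < τ ≤ 1, and the join keeps pairs with similarity ≥ τ; these choices and all names are illustrative. Tokens are ordered globally from rare to frequent, so prefixes are selective; any pair with Jaccard ≥ τ must share a token in their prefixes, so the filter loses no results, and the verify step removes false positives. PPJoin's positional and suffix filters are not shown.

```python
import math
from collections import Counter, defaultdict


def tokens(s: str) -> set:
    return set(s.lower().split())


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefix_filter_join(R, S, tau):
    """Return {(i, j) | jaccard(R[i], S[j]) >= tau} for 0 < tau <= 1."""
    rt, st = [tokens(r) for r in R], [tokens(s) for s in S]
    freq = Counter(t for ts in rt + st for t in ts)

    def prefix(ts):
        # Rare tokens first; keep |x| - ceil(tau * |x|) + 1 of them.
        ordered = sorted(ts, key=lambda t: (freq[t], t))
        keep = len(ordered) - math.ceil(tau * len(ordered) - 1e-9) + 1
        return ordered[:keep]

    # Filter: inverted index over the prefixes of S.
    index = defaultdict(set)
    for j, ts in enumerate(st):
        for t in prefix(ts):
            index[t].add(j)

    result = set()
    for i, ts in enumerate(rt):
        candidates = set()
        for t in prefix(ts):
            candidates.update(index.get(t, ()))
        # Verify: compute the exact similarity only for surviving candidates.
        for j in candidates:
            if jaccard(ts, st[j]) >= tau:
                result.add((i, j))
    return result


R = ["Nanjing University", "Peking University"]
S = ["nanjing university china", "tsinghua university"]
print(prefix_filter_join(R, S, 0.5))  # {(0, 0)}
```

In the example, the pair ("Peking University", "tsinghua university") survives the filter through the shared prefix token "university" but is rejected in verification (Jaccard 1/3).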

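[Figure: overview of the approach. Input: an entity; output: a set of coreferent entities. Step 1: build a training set (initialize the training set) from labeled examples. Other labels: "some properties to use together", external, learn, unresolved.]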

Assumptions: (1) coreferent entities share similar property-value pairs; (2) a few property-value pairs are more important than others for linking entities.

Running example:

[Figure: running example; entity descriptions with values such as "Nanjing", "Nan-ching", "32N", "118E" and "117W".]

Some definitions
- Discriminability of a property (in terms of coreferent and non-coreferent entities)
- Discriminability of a value
- Discriminability of a property-value pair

Evaluation
- Dataset: Billion Triples Challenge (BTC). [Table: dataset statistics, with figures such as >100, >100 and >2, and counts of RDF, same-as, IFP and FP statements.]
- Testing: top-50 in 364 thousand queries; 8 Music/54323 …
- Evaluation procedure and metrics:
  - 30 graduate students; 2 judges + 1 arbitrator per link; Fleiss' κ = 0.8 (sufficient agreement).
  - Precision and relative recall, where RR = correct links found by one system / total correct unique links found by all systems.
- Parameters: number of iterations = …; discriminability threshold = …
- Linkage results; running time on 5,000 samples: avg. 11.3 links in …

Ontology Alignment Evaluation Initiative (OAEI)
- Held as ISWC workshops since …
- Ontology matching and instance matching tracks.

Part III: An application to data integration

Motivation
- Metadata is vital to multimedia content: search, browsing, management.
- Large-scale LOD are published and interlinked; make use of such a rich source of knowledge.
- Existing multimedia metadata models and ontologies do not provide formal semantics and typically focus on a single media type, e.g., EXIF; … compatible with MPEG-…
- Different media types co-exist in a multimedia resource, e.g., a movie may have a theme music and a …
- A unified, well-defined ontology (with its mappings to others) is needed to gain …
- Challenge: link and integrate heterogeneous sources.

A motivating example: Beauty and the Beast
- Low-level metadata: runtime, location, …
- LOD: LinkedMDB, DBpedia, …
- Different ontologies (terms), different …
[Figure: ontology matching and entity linkage for "Beauty and the Beast": LinkedMDB resources (linkedmdb:film, linkedmdb:director, linkedmdb:11264), runtime "91 min.", location, and a description beginning "...is a 1991 American animated …".]

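Our approach
- CAMO: enrich multimedia metadata via integrating LOD.
  - Select DBpedia as the mediation ontology and match it with other ontologies.
  - Link DBpedia entities with other … and aggregate their …
  - Incorporate legacy relational databases.
- Moreover, provide a mobile app for browsing and searching multimedia content on Android devices.
- Assess the advantages of integrating LOD into multimedia applications.

System architecture: client-server
- Server:
  - The DBpedia 3.6 ontology as the mediation ontology.
  - A Global-as-View solution to data integration.
  - Music: DBpedia, DBTune, …
  - Movie: DBpedia, LinkedMDB, …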

- Client:
  - Android-based mobile app.
  - Integrated with a multimedia …
  - Search & browse multimedia content.

[Figure: system screenshots: search, browse and …]

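[Figure: architecture labels: instance, mobile, relational, entity, ontology, data; an example instance "John".]

Matching ontologies with …
- Different LOD sources have different preferences on ontologies, e.g., DBpedia, the Music Ontology, …
- Falcon-AO: an automatic ontology matching tool.
- Extended with … knowledge to support synonym matching, e.g., "track" vs. …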

- Linguistic matching: V-Doc (TF-IDF) and I-Sub (edit distance based).
- Structural matching: GMO (similarity propagation).

Linking entities with …
- Entity linkage helps merge all descriptions in different sources that refer to the same multimedia resource.
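A minimal sketch of V-Doc-style linguistic matching: build a virtual document for each term from its own description plus weighted descriptions of its subject, predicate and object neighbors, weight the words with TF-IDF, and compare documents by cosine similarity. The weights in `GAMMA`, the smoothed IDF formula and the input format (a dict from term to word weights, and neighbor lists tagged with the neighbor kind) are hypothetical choices for illustration; Falcon-AO's actual parameters are not given here.

```python
import math
from collections import Counter

# Hypothetical neighbor weights (gamma_1..gamma_3 in the VD formula).
GAMMA = {"subject": 0.1, "predicate": 0.1, "object": 0.1}


def virtual_document(term, des, neighbors):
    """VD(e) = Des(e) + sum over neighbor kinds of gamma_kind * Des(neighbor)."""
    vd = Counter(des.get(term, {}))
    for nb, kind in neighbors.get(term, []):
        for word, weight in des.get(nb, {}).items():
            vd[word] += GAMMA[kind] * weight
    return vd


def tfidf(docs):
    """Weight each document's term frequencies by a smoothed IDF."""
    n = len(docs)
    df = Counter(word for doc in docs for word in doc)
    return [{w: tf * (math.log((1 + n) / (1 + df[w])) + 1) for w, tf in doc.items()}
            for doc in docs]


def cosine(u, v):
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vdoc_similarities(terms1, terms2, des, neighbors):
    """Cosine similarity of TF-IDF-weighted virtual documents for every term pair."""
    terms = list(terms1) + list(terms2)
    vecs = dict(zip(terms, tfidf([virtual_document(t, des, neighbors) for t in terms])))
    return {(a, b): cosine(vecs[a], vecs[b]) for a in terms1 for b in terms2}


# Hypothetical descriptions: "Author" and "Writer" share no words themselves.
des = {
    "o1:Author": {"author": 1.0}, "o1:writes": {"writes": 1.0},
    "o1:Paper": {"paper": 1.0}, "o2:Writer": {"writer": 1.0},
    "o2:writes": {"writes": 1.0}, "o2:Article": {"article": 1.0, "paper": 1.0},
}
nbrs = {
    "o1:Author": [("o1:writes", "predicate"), ("o1:Paper", "object")],
    "o2:Writer": [("o2:writes", "predicate"), ("o2:Article", "object")],
}
with_nb = vdoc_similarities(["o1:Author"], ["o2:Writer"], des, nbrs)
without = vdoc_similarities(["o1:Author"], ["o2:Writer"], des, {})
```

With neighbors, "o1:Author" and "o2:Writer" get a positive score only through the shared neighbor words "writes" and "paper"; without neighbors their score is 0.0.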

[Figure: training examples and class-based discriminative properties, e.g., {p1, p3} → c1 vs. {p5, p6} → c3; {p1, p2} → c1 vs. {p3, p4} → c2; instance linkage.]

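Training set
- Negative examples: entity pairs that do not hold the equivalence relation.
- Class-based discriminative property mining, based on information gain.
- Online linkage.

Integrating legacy relational databases
- There is still a great deal of legacy data stored in relational databases.
- Some data in LOD are generated from their relational sources.
[Figure: matching steps 1, 2, 3.]
- Element-level matching, e.g., entity tables and relationship tables.
- Instance-level matching: similar to ontology matching and entity linkage.

Evaluation: two aspects
- Usability and effectiveness of the mobile app.
- Integration accuracy in the server.

(1) Usability & effectiveness: user study
- 3 comparative apps: …, …, Wikipedia Android app.
- 6 testing tasks.
- 50 … (10, 22, 18).
- System Usability Scale (SUS) and post-task questionnaires.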
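The System Usability Scale has a standard scoring rule: each of its ten items is answered on a 1 to 5 scale; odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch with hypothetical responses (the study's raw answers are not reported here):

```python
def sus_score(responses):
    """SUS score (0-100) for one participant's ten 1-5 responses, item 1 first."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    total = sum((r - 1) if item % 2 == 1 else (5 - r)
                for item, r in enumerate(responses, start=1))
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


best = [5, 1] * 5   # agrees with every positive item, disagrees with every negative one
neutral = [3] * 10
print(sus_score(best), sus_score(neutral), mean_sus([best, neutral]))  # 100.0 50.0 75.0
```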

- Post-task questionnaires: analyze the results according to the typology of the …

(2) Integration accuracy
- Ontology matching: 78 …, incl. 18 RDB …

- Entity linkage: 60 thousand …; 100 samples per …

Lessons
- CAMO leverages ontology matching and entity linkage for data integration and supports users in browsing and searching multimedia content on mobile devices.
- Ontology matters: a trade-off between expressiveness and ease of use.
- Data integration quality: human computation + machine computation.
- Mobile app design: conciseness, ranking scheme, user-…

Future work
- Generate complex … for semantic queries.
- Extend to user-generated content.
- NLP.
