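Integration of Knowledge Graphs
Computer science and new software technology, national …; CCKS 2016 tutorial

Outline
- Introduction to Semantic Web and knowledge graph
- Part I: ontology matching
- Part II: entity linkage
- Part III: an application to data integration

Semantic Web
- The Semantic Web was a thought from Tim Berners-Lee
- Give formal meanings to Web information
- Web 1.0 (page) → Web 2.0 (social) → Web 3.0 (a web of data)
- The Semantic Web is about
  - common formats for integration and combination of data drawn from diverse sources
  - languages for recording how the data relates to real-world objects

RDF
- [Figure: an RDF triple made of subject, predicate and object; layer diagram.]
- The world is not made of strings, but is made of things

Linked data
- As a realization of the Semantic Web
- Linked Data refers to a collection of interrelated datasets on the Web
- Used for large-scale integration of, and reasoning on, data on the Web
- Linked data principles
  1. Use URIs to name things
  2. Use HTTP URIs (can be looked up)
  3. Provide useful information using open Web standards (e.g. RDF, SPARQL)
  4. Include links to other related things

Linked open data (LOD)
- [Figure: LOD cloud diagram, 1,000+ …; visible category labels include "life" and "social".]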
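To make the RDF and linked data slides concrete, here is a minimal sketch that stores (subject, predicate, object) statements as plain Python tuples. The `example.org` and `other.example.net` URIs and the `worksAt` property are hypothetical and do not come from the slides; no RDF library is used.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, for illustration only.
EX = "http://example.org/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

graph: List[Triple] = [
    # A literal value attached to a thing named by an HTTP URI.
    Triple(EX + "person/1", FOAF_NAME, '"Wei Hu"'),
    # A link to another thing in the same dataset.
    Triple(EX + "person/1", EX + "worksAt", EX + "org/1"),
    # A link to a related thing in another (hypothetical) dataset.
    Triple(EX + "person/1", OWL_SAME_AS, "http://other.example.net/people/whu"),
]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return all objects of triples matching the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


def outgoing_links(graph: List[Triple], base: str) -> List[str]:
    """Return object URIs outside `base`, i.e. links into other datasets."""
    return sorted(
        {t.obj for t in graph if t.obj.startswith("http") and not t.obj.startswith(base)}
    )
```

Calling `outgoing_links(graph, EX)` picks out the single cross-dataset link, which is the kind of statement that the fourth linked data principle asks publishers to include.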
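Knowledge graph
- The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources
- It can also be viewed as one huge graph: nodes represent entities or concepts, and edges are formed by attributes or relations
- Besides relations, …

Ontology
- A model of (part of) the real world
- Introduces domain-relevant vocabulary
- Specifies the meaning (semantics) of terms
- Formalized using a suitable logic
- Example description: Heart is a muscular organ that is part of the circulatory system
- Source: I. Horrocks. Ontologies and the semantic web: the story so far.

Large-scale knowledge bases / graphs

Knowledge base         | Scale
…                      | English: 4 million entities, 500 million RDF triples; 125 …
…                      | 10 million entities, 120 million RDF triples
…                      | 40 million entities, 1 billion RDF triples
Google Knowledge Graph | 600 million entities, 3.5 billion RDF triples

- Others: WolframAlpha computational knowledge engine, CMU NELL, Zhixin, Sogou Zhilifang

The technology family of knowledge graphs
- [Figure: labels include knowledge system, existing knowledge …, knowledge graph.]

Part I: ontology matching

Since long, long times …
- Syntactic
- Schema-…: e.g., "Wei Hu" vs. …
- Schema-…
- Terminological: e.g., "notebook" vs. …
- Data-entity …
- Data-entity …
- Pragmatic

On the Semantic Web
- Data has explicit semantics, rich links, …

Ontology …
- The popularity of ontologies is rapidly growing, and the number of ontologies keeps increasing

Ontology matching
- The process of determining correspondences between …
- Ontology matching finds a triple $\langle O, O', M \rangle$, consisting of a source ontology $O$, a target ontology $O'$ and a set of mapping units $M = \{m_1, m_2, \dots, m_n\}$
- Each $m_i$ is a basic mapping unit, written as a 4-tuple $m_i = \langle id, t, t', s \rangle$
  - $id$ is the identifier of the mapping unit and uniquely identifies the 4-tuple
  - $t$ and $t'$ are terms in $O$ and $O'$, respectively
  - $s$ is the similarity between $t$ and $t'$, satisfying $s \in [0, 1]$
  - In addition, $r$ may denote the relation between $t$ and $t'$; common relations include equivalence, …
- Ontology matching: eliminating schema … (…-driven)

State of the art

Linguistic features
- Linguistic descriptions of the terms in an ontology
- Local name
  - "For a name N in a namespace identified by a URI I, the namespace name is I. For a name N that is not in a namespace, the namespace name has no value. Definition: In either case the local name is N."
  - … → local …
- Comment
- Others: foaf:name, dc:title

Linguistic features: a survey of how ontology linguistic features are currently used
- Local names are used a lot, with some …
- Comments …
- Neighbors are not fully …
- Dictionary lookup is costly …
- [Table: comparison of approaches; only check marks and the labels "class", "machine learning" and "ranking, S-…" survive.]

Edit distance
- The minimum number of edit operations needed to transform one string into the other
- Edit operations include substitution, insertion and deletion
- In general, the smaller the edit distance, the more similar the two strings

I-Sub
- $\mathrm{Sim}(s_1, s_2) = \mathrm{Comm}(s_1, s_2) - \mathrm{Diff}(s_1, s_2) + \mathrm{Winkler}(s_1, s_2)$
- Comm: biggest common substring between the two strings
- Diff: the length of the unmatched strings resulting from the initial matching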
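The edit distance described above (substitution, insertion, deletion) can be sketched with the standard dynamic-programming recurrence. The normalization in `edit_similarity` (one minus distance over the longer length) is an assumption added for illustration; the slides only say that a smaller distance means a higher similarity.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

For example, `edit_similarity("Nanjing", "Nan-ching")` compares two spellings of the same place name; a matcher would typically accept the pair only above some threshold chosen by the user.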
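Virtual documents
- Linguistic description of a term: local name, label, comment
- Linguistic description of a node: the linguistic descriptions of its forward neighbors
- Neighbors of a term: subject neighbors, predicate neighbors, object neighbors
- Virtual document of a term: its own description plus the weighted descriptions of its neighbors
  - $VD(e) = Des(e) + \gamma_1 \sum_{e' \in SN(e)} Des(e') + \gamma_2 \sum_{e' \in PN(e)} Des(e') + \gamma_3 \sum_{e' \in ON(e)} Des(e')$, where $SN$, $PN$ and $ON$ are the subject, predicate and object neighbors
- Vector space model: TF-IDF

String similarity metrics
- Less than two words per label: Jaro-Winkler
- Two or more words per label
  - Synonyms: Soft Jaccard, with Levenshtein base …
  - No synonyms: Soft Jaccard, with Levenshtein base …

- Less than two words per label: TF-IDF
- Two or more words per label
  - Synonyms: Soft TF-IDF, with Jaro-Winkler base …
  - Different languages: Soft TF-IDF, with Jaro-Winkler base …
  - Other: Soft TF-IDF, with Jaro-Winkler base …

Structural features
- Intuition: terms of two distinct ontologies are similar when their adjacent terms are similar
- Similarity …
  - $\sigma^{i+1}(x, y) = \sigma^{i}(x, y) + \sum_{(a_u, b_u)} \sigma^{i}(a_u, b_u) \cdot w\big((a_u, b_u), (x, y)\big) + \sum_{(a_v, b_v)} \sigma^{i}(a_v, b_v) \cdot w\big((a_v, b_v), (x, y)\big)$

Instance data
- Machine learning
- Joint probability …
- Instance …
- Content …
- Name …
- Meta …
- Relaxation …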
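The virtual-document idea from the linguistic features above (a term's own description plus down-weighted neighbor descriptions, compared in a TF-IDF vector space) can be sketched as follows. This is an illustration under stated assumptions, not the Falcon-AO implementation: a single neighbor weight `gamma` stands in for the separate subject, predicate and object weights, the IDF variant `1 + log(n / df)` is one common choice, and the three toy terms are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def virtual_document(own_words: Iterable[str], neighbor_words: Iterable[str],
                     gamma: float = 0.5) -> Counter:
    """Weighted bag of words: own description plus down-weighted neighbor words."""
    doc: Counter = Counter()
    for w in own_words:
        doc[w] += 1.0
    for w in neighbor_words:
        doc[w] += gamma  # gamma is a hypothetical neighbor weight
    return doc


def tfidf_vectors(docs: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """Turn weighted bags of words into TF-IDF vectors."""
    n = len(docs)
    df: Counter = Counter()
    for d in docs.values():
        df.update(d.keys())  # document frequency: one count per document
    vectors = {}
    for name, d in docs.items():
        total = sum(d.values())
        vectors[name] = {w: (c / total) * (1.0 + math.log(n / df[w])) for w, c in d.items()}
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Made-up terms from two toy ontologies.
docs = {
    "o1:Track": virtual_document(["track"], ["album", "artist"]),
    "o2:Song": virtual_document(["song", "track"], ["artist"]),
    "o2:Person": virtual_document(["person"], ["name"]),
}
vectors = tfidf_vectors(docs)
sim_track_song = cosine(vectors["o1:Track"], vectors["o2:Song"])
sim_track_person = cosine(vectors["o1:Track"], vectors["o2:Person"])
```

Here `o1:Track` and `o2:Song` share words both in their own descriptions and through a neighbor, so they score above `o1:Track` and `o2:Person`, which share nothing.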
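Search engines
- Normalized Google distance (NGD): a Google-based similarity between search terms
- $\mathrm{NGD}(x, y) = \dfrac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$
  - $f(x)$ is the number of hits for the search term $x$
  - $f(y)$ is the number of hits for the search term $y$
  - $f(x, y)$ is the number of hits for the tuple of search terms $x$ and $y$
  - $N$ is the number of web pages indexed ($N \approx 10^{\dots}$)

Ontology matching …: Falcon-AO

New …
- A lot of (semi-)automatic algorithms and …
- Most are only applicable for small ontologies
- Many applications require matching BIG ontologies
  - Medicine and biology: GALEN, FMA, …
  - Agriculture and food: AGROVOC, …
  - Library collections: Brinkman, …
  - Common knowledge: DBpedia, …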
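The normalized Google distance from the search engine slide earlier in this part can be computed directly from hit counts. The sketch below takes the counts as plain numbers: `fx` and `fy` are the hits for each term, `fxy` the hits for both together, and `n` the number of indexed pages. It calls no search API, and the counts in the usage lines are made up.

```python
import math


def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy)).

    Assumes n is larger than both fx and fy, so the denominator is positive.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are treated as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Made-up hit counts: the same two terms with frequent vs. rare co-occurrence.
close_pair = normalized_google_distance(1_000, 2_000, 500, 10**10)
far_pair = normalized_google_distance(1_000, 2_000, 10, 10**10)
```

Terms that always appear together get a distance of 0, and the distance grows as co-occurrence becomes rarer, so a matcher would convert it to a similarity before combining it with other features.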
BIG ontologies: ≥ 10K …

A divide-and-conquer …
1. ontology partitioning → 2. block matching → 3. term …

Running example

New directions
- Holistic ontology matching
  - Increasing amount of data → simultaneously matching …
  - Input: a set $\Omega = \{O_1, \dots, O_N\}$ of ontologies with $N > 2$
  - Output: $A = A_{12} \cup A_{13} \cup A_{23} \cup \dots$
  - Guarantee to always find the same …
  - A global optimal …
- Limitation of pairwise …
  - $A$ is considered as a local solution depending on the order in which the ontology matching is carried out
  - e.g. $A_{12} \cup A_{123} \neq A_{13} \cup A_{132} \neq A_{23} \cup A_{231}$

Holistic ontology matching
- Extending the maximum-weighted graph matching problem with constraints (cardinality, structural and coherence …)
- Three types of …: class, object property, data property
- …
  - Represent virtual connections between the same types of …
  - Have weights to represent similarities between the …
- Correspondences (1:1) with maximum weight → …
- Linear constraints: binary …
- Class decision …, disjoint …

Part II: entity linkage

Entity linkage
- Semantic Web data have reached a scale in billions of …
- Many different entities refer to the same real-world object
  - Typically denoted by URIs, from distributed data sources
  - e.g. Wei …
- Entity linkage: link different entities that refer to the same object
  - a.k.a. coreference resolution, entity matching, record linkage, …
  - The entity matching problem was originally defined in 1959 by Newcombe et al. and was formalized by Fellegi and Sunter ten years later
- Out of 31B RDF statements, less than 500M are links across …
- Entity identification …; data …; eliminating … among the RDF data that describe these identifiers

State of the art
- In LOD, millions of entities have already been linked
- However, potential candidates are still …
- Current approaches
  - Equivalence reasoning: owl:sameAs, inverse functional properties, …
  - Similarity computation (also in the database …): compare properties and values of entities

Equivalence reasoning
- An RDF triple: $\langle s, p, o \rangle \in (U \cup B) \times U \times (U \cup B \cup L)$
- Same-as relation: $\langle u, \text{owl:sameAs}, v \rangle \rightarrow \langle u, v \rangle \in E_S$ and $\langle v, u \rangle \in E_S$
- Inverse functional property (IFP) relation
  - IFP: a value can only be the value of this property for a single …
  - e.g., $\langle u_1, \text{foaf:mbox}, o \rangle, \langle u_2, \text{foaf:mbox}, o \rangle \rightarrow \langle u_1, u_2 \rangle \in E_I$ and $\langle u_2, u_1 \rangle \in E_I$
- Functional property (FP) relation: …
- Cardinality relation: owl:cardinality / owl:maxCardinality = 1
- $E = (E_S \cup E_I \cup E_F \cup E_C)^{+}$, and $E$ is an equivalence relation

Similarity computation
- Link …
- Similarity join: the problem generally takes the form $\{(r, s) \mid \mathrm{sim}(r, s) > \tau,\ r \in R,\ s \in S\}$
  - $R$ and $S$ are two sets of strings, and $\tau$ is the similarity threshold
  - Time complexity: $O(n^2 m^2)$
- The conventional approach is the "filter-verify" framework
  - Filtering phase: use various filtering methods to shrink the candidate set
  - Common methods: All-Pairs, ED-Join, PPJoin, PassJoin, …

Blocking
- Naïve pairwise: $n^2$ pairwise comparisons
  - 1,000 business listings each from 1,000 different cities across the …
  - 1 trillion comparisons, 11.6 days (if each comparison is 1 μs)
- Mentions from different cities are unlikely to be matches
  - Blocking criterion: city
  - 1 billion comparisons, 16 minutes (if each comparison is 1 μs)
- Blocking methods
  - Hash based …
  - Pairwise similarity / neighborhood based blocking
  - Simple blocking: inverted …

Machine learning
- A linkage rule …
- Learning …
- Genetic programming
- Active learning
  - Selects link candidates to be labeled by a …
  - A human expert labels the selected link as correct or incorrect
  - The genetic programming algorithm evolves the population of linkage rules

State of the art
- Equivalence reasoning: at present, probably misses many potential links
- Similarity computation
  - To improve, machine learning …
  - Time-consuming and labor-intensive to build a large-scale training set

Definition
- Definition 1. Let U be the set of entities in a set D of data sources. Given …, the entity linkage for u is to query a … of … for which a relation ε …, where ε links all the entities in U that refer to the same object as u does, i.e. are coreferent with …

How to combine? Our solution
- Query-driven entity linkage
- Use …
  - Search/browsing: a system knows "what to link" only at query time
  - Analyze small portions of a very large dataset to answer on-demand …

Our approach
- Automatically infer semantically coreferent entities based on OWL/SKOS …
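The inference of coreferent entities from OWL semantics, as described under equivalence reasoning earlier in this part, amounts to collecting pairs from owl:sameAs statements and shared inverse functional property values, then taking the transitive closure. Below is a minimal sketch using union-find. The triples are plain tuples with prefixed names, the set of IFPs is supplied by the caller, and the `ex:` entities are hypothetical; functional properties and cardinality restrictions are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

OWL_SAME_AS = "owl:sameAs"


class UnionFind:
    """Disjoint sets over entity identifiers."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreferent_groups(triples: Iterable[Tuple[str, str, str]],
                      ifps: Set[str]) -> List[List[str]]:
    """Group subjects linked by owl:sameAs or by sharing a value of an IFP."""
    uf = UnionFind()
    by_ifp_value: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s, p, o in triples:
        uf.find(s)  # register every subject, even if it stays alone
        if p == OWL_SAME_AS:
            uf.union(s, o)
        elif p in ifps:
            by_ifp_value[(p, o)].append(s)
    # Two subjects with the same value for the same IFP denote the same object.
    for subjects in by_ifp_value.values():
        for other in subjects[1:]:
            uf.union(subjects[0], other)
    groups: Dict[str, Set[str]] = defaultdict(set)
    for x in list(uf.parent):
        groups[uf.find(x)].add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical data: ex:a and ex:b via sameAs, ex:b and ex:c via a shared mailbox.
triples = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "foaf:mbox", "mailto:w@example.org"),
    ("ex:c", "foaf:mbox", "mailto:w@example.org"),
    ("ex:d", "foaf:name", "Wei Hu"),
]
groups = coreferent_groups(triples, ifps={"foaf:mbox"})
```

Union-find gives symmetry and transitivity for free, which matches the statement that the closure is an equivalence relation; pairs found this way could then seed a training set for a learned linkage rule.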
[Figure: workflow of the approach. Surviving labels: "Output: a … of …"; "1. Build a … (initialize training …)"; "Labeled …"; "Some properties to use together"; "External …"; "Learn …"; "Unresolved …".]

Assumptions
- (1) Coreferent entities share similar property-value pairs
- (2) A few property-value pairs are more important for linking entities
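One way to read the two assumptions together is as a weighted overlap of property-value pairs, where more important properties weigh more. The sketch below is only an illustration of that reading, not the scoring used in the tutorial: the property names, the weights and the exact-match comparison of values are all assumptions, and the values merely echo the style of the running example.

```python
from typing import Dict, Set, Tuple

PropertyValue = Tuple[str, str]


def weighted_overlap(desc_a: Set[PropertyValue], desc_b: Set[PropertyValue],
                     weights: Dict[str, float], default_weight: float = 1.0) -> float:
    """Weighted Jaccard over property-value pairs; each pair is weighted by its property."""

    def weight(pair: PropertyValue) -> float:
        return weights.get(pair[0], default_weight)

    shared = sum(weight(p) for p in desc_a & desc_b)
    total = sum(weight(p) for p in desc_a | desc_b)
    return shared / total if total else 0.0


# Hypothetical descriptions of three entities.
a = {("name", "Nanjing"), ("lat", "32N"), ("long", "118E")}
b = {("name", "Nan-ching"), ("lat", "32N"), ("long", "118E")}
c = {("name", "Nanjing"), ("lat", "32N"), ("long", "117W")}

# Made-up weights: longitude is treated as the most telling property here.
weights = {"name": 1.0, "lat": 1.0, "long": 3.0}

score_ab = weighted_overlap(a, b, weights)
score_ac = weighted_overlap(a, c, weights)
```

With these weights, `a` and `b` score higher than `a` and `c` even though `a` and `c` share a name, because the pair that disagrees carries more weight. A fuller version would compare values approximately rather than exactly.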
Running example
- [Figure: entity descriptions with values such as "Nanjing", "Nan-ching", "32N", "118E" and "117W".]

Some …
- Discriminability of a property
  - Property …
  - Non-coreferent entity … in terms of coreferent …
- Discriminability of a value
- Discriminability of a prop-value …

Evaluation
- [Table: dataset statistics; surviving cells: >100, >100, RDF, >2, Same-as, IFP, FP, 2.]
- Billion Triples Challenge (BTC) …
- Testing …: top-50 in 364 thousand query …
- [Table: test queries by domain; surviving cells: 8, Music/…, 5, 4, 3, 2, 3.]
- Evaluation procedure and …
  - 30 graduates, 2 judges + 1 arbitrator per link, Fleiss's κ = 0.8 (sufficient …)
  - Precision & relative recall (RR)
  - RR = correct links in one system / total correct unique links in all …
- Maximum iteration = …
- Discriminability threshold = …
- Linkage …
- Running time on 5,000 samples: avg. 11.3 links in …

Ontology Alignment Evaluation Initiative
- ISWC workshop since …
- Ontology matching & instance matching

Part III: an application to data integration

- Metadata is vital to multimedia content …
  - Search, browsing, management, …
- Large-scale LOD are published and …
  - Make use of such rich source of …
- Existing multimedia metadata models … do not provide formal … and typically focus on a single media …
  - EXIF …
  - … compatible with MPEG-…
- Different media types co-exist in a multimedia …
  - A movie may have a theme music and a …
- A unified, well-defined ontology (with its …s to others) is needed to gain …
- Challenge: link and integrate heterogeneous …

A motivating example: "Beauty and the Beast"
- Low-level metadata: runtime, location, …
- LOD: LinkedMDB, DBpedia, …
- Different ontologies (terms), different …
- [Figure: descriptions of "Beauty and the Beast" from several sources, connected by ontology matching and entity linkage. Surviving labels: linkedmdb:film, linkedmdb:director, linkedmdb:11264, runtime "91 min.", location, "... is a 1991 American animated ...".]
Our approach
- CAMO: enrich multimedia metadata via integrating …
  - Select DBpedia as the mediation … and match with …
  - Link DBpedia entities with other … and aggregate their …
  - Incorporate legacy relational databases …
- Moreover, provide a mobile app for browsing and searching multimedia content on Android …
- Assess the advantages of integrating LOD into multimedia …

System architecture: client-server
- Server
  - The DBpedia 3.6 ontology as …
  - Global-as-View solution of …
  - Music: DBpedia, DBTune, …
  - Movie: DBpedia, LinkedMDB, …
- Client
  - Android-based mobile …
  - Integrate with a multimedia …
  - Search & browse multimedia …

System …
- Search, browse and …
- [Figure: surviving labels include instance, mobile, John, relational, entity, ontology, data.]

Matching ontologies with …
- Different LOD sources have different preferences on …
  - DBpedia, Music ontology, …
- Falcon-AO: an automatic ontology matching …
  - Extend … knowledge to support synonym …: track vs. …
  - Linguistic matching: V-Doc (TF-IDF) & I-Sub (edit …)
  - Structural matching: GMO (similarity …)

Linking entities with …
- Entity linkage helps merge all descriptions in different sources that refer to the same multimedia …
- Training set …
  - Negative examples: do not hold equivalence relation
- Class-based discriminative property …
  - Information …
- Online …
- [Figure: training and instance linkage; surviving labels pair property sets with classes, e.g. {p1, p3} with c1 vs. {p5, p6} with c3, and {p1, p2} with c1 vs. {p3, p4} with c2.]

Integrating legacy relational …
- There is still a great deal of legacy data stored in …
- Some data in LOD are generated from their relational …
- Steps
  1. Element …: e.g., entity table and relationship …
  2. Element …
  3. Instance …: similar to ontology matching and entity linkage

Evaluation
- Two …
  - Usability and effectiveness of the mobile …
  - Integration accuracy in the …

User … (1): usability & …
- 3 comparative …: …, …, Wikipedia Android …
- 6 testing …
- 50 … (10 / 22 / 18)
- Usability & …: System Usability Scale (SUS) & post-task …
- Post-task …: analyze the result according to the typology of the …

Integration accuracy
- Ontology …: 78 …, incl. 18 RDB …
- Entity …: 60 thousand …, 100 samples per …

Lessons learned
- CAMO leverages ontology matching and entity linkage for data integration and supports users to browse and search multimedia content on mobile devices
- Lessons
  - Ontology matters: trade-off between expressiveness and ease of …
  - Data integration quality: human computation + machine …
  - Mobile app design: conciseness, ranking scheme, user-…
- Future …
  - Generate complex …s for semantic query …
  - Extend to user-generated …
  - NLP …