Knowledge Graph Construction Techniques - Beijing Institute of Technology
Introduction to Encyclopedic and Buddhist Knowledge Graph Construction Techniques

Guilin Qi (漆桂林)
Institute of Cognitive Intelligence, Southeast University

Schedule of My Talk
Encyclopedic knowledge graph construction techniques
Buddhist knowledge graph construction techniques

Introduction of Knowledge Bases
What is knowledge?
  Facts, information, descriptions, or skills
  Acquired through experience or education by perceiving, discovering, or learning
Knowledge base: an organized repository of knowledge consisting of concepts, instances, relations (properties), facts, rules, etc.
  It is a principal part of expert systems
  "The power of an AI program came to be seen as largely in its knowledge base" (Edward Feigenbaum, 1994 ACM Turing Award)

Development of Knowledge Base in Recent Decades
[Timeline figure: 1985, 1990, 1995, 2000, 2005, 2010, 2012. Recoverable labels: (#$capitalCity #$France #$Paris); student, enrollee, person; 35 million articles in 288 different languages …; 15 thousand concepts, 600 million instances, 20 billion facts; NELL]

Google Knowledge Graph (KG)
It is a new generation of intelligent search technology, which enables you to search for things, not strings
Formal definition: a knowledge graph is a knowledge base with graph structure, where the nodes are instances or concepts, and the edges are relations between them
It is a special semantic network
It belongs to knowledge engineering

Example
[Figure: company knowledge graph centered on ZTE (中興通訊). Node types: listed company, unlisted company. Relation types: subsidiary, supplier, customer, competitor, partner. Other nodes: 中興康訊, Acacia (IPO in progress), 卓翼科技, Qualcomm (美國高通), 共進股份, 宇順電子, Broadcom (美國博通), China Mobile (中國移動), Intel (英特爾), Huawei (華為), China Unicom (中國聯通), 大富科技, 華星創業, 盛路通信, 超聲電子]
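The formal definition above (nodes are instances or concepts, edges are relations) can be made concrete with a small sketch. The triples, the relation names `instanceOf` and `subclassOf`, and the helper functions are illustrative assumptions, not data or code from any real knowledge graph.

```python
from collections import defaultdict

# Each fact is a (subject, relation, object) triple. All names here are
# illustrative assumptions chosen to mirror the labels on the slides.
TRIPLES = [
    ("France", "capitalCity", "Paris"),
    ("France", "instanceOf", "country"),
    ("Paris", "instanceOf", "city"),
    ("student", "subclassOf", "enrollee"),
    ("enrollee", "subclassOf", "person"),
]


def build_graph(triples):
    """Index triples as an adjacency map: node -> list of (relation, node)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def neighbors(graph, node, relation=None):
    """Nodes one hop away from `node`, optionally restricted to one relation."""
    return [obj for rel, obj in graph.get(node, []) if relation in (None, rel)]


def ancestors(graph, concept):
    """Follow subclassOf edges transitively, visiting each concept once."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in neighbors(graph, current, "subclassOf"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


graph = build_graph(TRIPLES)
```

Searching for "things, not strings" then amounts to traversing edges: `neighbors(graph, "France", "capitalCity")` looks up a fact, and `ancestors(graph, "student")` walks the concept hierarchy.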

KG and Semantic Search
Go deeper and broader

Technologies of Knowledge Base Construction
[Figure labels: Baidu, Hudong, Zh-Wikipedia]

Knowledge Graph (KG) Construction from Online Encyclopedias
Well-known open knowledge graphs such as DBpedia, Yago and Zhishi.me are built from online encyclopedias.
Technologies of encyclopedic knowledge graph construction:
  Data extraction
  Entity matching
  Type inference

Zhishi.me
Zhishi.me (http://zhishi.me) is the first effort to publish large-scale Chinese semantic data and link them together as Chinese Linking Open Data (CLOD).

Overview of Zhishi.me
Currently, it consists of structured data extracted from the three largest Chinese encyclopedia sites:
  Baidu Baike
  Hudong Baike
  Chinese Wikipedia
It now has over 10 million distinct instances and 200 million RDF triples, and can be accessed through an online API, a lookup service and a SPARQL endpoint.

Data Extraction
Extracted content: Labels, Abstracts, Redirects, Images
Properties: rdfs:label, zhishi:abstract, rdfs:comment, dbpedia:abstract, zhishi:pageRedirects, zhishi:thumbnail
Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, Yong Yu: Zhishi.me - Weaving Chinese Linking Open Data. ISWC 2011: 205-220

Data Extraction (cont.)
Infobox properties
  URI pattern: http://zhishi.me/[sourceName]/property/[propertyName]
  Example: http://zhishi.me/baidubaike/property/中文名稱 "南京"@zh

Data Extraction (cont.)
Internal links
Properties: zhishi:internalLink, zhishi:category, skos:broader

Entity Matching
Baidu: 北京 and Zh-Wiki: 北京市 are equivalent entities

Entity Matching (cont.)
Automatically discovering and refining dataset-specific matching rules in iterations
Deriving these rules by finding the most discriminative data characteristics for a given data source pair, e.g. (baidu:北京, Zh-wiki:北京市).

Entity Matching (cont.; slides from Haofen Wang)
For each pair of existing matched instances, their property-value pairs are merged.

  Values | Property_1 | Property_2
  "大熊貓" (giant panda) | baidu:標簽 | hudong:中文學名
  "Ailuropoda melanoleuca" | baidu:拉丁學名 | hudong:二名法
  "白鰭豚" (baiji) | baidu:標簽 | hudong:中文學名
  "桂花" (osmanthus) | baidu:標簽 | hudong:中文學名
  … | … | …

Property glosses: 標簽 (label), 中文學名 (Chinese scientific name), 拉丁學名 (Latin name), 二名法 (binomial name), 綱 (class).

Matching rule (frequent set mining): baidu:x and hudong:x are matched iff
  valueOf(baidu:標簽) = valueOf(hudong:中文學名) and
  valueOf(baidu:拉丁學名) = valueOf(hudong:二名法) and
  valueOf(baidu:綱) = valueOf(hudong:綱)

Applying the obtained rule(s) to the unlabeled data generates candidate matches.
The combiner is used to combine the confidence values of a candidate match.
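The rule-mining and rule-application steps above can be sketched roughly as follows. This is a simplified stand-in, not the method from the slides: it counts single property pairs whose values agree across seed matches instead of running a full frequent set miner, it omits the combiner, and the `min_support` threshold, the function names and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def agreeing_property_pairs(baidu_inst, hudong_inst):
    """Property pairs (baidu_prop, hudong_prop) whose values coincide."""
    return {
        (bp, hp)
        for bp, bv in baidu_inst.items()
        for hp, hv in hudong_inst.items()
        if bv == hv
    }


def mine_rule(matched_pairs, min_support=0.6):
    """Keep property pairs that agree in at least `min_support` of the seeds."""
    counts = Counter()
    for baidu_inst, hudong_inst in matched_pairs:
        counts.update(agreeing_property_pairs(baidu_inst, hudong_inst))
    threshold = min_support * len(matched_pairs)
    return [pair for pair, n in counts.items() if n >= threshold]


def is_match(baidu_inst, hudong_inst, rule):
    """Apply the rule: every listed property pair must be present and equal."""
    return all(
        bp in baidu_inst and hp in hudong_inst and baidu_inst[bp] == hudong_inst[hp]
        for bp, hp in rule
    )


# Illustrative seed matches modelled on the table above (values are assumptions).
seeds = [
    ({"baidu:標簽": "大熊貓", "baidu:拉丁學名": "Ailuropoda melanoleuca", "baidu:綱": "哺乳綱"},
     {"hudong:中文學名": "大熊貓", "hudong:二名法": "Ailuropoda melanoleuca", "hudong:綱": "哺乳綱"}),
    ({"baidu:標簽": "白鰭豚", "baidu:拉丁學名": "Lipotes vexillifer", "baidu:綱": "哺乳綱"},
     {"hudong:中文學名": "白鰭豚", "hudong:二名法": "Lipotes vexillifer", "hudong:綱": "哺乳綱"}),
]
rule = mine_rule(seeds)

# Unlabeled records to test the mined rule against.
baidu_osmanthus = {"baidu:標簽": "桂花", "baidu:拉丁學名": "Osmanthus fragrans", "baidu:綱": "雙子葉植物綱"}
hudong_osmanthus = {"hudong:中文學名": "桂花", "hudong:二名法": "Osmanthus fragrans", "hudong:綱": "雙子葉植物綱"}
```

With these seeds the mined rule contains the same three property pairs as the matching rule on the slide, and `is_match(baidu_osmanthus, hudong_osmanthus, rule)` accepts the unlabeled pair.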

Type Inference
Type information, stating that an instance is of a certain type (e.g. China is an instance of country), is an important component of knowledge bases.
Application scenario: Question Answering.
  Question: Who is the Nobel laureate in literature of the People's Republic of China?
  Answer: Mo Yan.
  How to get the answer? Mo Yan instanceOf Nobel laureate of the People's Republic of China

Tianxing Wu, Shaowei Ling, Guilin Qi, Haofen Wang: Mining Type Information from Chinese Online Encyclopedias. JIST 2014: 213-229
The 4th Joint International Semantic Technology Conference

Approach
In Chinese online encyclopedias, we discover that lots of fine-grained types exist in the categories of article pages.
"Tim Berners-Lee" has several categories: "English computer scientists", "People associated with CERN", "English expatriates in the United States", "Living People", "World Wide Web Consortium"

Approach
For example, the article pages of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia each list their own categories.
[Figure: categories of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia]

Approach (cont.)
We take the categories of one given instance as its candidate types and try to filter out the noise by leveraging the attributes.
Intuitively, given the attributes of a certain instance:
  "actors, release date, director": an instance of "Movie"
  "name, foreign name": an instance of "?"
We assume that if an instance contains the representative attributes of one candidate type, the instance probably belongs to this type.
But another problem is: category attributes are not abundantly available.

Approach (cont.)
Explicit IsA Relation Detector: detect explicit instanceOf and subclassOf relations
Category Attributes Generator: generate attributes for categories with an attribute propagation algorithm
Instance Type Ranker: rank candidate types with a graph-based random walk method

Explicit IsA Relation Detector
Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from infoboxes
  I = {i1, i2, …, in}: an instance set (all articles)
  C = {c1, c2, …, cm}: a concept set (all article categories)
  infobox: {<a1, v1>, …, <ak, vk>}: an AVP (attribute-value pair) set

  Attribute | value
  a1 | v1
  a2 | v2
  … | …

  vk instanceOf ak
  Example: <director, Steven Spielberg>, i.e. Steven Spielberg instanceOf director
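A minimal sketch of the infobox step, under one stated assumption: a pair <ak, vk> yields "vk instanceOf ak" only when the value is a known instance (in I) and the attribute is a known concept (in C). The slide defines I and C but does not spell out this filter, and the sample infobox below is made up for illustration.

```python
def explicit_instance_of(infobox, instances, concepts):
    """Turn attribute-value pairs into (value, 'instanceOf', attribute) triples.

    Assumption: a pair is kept only if the value names an article (instance)
    and the attribute names an article category (concept).
    """
    return [
        (value, "instanceOf", attribute)
        for attribute, value in infobox
        if value in instances and attribute in concepts
    ]


# Illustrative data, not taken from any encyclopedia dump.
instances = {"Steven Spielberg", "Jurassic Park"}
concepts = {"director", "movie"}
infobox = [("director", "Steven Spielberg"), ("release date", "1993")]

relations = explicit_instance_of(infobox, instances, concepts)
```

The pair <release date, 1993> is dropped because "release date" is not a concept and "1993" is not an instance, which is how the filter keeps literal-valued attributes out of the type statements.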

Explicit IsA Relation Detector
Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from abstracts
  Perform dependency parsing with FudanNLP [Qiu et al., 2013] to identify Subject, Predicate and Object
  Example: 邁克爾·喬丹 (Michael Jeffrey Jordan) instanceOf 籃球運動員 (Basketball Player)

Explicit IsA Relation Detector
Explicit SubclassOf Relation Detection
  Generate candidate subclassOf category pairs in the form of (sub-category, category) based on the category system.
  Check whether the (sub-category, category) pair shares the same lexical head, using POS tagging.
  For each (sub-category, category), check whether the category is a parent concept of the sub-category in Zhishi.schema [Wang et al., 2014].
  Example: 江蘇學校 (school in Jiangsu) subclassOf 中國學校 (school in China)
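The lexical-head check can be sketched as below. The sketch assumes category names arrive already segmented and POS-tagged (the tags `NR` and `NN` are written by hand here rather than produced by a tagger), and it approximates the head of a Chinese noun phrase as its last noun token. Both choices are assumptions for illustration; the Zhishi.schema lookup is not modelled.

```python
def lexical_head(tagged_name):
    """Approximate the head of a noun phrase as its last noun token.

    `tagged_name` is a list of (token, pos) pairs; any tag starting with
    "N" is treated as a noun (an assumption of this sketch).
    """
    nouns = [token for token, pos in tagged_name if pos.startswith("N")]
    return nouns[-1] if nouns else None


def shares_lexical_head(sub_category, category):
    """True when both category names have the same, non-empty lexical head."""
    head = lexical_head(sub_category)
    return head is not None and head == lexical_head(category)


# Hand-tagged examples; the segmentation and tags are hypothetical.
jiangsu_schools = [("江蘇", "NR"), ("學校", "NN")]
china_schools = [("中國", "NR"), ("學校", "NN")]
china_teachers = [("中國", "NR"), ("教師", "NN")]
```

Here `shares_lexical_head(jiangsu_schools, china_schools)` holds because both heads are 學校, while the pair with `china_teachers` is rejected.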

Category Attributes Generator
We take attributes in infobox templates as existing category attributes, and attributes in the infoboxes of article pages as instance attributes.
We construct a Category Graph composed of all categories with subclassOf relations.
We propagate attributes over the Category Graph, leveraging existing category attributes, instance attributes, and the identified instanceOf and subclassOf relations.

Category Attributes Generator (cont.)
The attribute propagation algorithm is based on the following rules:
  Rule 1: If a category c has attributes from infobox templates, these attributes should remain unchanged.
  Rule 2: If a category c has some instances with attributes, the attributes should be propagated to c when they are shared by more than half of these instances.
  Rule 3: If a category c has some child categories with attributes, the attributes should be propagated to c when they are shared by more than half of these child categories.
  Rule 4: If parent categories of a category c have attributes, all the attributes should be inherited by c.
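The four rules can be sketched as a fixed-point iteration. The data layout (plain dicts and sets), the function names and the sample categories are assumptions made for illustration; the slides do not say how the original algorithm orders or repeats the rules.

```python
def majority(attr_sets):
    """Attributes present in more than half of the non-empty attribute sets."""
    sets = [s for s in attr_sets if s]
    counts = {}
    for s in sets:
        for attr in s:
            counts[attr] = counts.get(attr, 0) + 1
    return {attr for attr, n in counts.items() if n > len(sets) / 2}


def propagate(template_attrs, instance_attrs, children):
    """Apply Rules 1-4 repeatedly until no category gains a new attribute.

    template_attrs: category -> attributes from infobox templates (Rule 1)
    instance_attrs: category -> list of attribute sets, one per instance
    children:       category -> list of child categories (subclassOf edges)
    """
    categories = set(template_attrs) | set(instance_attrs) | set(children)
    for kids in children.values():
        categories |= set(kids)
    parents = {c: [p for p, kids in children.items() if c in kids] for c in categories}
    attrs = {c: set(template_attrs.get(c, ())) for c in categories}

    changed = True
    while changed:  # attributes are only ever added, so this terminates
        changed = False
        for c in sorted(categories):
            if c in template_attrs:  # Rule 1: template attributes stay fixed
                continue
            new = set(attrs[c])
            new |= majority(instance_attrs.get(c, []))                # Rule 2
            new |= majority([attrs[k] for k in children.get(c, [])])  # Rule 3
            for p in parents[c]:                                      # Rule 4
                new |= attrs[p]
            if new != attrs[c]:
                attrs[c] = new
                changed = True
    return attrs


# Illustrative input; category and attribute names are made up.
template_attrs = {"Movie": {"director", "actors", "release date"}}
children = {"Work": ["Movie", "Book"], "Movie": ["Chinese movie"]}
instance_attrs = {"Book": [{"author", "publisher"}, {"author", "isbn"}, {"author"}]}

result = propagate(template_attrs, instance_attrs, children)
```

In this example "Book" receives only "author" (shared by all three instances), "Chinese movie" inherits the "Movie" attributes, and "Work" receives nothing because no attribute is shared by more than half of its children.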

Instance Type Ranker
We organize each given instance, its attributes and the categories (i.e. candidate types) of the corresponding article page into an Instance Graph.
We group synonymous attributes with BabelNet before constructing all Instance Graphs.
We assume that the fewer categories an attribute belongs to, the more representative the attribute is.
When executing a random step from the given instance to one of its attributes, the walk tends to choose the most representative attribute in order to walk to the correct categories.
When executing a random step from an attribute to one of the categories in the article page, the categories containing this attribute have equal opportunity.
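A two-step version of the walk described above can be sketched as follows. The weighting is an assumption: an attribute belonging to n categories gets weight 1/n for the first step, and the second step is uniform over the candidate types that contain the attribute. The slides do not give the exact transition probabilities or the number of walk iterations, so this is a simplified illustration and the sample data is made up.

```python
def rank_types(instance_attrs, candidate_types, category_attrs):
    """Score candidate types by a two-step walk: instance -> attribute -> type.

    instance_attrs:  list of the instance's attributes
    candidate_types: categories of the instance's article page
    category_attrs:  category -> set of attributes (for all known categories)
    """
    def n_categories(attr):
        return sum(1 for attrs in category_attrs.values() if attr in attrs)

    # Step 1 weights: fewer categories means more representative (assumed 1/n).
    weights = {a: 1.0 / n_categories(a) for a in instance_attrs if n_categories(a) > 0}
    total = sum(weights.values())
    scores = {t: 0.0 for t in candidate_types}
    if total == 0:
        return sorted(scores.items())

    for attr, weight in weights.items():
        # Step 2: candidate types containing the attribute are equally likely.
        holders = [t for t in candidate_types if attr in category_attrs.get(t, set())]
        for t in holders:
            scores[t] += (weight / total) / len(holders)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only.
category_attrs = {
    "Movie": {"director", "actors", "release date", "name"},
    "Person": {"name", "birth date"},
}
ranking = rank_types(["director", "actors", "name"], ["Movie", "Person"], category_attrs)
```

Because "director" and "actors" each belong to a single category, they dominate the first step, so "Movie" ends up well ahead of "Person" even though both contain "name".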

Experiment: Accuracy Evaluation
We randomly select 500 (category, attribute) pairs from each online encyclopedia and 500 type statements from different sources in each online encyclopedia.
We invite six postgraduate students who are familiar with linked data to label each sample mentioned above as "Correct", "Incorrect", or "Unknown".
To generalize findings on each sample to the whole dataset, we compute the Wilson intervals [Brown et al., 2001] for α = 5%.
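For reference, the Wilson score interval for a sample of size n with observed accuracy p can be computed as in the sketch below. The count of 450 correct labels out of 500 is a made-up example, not a result from the talk, and how "Unknown" labels were treated is not stated in the slides.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(correct, n, alpha=0.05):
    """Two-sided Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical sample: 450 of 500 statements labelled "Correct".
low, high = wilson_interval(450, 500)
```

The interval is not centered exactly on the observed accuracy: the Wilson center is pulled slightly toward 0.5, which keeps the bounds inside [0, 1] even for accuracies close to 100%.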

Experiment: Comparison with Other Knowledge Bases (Overlap of Type Information)
We compare all the obtained Chinese type information with that of other well-known knowledge bases, namely DBpedia, Yago and BabelNet.
Since DBpedia and Yago have multilingual versions, we mapped the English type statements to Chinese ones (both the instance and the type in one type statement can be mapped to Chinese labels).

Technologies of Knowledge Base Construction
Web Access to Zhishi.me
http://zhishi.me/api

Schedule of My Talk
Encyclopedic knowledge graph construction
Buddhist knowledge graph construction

Framework (taking Buddhist figures as the example)

Knowledge Collection
Category method
  Manually inspect the encyclopedia categories related to Buddhist figures
  Extract the entities corresponding to all articles under the Buddhist-figure categories
Naming-rule method
  Examples: ".+菩薩" (Bodhisattva), ".+禪師" (Chan master)
  All entities under the Wikipedia category "佛教頭銜" (Buddhist titles)
  High-frequency common substrings in the entity names already extracted

Knowledge Fusion
Subject fusion
  An entity's "alias" (別名) attribute and its redirects form the entity's alias set
  Entities from different sources that share one exactly matching alias are considered the same entity
  Manually check mappings in which more than three entities are identified as the same
  Baidu Baike: Hudong Baike: Wikipedia: {確吉堅贊, 班禪額爾德尼·確吉堅贊, 羅桑赤烈倫珠} {班禪額爾德
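The naming-rule collection step and the alias-based subject fusion can be sketched together. The two patterns come from the slide; everything else is an assumption for illustration, including the source labels `source_a` and `source_b`, the entity identifiers, and the second alias set (the slide's own second set is cut off in the extraction and is not reconstructed here).

```python
import re
from itertools import combinations

# Naming rules from the slide, compiled as full-name patterns.
NAME_RULES = [re.compile(r".+菩薩"), re.compile(r".+禪師")]


def matches_naming_rule(name):
    """True when the whole entity name fits one of the naming rules."""
    return any(rule.fullmatch(name) for rule in NAME_RULES)


def fuse_by_alias(entities):
    """Pair up entities from different sources that share an identical alias.

    entities: list of (source, entity_id, alias_set) tuples.
    """
    same = []
    for (src1, id1, aliases1), (src2, id2, aliases2) in combinations(entities, 2):
        if src1 != src2 and aliases1 & aliases2:
            same.append(((src1, id1), (src2, id2)))
    return same


# Hypothetical records; only the first alias set is copied from the slide.
entities = [
    ("source_a", "e1", {"確吉堅贊", "班禪額爾德尼·確吉堅贊", "羅桑赤烈倫珠"}),
    ("source_b", "e2", {"班禪額爾德尼·確吉堅贊"}),
    ("source_b", "e3", {"慧能禪師"}),
]
pairs = fuse_by_alias(entities)
```

Pairs produced this way would still go through the manual check described above whenever one entity ends up mapped to more than three others.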
