Introduction to Encyclopedic and Buddhist Knowledge Graph Construction Technologies
Guilin Qi (漆桂林)
Institute of Cognitive Intelligence, Southeast University

Schedule of My Talk
Encyclopedic knowledge graph construction technologies
Buddhist knowledge graph construction technologies

Introduction of Knowledge Bases
What is knowledge? Facts, information, descriptions, or skills acquired through experience or education by perceiving, discovering, or learning.
Knowledge base: an organized repository of knowledge consisting of concepts, instances, relations (properties), facts, rules, etc. It is a principal part of expert systems: "the power of an AI program came to be seen as largely in its knowledge base" (Edward Feigenbaum, 1994 ACM Turing Award).

Development of Knowledge Bases in Recent Decades
[Figure: timeline from 1985 to 2012 (1985, 1990, 1995, 2000, 2005, 2010, 2012). Examples shown include the assertion (#$capitalCity #$France #$Paris), the chain student, enrollee, person, a resource with 35 million articles in 288 different languages, a knowledge base with 15 thousand concepts, 600 million instances and 20 billion facts, NELL, and the Google Knowledge Graph.]

Knowledge Graph (KG)
It is a new generation of intelligent search technology that enables you to search for things, not strings.
Formal definition: a knowledge graph is a knowledge base with a graph structure, where the nodes are instances or concepts and the edges are relations between them.
It is a special semantic network.
It belongs to knowledge engineering.
[Figure: example KG centered on 中興通訊 (ZTE), distinguishing listed and unlisted companies and linking them by subsidiary, supplier, customer, competitor and partner relations. Entities include 中興康訊, Acacia (IPO in progress), 卓翼科技, Qualcomm, 共進股份, 宇順電子, Broadcom, China Mobile, Intel, Huawei, China Unicom, 大富科技, 華星創業, 盛路通信 and 超聲電子.]

KG and Semantic Search
Go deeper and broader.

Technologies of Knowledge Base Construction
[Figure: Baidu, Hudong, Zh-Wikipedia]

Knowledge Graph (KG) Construction from Online Encyclopedias
Well-known open knowledge graphs such as DBpedia, Yago and Zhishi.me are built from online encyclopedias.
Technologies of encyclopedic knowledge graph construction: data extraction, entity matching, and type inference.

Zhishi.me
Zhishi.me (http://zhishi.me) is the first effort to publish large-scale Chinese semantic data and link it together as Chinese Linking Open Data (CLOD).

Overview of Zhishi.me
It currently consists of structured data extracted from the three largest Chinese encyclopedia sites: Baidu Baike, Hudong Baike and Chinese Wikipedia.
It now has over 10 million distinct instances and 200 million RDF triples, and can be accessed through an online API, a lookup service and a SPARQL endpoint.

Data Extraction
Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, Yong Yu: Zhishi.me - Weaving Chinese Linking Open Data. ISWC 2011: 205-220
Labels, abstracts, redirects and images: rdfs:label, zhishi:abstract, rdfs:comment, dbpedia:abstract, zhishi:pageRedirects, zhishi:thumbnail
Infobox properties: http://zhishi.me/[sourceName]/property/[propertyName], e.g. http://zhishi.me/baidubaike/property/中文名稱 with the value "南京"@zh
Internal links: zhishi:internalLink, zhishi:category, skos:broader

Entity Matching
Equivalent entities: Baidu:北京 and Zh-Wiki:北京市 both denote Beijing.
Dataset-specific matching rules are discovered and refined automatically in iterations. The rules are derived by finding the most discriminative data characteristics for a given data source pair, as illustrated by the matched pair (baidu:北京, Zh-wiki:北京市).
(From Haofen Wang)

For each pair of already matched instances, their property-value pairs are merged:
Values | Property_1 | Property_2
"大熊貓" | baidu:標簽 | hudong:中文學名
"Ailuropoda melanoleuca" | baidu:拉丁學名 | hudong:二名法
"白鰭豚" | baidu:標簽 | hudong:中文學名
"桂花" | baidu:標簽 | hudong:中文學名
… | … | …
(標簽: label; 中文學名: Chinese scientific name; 拉丁學名: Latin scientific name; 二名法: binomial name; 綱: class. 大熊貓: giant panda; 白鰭豚: baiji; 桂花: sweet osmanthus.)

Matching rule (frequent set mining): baidu:x and hudong:x are matched iff valueOf(baidu:標簽) = valueOf(hudong:中文學名) and valueOf(baidu:拉丁學名) = valueOf(hudong:二名法) and valueOf(baidu:綱) = valueOf(hudong:綱).

The obtained rule(s) are applied to the unlabeled data to generate match candidates, and a combiner merges the confidence values of each match candidate.
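The rule application and combination steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Zhishi.me implementation: each entity is a dict from property URI to value, property URIs follow the http://zhishi.me/[sourceName]/property/[propertyName] pattern shown under Data Extraction (the source name hudongbaike is an assumption), a rule is a list of property pairs whose values must all be equal, and the combiner is a noisy-OR over the confidences of the rules that fire. The helper names, the rule confidences and the noisy-OR choice are hypothetical; mining the rules themselves (frequent set mining) is not shown.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

Rule = List[Tuple[str, str]]  # (baidu property URI, hudong property URI) pairs


def prop_uri(source: str, prop: str) -> str:
    """Build a URI with the http://zhishi.me/[sourceName]/property/[propertyName] pattern."""
    return f"http://zhishi.me/{source}/property/{prop}"


# Hypothetical rule mirroring the slide: label, Latin name and class must agree.
RULE: Rule = [
    (prop_uri("baidubaike", "標簽"), prop_uri("hudongbaike", "中文學名")),
    (prop_uri("baidubaike", "拉丁學名"), prop_uri("hudongbaike", "二名法")),
    (prop_uri("baidubaike", "綱"), prop_uri("hudongbaike", "綱")),
]


def rule_fires(rule: Rule, left: Dict[str, str], right: Dict[str, str]) -> bool:
    """True iff every property pair in the rule has present and equal values."""
    return all(p in left and q in right and left[p] == right[q] for p, q in rule)


def noisy_or(confidences: Iterable[float]) -> float:
    """Combine rule confidences as 1 - prod(1 - c), assuming independent rules."""
    miss = 1.0
    for c in confidences:
        miss *= 1.0 - c
    return 1.0 - miss


def match(
    rules: List[Tuple[Rule, float]],
    baidu: Dict[str, Dict[str, str]],
    hudong: Dict[str, Dict[str, str]],
    threshold: float = 0.5,
) -> Dict[Tuple[str, str], float]:
    """Apply weighted rules to all cross-source pairs; keep combined scores >= threshold."""
    candidates: Dict[Tuple[str, str], float] = {}
    for (b_id, b_props), (h_id, h_props) in product(baidu.items(), hudong.items()):
        fired = [conf for rule, conf in rules if rule_fires(rule, b_props, h_props)]
        if fired:
            score = noisy_or(fired)
            if score >= threshold:
                candidates[(b_id, h_id)] = score
    return candidates
```

The exhaustive scan over all pairs keeps the sketch short but is quadratic; for full encyclopedias one would first restrict candidate pairs, for example to entities sharing a label.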
Type Inference
Type information stating that an instance is of a certain type (e.g., China is an instance of Country) is an important component of knowledge bases.
Consider an application scenario, question answering:
Question: Who is the Nobel laureate in Literature from the People's Republic of China?
Answer: Mo Yan. How do we get this answer?
Mo Yan instanceOf Nobel laureate of the People's Republic of China

Tianxing Wu, Shaowei Ling, Guilin Qi, Haofen Wang: Mining Type Information from Chinese Online Encyclopedias. JIST 2014: 213-229
The 4th Joint International Semantic Technology Conference

Approach
In Chinese online encyclopedias, we find that many fine-grained types exist in the categories of article pages.
"Tim Berners-Lee" has several categories: "English computer scientists", "People associated with CERN", "English expatriates in the United States", "Living People", "World Wide Web Consortium".
Example: for the article pages of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia, the categories are as follows:
Approach (cont.)
We take the categories of a given instance as its candidate types and try to filter out the noise by leveraging the attributes.
Intuitively, consider the attributes of two instances:
"actors, release date, director": an instance of "Movie"
"name, foreign name": an instance of "?"
We assume that if an instance contains the representative attributes of a candidate type, the instance probably belongs to that type.
Another problem, however, is that category attributes are not abundantly available.
Approach (cont.)
Explicit IsA Relation Detector: detects explicit instanceOf and subclassOf relations.
Category Attributes Generator: generates attributes for categories with an attribute propagation algorithm.
Instance Type Ranker: ranks candidate types with a graph-based random walk method.
Explicit IsA Relation Detector: Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from infoboxes:
$I = \{i_1, i_2, \dots, i_n\}$: an instance set (all articles).
$C = \{c_1, c_2, \dots, c_m\}$: a concept set (all article categories).
$\{\langle a_1, v_1\rangle, \dots, \langle a_k, v_k\rangle\}$: an attribute-value pair (AVP) set taken from an infobox.
From an AVP $\langle a_k, v_k\rangle$ we derive $v_k$ instanceOf $a_k$. Example: <director, Steven Spielberg> gives Steven Spielberg instanceOf director.
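A minimal Python sketch of the infobox rule above, assuming each infobox is given as a list of (attribute, value) pairs. The slide states only the rule v_k instanceOf a_k; the filter used here, that a_k must name a concept in C and v_k an instance in I, is an assumption, as is the function name.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def infobox_instance_of(
    infoboxes: Dict[str, List[Tuple[str, str]]],
    instances: Set[str],
    concepts: Set[str],
) -> List[Triple]:
    """Derive (v, "instanceOf", a) from infobox pairs <a, v>.

    Assumed filter: the attribute name a must be a known concept (category)
    and the value v must be a known instance (article title).
    """
    triples: List[Triple] = []
    seen: Set[Triple] = set()
    for avps in infoboxes.values():
        for attr, value in avps:
            if attr in concepts and value in instances:
                triple = (value, "instanceOf", attr)
                if triple not in seen:  # keep only the first occurrence
                    seen.add(triple)
                    triples.append(triple)
    return triples


if __name__ == "__main__":
    boxes = {"Jurassic Park": [("director", "Steven Spielberg"), ("release date", "1993")]}
    print(infobox_instance_of(boxes, {"Steven Spielberg", "Jurassic Park"}, {"director", "movie"}))
```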
Explicit IsA Relation Detector: Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from abstracts: perform dependency parsing with FudanNLP [Qiu et al., 2013] to identify the subject, predicate and object of a sentence.
Example: 邁克爾·喬丹 (Michael Jeffrey Jordan) instanceOf 籃球運動員 (basketball player).
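The abstract-based rule can be sketched as follows, assuming the dependency parse is already available; FudanNLP itself is not called here. The token format, the LTP-style labels HED (root), SBV (subject), VOB (object) and ATT (modifier), and the restriction to the copula 是 ("is") are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # word form
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency label (assumed LTP-style: HED, SBV, VOB, ATT)


COPULAS = {"是"}  # assumed: only "X 是 Y" (X is a Y) sentences are used


def instance_of_from_parse(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, "instanceOf", object head) for a copular sentence, else None."""
    root = next((t for t in tokens if t.head == 0), None)
    if root is None or root.form not in COPULAS:
        return None
    subj = next((t for t in tokens if t.head == root.idx and t.deprel == "SBV"), None)
    obj = next((t for t in tokens if t.head == root.idx and t.deprel == "VOB"), None)
    if subj is None or obj is None:
        return None
    # Modifiers of the object (its ATT children) are ignored, so the head noun
    # of the object phrase becomes the type.
    return (subj.form, "instanceOf", obj.form)


if __name__ == "__main__":
    # Hand-built parse of the constructed sentence 邁克爾·喬丹 是 美國 著名 籃球運動員.
    parse = [
        Token(1, "邁克爾·喬丹", 2, "SBV"),
        Token(2, "是", 0, "HED"),
        Token(3, "美國", 5, "ATT"),
        Token(4, "著名", 5, "ATT"),
        Token(5, "籃球運動員", 2, "VOB"),
    ]
    print(instance_of_from_parse(parse))
```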
Explicit IsA Relation Detector: Explicit SubclassOf Relation Detection
Generate candidate subclassOf category pairs of the form (sub-category, category) from the category system, and check with POS tagging whether the sub-category and the category share the same lexical head.
For each (sub-category, category) pair, check whether the category is a parent concept of the sub-category in Zhishi.schema [Wang et al., 2014].
Example: 江蘇學校 (school in Jiangsu) subclassOf 中國學校 (school in China).
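Both subclassOf checks can be sketched in Python under stated assumptions: category names arrive already segmented and POS-tagged (CTB-style tags such as NN and NR), the lexical head of a category name is taken to be its last noun, and Zhishi.schema is stood in for by a plain set of (child, parent) pairs. The slide does not say how the two checks are combined, so require_both is a hypothetical switch.

```python
from typing import Dict, List, Set, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs


def lexical_head(tagged: Tagged) -> str:
    """Return the last noun of a category name (assumed head-final compound)."""
    nouns = [word for word, pos in tagged if pos.startswith("N")]
    return nouns[-1] if nouns else ""


def detect_subclass_of(
    candidates: List[Tuple[str, str]],
    tagged: Dict[str, Tagged],
    schema_parents: Set[Tuple[str, str]],
    require_both: bool = False,
) -> List[Tuple[str, str]]:
    """Filter (sub-category, category) candidates with the lexical-head and schema checks."""
    accepted = []
    for sub, sup in candidates:
        head = lexical_head(tagged[sub])
        same_head = head != "" and head == lexical_head(tagged[sup])
        in_schema = (sub, sup) in schema_parents
        ok = (same_head and in_schema) if require_both else (same_head or in_schema)
        if ok:
            accepted.append((sub, sup))
    return accepted
```

For example, 江蘇學校 tagged as [("江蘇", "NR"), ("學校", "NN")] and 中國學校 tagged as [("中國", "NR"), ("學校", "NN")] share the head 學校, so the pair is accepted by the lexical-head check.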
Category Attributes Generator
We take attributes in infobox templates as existing category attributes, and attributes in the infoboxes of article pages as instance attributes.
We construct a Category Graph composed of all categories with subclassOf relations.
We propagate attributes over the Category Graph, leveraging existing category attributes, instance attributes, and the identified instanceOf and subclassOf relations.

The attribute propagation algorithm is based on the following rules:
Rule 1: If a category c has attributes from infobox templates, these attributes remain unchanged.
Rule 2: If a category c has instances with attributes, an attribute is propagated to c when it is shared by more than half of these instances.
Rule 3: If a category c has child categories with attributes, an attribute is propagated to c when it is shared by more than half of these child categories.
Rule 4: If the parent categories of a category c have attributes, all these attributes are inherited by c.
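The four rules can be sketched as one bottom-up pass (Rules 2 and 3) followed by one top-down pass (Rule 4). This is a minimal sketch under stated assumptions, not the paper's algorithm: the Category Graph is assumed to be acyclic, Rule 1 is read strictly (categories with template attributes are never modified, not even by inheritance), a single pass in each direction is used instead of any iteration the original may perform, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def majority(attr_sets: List[Set[str]]) -> Set[str]:
    """Attributes shared by more than half of the given attribute sets."""
    counts = Counter(a for s in attr_sets for a in s)
    return {a for a, n in counts.items() if n > len(attr_sets) / 2}


def propagate(
    categories: Iterable[str],
    template_attrs: Dict[str, Set[str]],
    instance_attrs: Dict[str, Set[str]],
    instances_of: Dict[str, List[str]],
    children: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    cats = list(categories)
    attrs = {c: set(template_attrs.get(c, set())) for c in cats}
    fixed = {c for c in cats if template_attrs.get(c)}  # Rule 1: never modified
    parents: Dict[str, Set[str]] = {c: set() for c in cats}
    for p, kids in children.items():
        for k in kids:
            parents[k].add(p)

    # Post-order DFS over subclassOf edges lists children before parents.
    order: List[str] = []
    seen: Set[str] = set()

    def visit(c: str) -> None:
        if c in seen:
            return
        seen.add(c)
        for k in children.get(c, set()):
            visit(k)
        order.append(c)

    for c in cats:
        visit(c)

    # Bottom-up pass: Rule 2 (instances) and Rule 3 (child categories).
    for c in order:
        if c in fixed:
            continue
        inst_sets = [instance_attrs[i] for i in instances_of.get(c, []) if instance_attrs.get(i)]
        child_sets = [attrs[k] for k in children.get(c, set()) if attrs[k]]
        if inst_sets:
            attrs[c] |= majority(inst_sets)
        if child_sets:
            attrs[c] |= majority(child_sets)

    # Top-down pass: Rule 4, parents before children (reverse post-order).
    for c in reversed(order):
        if c not in fixed:
            for p in parents[c]:
                attrs[c] |= attrs[p]
    return attrs
```

Recursion depth grows with the depth of the category hierarchy; a very deep graph would need an iterative traversal instead.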
Instance Type Ranker
We organize each given instance, its attributes, and the categories (i.e., candidate types) of the corresponding article page into an Instance Graph. Synonymous attributes are grouped with BabelNet before all Instance Graphs are constructed.
We assume that the fewer categories an attribute belongs to, the more representative the attribute is.
When a random step goes from the given instance to one of its attributes, the walk tends to choose the most representative attribute, so that it reaches the correct categories. When a random step goes from an attribute to one of the categories in the article page, the categories containing this attribute have an equal chance of being chosen.
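A simplified version of this walk can be computed exactly in Python. It assumes a single two-step walk (instance to attribute to category) rather than the paper's full graph-based procedure: the instance-to-attribute step weights each attribute by 1 / (number of categories that contain it), following the representativeness assumption above, and the attribute-to-category step is uniform over the candidate categories containing that attribute. Attributes contained by no candidate category are dropped. The function name and input format are hypothetical.

```python
from typing import Dict, Set


def rank_types(
    instance_attrs: Set[str],
    candidate_cats: Dict[str, Set[str]],
    cats_per_attr: Dict[str, int],
) -> Dict[str, float]:
    """Probability that the two-step walk ends at each candidate category.

    cats_per_attr[a] is the number of categories in the whole Category Graph
    whose attribute set contains a; fewer categories means a more representative a.
    """
    reachable = {
        a for a in instance_attrs
        if any(a in attrs for attrs in candidate_cats.values())
    }
    scores = {c: 0.0 for c in candidate_cats}
    if not reachable:
        return scores
    weight = {a: 1.0 / max(cats_per_attr.get(a, 1), 1) for a in reachable}
    total = sum(weight.values())
    for a in reachable:
        holders = [c for c, attrs in candidate_cats.items() if a in attrs]
        for c in holders:
            # P(instance -> a) * P(a -> c), with the second step uniform
            scores[c] += (weight[a] / total) / len(holders)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Because every reachable attribute passes all of its probability mass to the categories that contain it, the scores of the candidate types sum to 1 whenever at least one attribute is reachable.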
Experiment: Accuracy Evaluation
We randomly select 500 (category, attribute) pairs from each online encyclopedia and 500 type statements from the different sources in each online encyclopedia. Six postgraduate students who are familiar with linked data label each sample as "Correct", "Incorrect", or "Unknown". To generalize the findings on each sample to the whole dataset, we compute Wilson intervals [Brown et al., 2001] for $\alpha = 5\%$.
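For reference, the Wilson score interval can be computed as below; this is the standard formula, not code from the paper. For $n$ judged samples of which $x$ are correct, with $\hat p = x/n$ and $z = z_{1-\alpha/2}$ (about 1.96 for $\alpha = 5\%$), the interval is $\left(\hat p + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat p(1-\hat p)}{n} + \frac{z^2}{4n^2}}\right) / \left(1 + \frac{z^2}{n}\right)$. How "Unknown" labels enter the counts is not stated on the slide, so the function simply takes a success count and a sample size.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion at confidence level 1 - alpha."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 450 of 500 sampled statements judged correct.
    print(wilson_interval(450, 500))
```

Unlike the normal-approximation interval, the Wilson interval stays within [0, 1] even when all or none of the 500 samples are judged correct.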
Experiment: Comparison with Other Knowledge Bases (Overlap of Type Information)
We compare all the obtained Chinese type information with that of other well-known knowledge bases, namely DBpedia, Yago and BabelNet. Since DBpedia and Yago have multilingual versions, we mapped their English type statements to Chinese ones (a type statement is mapped when both its instance and its type can be mapped to Chinese labels).

Technologies of Knowledge Base Construction
Web Access to Zhishi.me
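http://zhishi.me/api

Schedule of My Talk
Encyclopedic knowledge graph construction
Buddhist knowledge graph construction

Framework (taking Buddhist figures as an example)

Knowledge Collection
Category method: manually inspect the encyclopedia categories related to Buddhist figures, then extract the entities corresponding to all articles under these categories.
Naming-rule method, e.g. ".+菩薩" (Bodhisattva) and ".+禪師" (Chan master). Rules are drawn from all entities under the Wikipedia category "佛教頭銜" (Buddhist titles) and from high-frequency common substrings of the entity names already extracted.

Knowledge Fusion
Subject fusion: the "別名" (alias) attribute and the redirects of an entity serve as its alias set. Entities from different sources that share an exactly matching alias are regarded as the same entity. Mappings in which more than three entities are identified as the same are checked manually.
Example alias sets (Baidu Baike, Hudong Baike, Wikipedia): {確吉堅贊, 班禪額爾德尼·確吉堅贊, 羅桑赤烈倫珠} {班禪額爾德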
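The collection and fusion steps for Buddhist figures can be sketched in Python under stated assumptions. The two name patterns are the slide's examples; treating them as full-match regular expressions, the input formats and the function names are assumptions. Fusion links entities from different sources whenever their alias sets (name, 別名 values and redirects, assumed to be collected already) share an exactly equal string, merges links transitively with a union-find (also an assumption), and flags groups of more than three entities for manual checking.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Name patterns taken from the slide's examples.
NAME_PATTERNS = [re.compile(r".+菩薩"), re.compile(r".+禪師")]

Node = Tuple[str, str]  # (source name, entity id)


def match_naming_rules(titles: List[str]) -> List[str]:
    """Keep titles that fully match one of the naming-rule patterns."""
    return [t for t in titles if any(p.fullmatch(t) for p in NAME_PATTERNS)]


def fuse(sources: Dict[str, Dict[str, Set[str]]]) -> Tuple[List[List[Node]], List[List[Node]]]:
    """Merge entities from different sources that share an exactly matching alias.

    Returns (merged groups, groups with more than three entities for manual review).
    """
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    by_alias: Dict[str, List[Node]] = {}
    for src, entities in sources.items():
        for eid, aliases in entities.items():
            node = (src, eid)
            parent[node] = node
            for alias in aliases:
                by_alias.setdefault(alias, []).append(node)

    for nodes in by_alias.values():
        for a, b in combinations(nodes, 2):
            if a[0] != b[0]:  # only entities from different sources are linked
                parent[find(b)] = find(a)

    groups: Dict[Node, List[Node]] = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    merged = [sorted(g) for g in groups.values() if len(g) > 1]
    review = [g for g in merged if len(g) > 3]
    return merged, review
```

With three sources, a group of more than three entities means at least one source contributed two entities to the same group, which is why such mappings are worth checking by hand.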