Knowledge Graph Review: Special-Topic Training Slides

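Knowledge Graph Architecture
• General knowledge graph architecture [source: Baidu Baike]
• Fudan University knowledge graph architecture
• Early knowledge graph architecture

General knowledge graph architecture [source: Baidu Baike]
[Figure: general knowledge graph architecture]

Architecture discussion: early knowledge graph architecture
[Figure: early knowledge graph architecture]

Knowledge Extraction
• Entity/concept extraction
• Entity/concept mapping
• Relation extraction
• Quality assessment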

KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014

A sampler of research problems
• Growth: knowledge graphs are incomplete!
  ◦ Link prediction: add relations
  ◦ Ontology matching: connect graphs
  ◦ Knowledge extraction: extract new entities and relations from web/text
• Validation: knowledge graphs are not always correct!
  ◦ Entity resolution: merge duplicate entities, split wrongly merged ones
  ◦ Error detection: remove false assertions
• Interface: how to make it easier to access knowledge?
  ◦ Semantic parsing: interpret the meaning of queries
  ◦ Question answering: compute answers using the knowledge graph
• Intelligence: can AI emerge from knowledge graphs?
  ◦ Automatic reasoning and planning
  ◦ Generalization and abstraction

Relation Extraction: Definition
• Common approaches:
  ◦ Semantic pattern matching [frequent pattern mining, density-based clustering, semantic similarity]
  ◦ Hierarchical topic models [weak supervision]

Methods and techniques
1. Relation extraction:
   • Supervised models
   • Semi-supervised models
   • Distant supervision
2. Entity resolution:
   • Single entity methods
   • Relational methods
3. Link prediction:
   • Rule-based methods
   • Probabilistic models
   • Factorization methods
   • Embedding models
Not in this tutorial:
• Entity classification
• Group/expert detection
• Ontology alignment
• Object ranking

Relation Extraction
• Extracting semantic relations between sets of [grounded] entities
• Numerous variants:
  ◦ Undefined vs. pre-determined set of relations
  ◦ Binary vs. n-ary relations, facet discovery
  ◦ Extracting temporal information
  ◦ Supervision: {fully, un, semi, distant}-supervision
  ◦ Cues used: only lexical vs. full linguistic features
[Figure: the relation (Kobe Bryant, playFor, LA Lakers) expressed by mentions such as "Kobe Bryant, the franchise player of the Lakers", "Kobe once again saved his team" and "Kobe Bryant man of the match for Los Angeles"]

Supervised relation extraction
• Sentence-level labels of relation mentions
  ◦ "Apple CEO Steve Jobs said.." => (SteveJobs, CEO, Apple)
  ◦ "Steve Jobs said that Apple will.." => NIL
• Traditional relation extraction datasets
  ◦ ACE 2004
  ◦ MUC-7
  ◦ Biomedical datasets (e.g., BioNLP challenges)
• Learn classifiers from +/- examples
• Typical features: context words + POS, dependency path between entities, named entity tags, token/parse-path/entity distance
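The "typical features" bullet can be sketched as follows. This is a minimal illustration, assuming the sentence is already tokenized and POS-tagged and that entity mention spans and NER types are given; the function name `relation_features` and the feature-string format are hypothetical, and the dependency-path feature is omitted because it needs a parser.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one entity mention


def relation_features(tokens: List[str], pos: List[str], e1: Span, e2: Span,
                      e1_type: str, e2_type: str) -> List[str]:
    """Sparse string features for one candidate relation mention (e1, e2)."""
    # Order the mentions left-to-right so "between" is well defined.
    left, right = sorted([e1, e2])
    feats = []
    # Context words + POS tags between the two mentions.
    for i in range(left[1], right[0]):
        feats.append(f"between:{tokens[i].lower()}/{pos[i]}")
    # One word of context on each side of the pair, when present.
    if left[0] > 0:
        feats.append(f"left:{tokens[left[0] - 1].lower()}")
    if right[1] < len(tokens):
        feats.append(f"right:{tokens[right[1]].lower()}")
    # Named-entity tags (in argument order) and token distance between mentions.
    feats.append(f"types:{e1_type}-{e2_type}")
    feats.append(f"token_dist:{right[0] - left[1]}")
    return feats


tokens = ["Apple", "CEO", "Steve", "Jobs", "said"]
pos = ["NNP", "NN", "NNP", "NNP", "VBD"]
print(relation_features(tokens, pos, (2, 4), (0, 1), "PER", "ORG"))
# ['between:ceo/NN', 'right:said', 'types:PER-ORG', 'token_dist:1']
```

A classifier trained on such feature lists from positive and NIL-labelled mentions is one way to realize "learn classifiers from +/- examples".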

Semi-supervised relation extraction
• Generic algorithm:
  1. Start with seed triples / golden seed patterns
  2. Extract patterns that match seed triples/patterns
  3. Take the top-k extracted patterns/triples
  4. Add to seed patterns/triples
  5. Go to 2
• Many published approaches in this category:
  ◦ Dual Iterative Pattern Relation Extractor [Brin, 98]
  ◦ Snowball [Agichtein & Gravano, 00]
  ◦ TextRunner [Banko et al., 07]: almost unsupervised
• They differ in pattern definition and selection
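A toy version of the generic bootstrapping loop above, for one relation. It is a sketch under strong simplifying assumptions: each sentence is pre-split into (entity1, middle text, entity2), a "pattern" is just the literal middle string, and patterns are ranked by how many known pairs they match. DIPRE and Snowball use richer pattern definitions and confidence scores; `bootstrap` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def bootstrap(sentences: Iterable[Tuple[str, str, str]], seeds: Set[Pair],
              k: int = 1, iterations: int = 2) -> Set[Pair]:
    """Toy bootstrapping over sentences pre-split as (entity1, middle_text, entity2)."""
    sentences = list(sentences)
    pairs = set(seeds)                          # step 1: seed pairs for one relation
    for _ in range(iterations):                 # step 5: go back to step 2
        # Step 2: patterns (middle strings) that co-occur with known pairs.
        counts = Counter(mid for e1, mid, e2 in sentences if (e1, e2) in pairs)
        # Step 3: keep only the top-k patterns.
        top = {p for p, _ in counts.most_common(k)}
        # Step 4: every pair matched by a top pattern joins the seed set.
        pairs |= {(e1, e2) for e1, mid, e2 in sentences if mid in top}
    return pairs


sentences = [("Larry Page", "is the CEO of", "Google"),
             ("Steve Jobs", "is the CEO of", "Apple"),
             ("Larry Page", "visited", "Paris")]
print(sorted(bootstrap(sentences, {("Larry Page", "Google")})))
# [('Larry Page', 'Google'), ('Steve Jobs', 'Apple')]
```

Keeping only the top-k patterns per round is what limits semantic drift; with a large k, noisy patterns such as "visited" could enter the seed set.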

Distantly-supervised relation extraction
• Existing knowledge base + unlabeled text → generate examples
  ◦ Locate pairs of related entities in text
  ◦ Hypothesize that the relation is expressed
[Figure: KB facts (Google, CEO, Larry Page), (Apple, CEO, Steve Jobs), (Steve Jobs, founderOf, Pixar) and (Paris, capitalOf, France), aligned with the sentences "Google CEO Larry Page announced that...", "Steve Jobs has been Apple for a while...", "Pixar lost its co-founder Steve Jobs..." and "I went to Paris, France for the summer..."]
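The example-generation step can be sketched as below. This assumes naive entity matching by exact substring and KB facts given as (subject, relation, object) triples; `distant_label` is a hypothetical name, and real systems rely on a proper entity linker (hence "needs high quality entity-matching" later on).

```python
from typing import Dict, List, Set, Tuple


def distant_label(kb: Set[Tuple[str, str, str]], sentences: List[str],
                  entities: List[str]) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both arguments of a KB triple with its relation."""
    # Index KB triples by (subject, object) pair.
    by_pair: Dict[Tuple[str, str], List[str]] = {}
    for subj, rel, obj in kb:
        by_pair.setdefault((subj, obj), []).append(rel)
    examples = []
    for sent in sentences:
        # Naive entity matching: exact substring match of entity names.
        found = [e for e in entities if e in sent]
        for e1 in found:
            for e2 in found:
                for rel in by_pair.get((e1, e2), []):
                    examples.append((e1, rel, e2, sent))
    return examples


kb = {("Google", "CEO", "Larry Page"), ("Steve Jobs", "founderOf", "Pixar"),
      ("Paris", "capitalOf", "France")}
sentences = ["Google CEO Larry Page announced that...",
             "Pixar lost its co-founder Steve Jobs...",
             "I went to Paris, France for the summer..."]
entities = ["Google", "Larry Page", "Steve Jobs", "Pixar", "Paris", "France"]
for ex in distant_label(kb, sentences, entities):
    print(ex[:3])
# ('Google', 'CEO', 'Larry Page')
# ('Steve Jobs', 'founderOf', 'Pixar')
# ('Paris', 'capitalOf', 'France')
```

The third example shows why the relation-expression step is only a hypothesis: the Paris sentence is labelled capitalOf although it does not express that relation.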


Sentence-level features
• Lexical: words in between and around mentions and their part-of-speech tags (conjunctive form)
• Syntactic: dependency parse path between mentions along with side nodes
• Named entity tags: for the mentions
• Conjunctions of the above features
• Distant supervision is applied to lots of data, so sparsity of conjunctive forms is not an issue
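To make "conjunctive form" concrete, the sketch below joins the left window, the first mention's NER tag, the words and POS tags in between, the second mention's NER tag and the right window into one feature string, so the feature fires only when every part matches. It assumes tokens, POS tags and per-token NER tags are given and that `e1` precedes `e2`; the function name, the `|` separator and the `window` parameter are illustrative choices.

```python
from typing import List, Tuple


def conjunctive_lexical_feature(tokens: List[str], pos: List[str], ner: List[str],
                                e1: Tuple[int, int], e2: Tuple[int, int],
                                window: int = 1) -> str:
    """One conjoined feature string; assumes e1 ends before e2 starts."""
    lo = max(0, e1[0] - window)
    # Words/POS in the left window, between the mentions, and in the right window.
    left_ctx = [f"{t}/{p}" for t, p in zip(tokens[lo:e1[0]], pos[lo:e1[0]])]
    middle = [f"{t}/{p}" for t, p in zip(tokens[e1[1]:e2[0]], pos[e1[1]:e2[0]])]
    right_ctx = [f"{t}/{p}" for t, p in zip(tokens[e2[1]:e2[1] + window],
                                            pos[e2[1]:e2[1] + window])]
    # Conjunction: all parts in a single string, so it is very sparse.
    return "|".join([" ".join(left_ctx), ner[e1[0]], " ".join(middle),
                     ner[e2[0]], " ".join(right_ctx)])


tokens = ["Barack", "Obama", "is", "the", "President", "of", "the", "US", "."]
pos = ["NNP", "NNP", "VBZ", "DT", "NNP", "IN", "DT", "NNP", "."]
ner = ["PER", "PER", "O", "O", "O", "O", "O", "LOC", "O"]
print(conjunctive_lexical_feature(tokens, pos, ner, (0, 2), (7, 8)))
# |PER|is/VBZ the/DT President/NNP of/IN the/DT|LOC|./.
```

Such a string rarely repeats in a small corpus, which is why the slide notes that sparsity only stops being a problem at distant-supervision scale.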


Distant supervision: modeling hypotheses
Typical architecture:
1. Collect many pairs of entities co-occurring in sentences from a text corpus
2. If 2 entities participate in a relation, several hypotheses are possible:
   1. All sentences mentioning them express it [Mintz et al., 09]
   2. At least one sentence mentioning them expresses it [Riedel et al., 10]
   3. At least one sentence mentioning them expresses it, and the 2 entities can express multiple relations [Hoffmann et al., 11] [Surdeanu et al., 12]
Examples:
• "Barack Obama is the 44th and current President of the US." (BO, employedBy, USA)
• "Obama flew back to the US on Wednesday." (BO, employedBy, USA)
• "Obama was born in the US just as he always said." (BO, bornIn, USA)
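The three hypotheses differ in which sentences of an entity pair's "bag" become training examples. The sketch below is a simplified hard-assignment reading of them, not the published models (Riedel et al. and Hoffmann et al. use probabilistic multi-instance formulations). The per-sentence `scores` are hypothetical stand-ins for a classifier's current predictions, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, relation label)


def all_sentences(bag: List[str], relations: List[str]) -> List[Example]:
    """Hypothesis 1: every sentence expresses every KB relation of the pair."""
    return [(s, r) for s in bag for r in relations]


def at_least_one(bag: List[str], relation: str, score: Dict[str, float]) -> List[Example]:
    """Hypothesis 2, simplified: only the highest-scoring sentence expresses the relation."""
    best = max(bag, key=lambda s: score.get(s, 0.0))
    return [(best, relation)]


def at_least_one_multi(bag: List[str], scores: Dict[str, Dict[str, float]]) -> List[Example]:
    """Hypothesis 3, simplified: each relation of the pair picks its own best sentence."""
    return [(max(bag, key=lambda s: per_sent.get(s, 0.0)), rel)
            for rel, per_sent in scores.items()]


bag = ["Barack Obama is the 44th and current President of the US.",
       "Obama flew back to the US on Wednesday.",
       "Obama was born in the US just as he always said."]
# Hypothetical model scores per relation and sentence.
scores = {"employedBy": {bag[0]: 0.9, bag[1]: 0.4, bag[2]: 0.1},
          "bornIn": {bag[0]: 0.1, bag[1]: 0.2, bag[2]: 0.8}}
print(len(all_sentences(bag, ["employedBy", "bornIn"])))  # 6, including noisy labels
print(at_least_one_multi(bag, scores))
```

Under hypothesis 1 the "flew back" sentence is wrongly labelled employedBy; hypotheses 2 and 3 let the model decide which sentence carries each relation.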

Distant supervision
• Pros
  ◦ Can scale to the web, as no supervision is required
  ◦ Generalizes to text from different domains
  ◦ Generates a lot more supervision in one iteration
• Cons
  ◦ Needs high-quality entity matching
  ◦ The relation-expression hypothesis can be wrong; this can be compensated by the extraction model, redundancy, and a language model
  ◦ Does not generate negative examples; partially tackled by matching unrelated entities
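The last point ("matching unrelated entities") can be sketched as follows, assuming we already have entity pairs that co-occur in sentences and the set of pairs related in the KB. Unrelated co-occurring pairs are labelled NIL and subsampled; `sample_negatives`, `max_neg` and the fixed seed are illustrative choices. Because the KB is incomplete, some of these negatives will be false negatives.

```python
import random
from typing import List, Set, Tuple


def sample_negatives(cooccurring: List[Tuple[str, str]], kb_pairs: Set[Tuple[str, str]],
                     max_neg: int, seed: int = 0) -> List[Tuple[str, str, str]]:
    """Co-occurring pairs with no KB relation (in either direction) become NIL examples."""
    unrelated = [(a, b) for a, b in cooccurring
                 if (a, b) not in kb_pairs and (b, a) not in kb_pairs]
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    chosen = rng.sample(unrelated, min(max_neg, len(unrelated)))
    return [(a, "NIL", b) for a, b in chosen]


cooccurring = [("Google", "Larry Page"), ("Larry Page", "Paris"), ("Steve Jobs", "France")]
kb_pairs = {("Google", "Larry Page")}
print(sorted(sample_negatives(cooccurring, kb_pairs, max_neg=5)))
# [('Larry Page', 'NIL', 'Paris'), ('Steve Jobs', 'NIL', 'France')]
```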

Entity resolution
[Figure: knowledge graph fragment around Kobe Bryant with duplicate mentions ("Kobe Bryant", "Kobe B. Bryant", "Black Mamba") linked to Pau Gasol (teammate), LA Lakers (playFor), Vanessa L. Bryant (marriedTo), 1978 and age 35; annotated with single-entity and relational entity resolution]

Entity resolution / deduplication
• Multiple mentions of the same entity are wrong and confusing.
• DEF: We consider the entity resolution (ER) problem (also known as deduplication, or merge–purge), in which records determined to represent the same real-world entity are successively located and merged.
• DEF: The problem of extracting, matching and resolving entity mentions in structured and unstructured data.
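The "successively located and merged" part of the first definition can be sketched with a union-find structure. This assumes the pairwise match decisions come from some matcher (single-entity or relational, as in the following slides); the union-find choice and the name `merge_purge` are illustrative, not taken from the tutorial.

```python
from typing import Dict, Iterable, List, Tuple


def merge_purge(records: List[str], matches: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Successively merge records judged to represent the same real-world entity."""
    parent: Dict[str, str] = {r: r for r in records}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters
    clusters: Dict[str, List[str]] = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())


records = ["Kobe Bryant", "Kobe B. Bryant", "Black Mamba", "Pau Gasol"]
matches = [("Kobe Bryant", "Kobe B. Bryant"), ("Kobe B. Bryant", "Black Mamba")]
print(merge_purge(records, matches))
# [['Kobe Bryant', 'Kobe B. Bryant', 'Black Mamba'], ['Pau Gasol']]
```

Merging is transitive here, so one wrong match can merge two large clusters; this is the "split wrongly merged ones" problem from the research-problem overview.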

Single-entity entity resolution
• Entity resolution without using the relational context of entities
• Many distances/similarities for single-entity entity resolution:
  ◦ Edit distance (Levenshtein, etc.)
  ◦ Set similarity (TF-IDF, etc.)
  ◦ Alignment-based
  ◦ Numeric distance between values
  ◦ Phonetic similarity
  ◦ Equality on a boolean predicate
  ◦ Translation-based
  ◦ Domain-specific
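As an example of the first measure, here is a standard dynamic-programming Levenshtein distance plus a normalized similarity in [0, 1]. The normalization by the longer string's length is one common choice, assumed here for illustration; a matcher would then compare it against a tuned threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))                  # 3
print(levenshtein("Kobe Bryant", "Kobe B. Bryant"))      # 3
```

Edit distance alone cannot tell that "Black Mamba" refers to Kobe Bryant; that is where the relational context in the next slides helps.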

Relational entity resolution: simple strategies
• Enrich the model with relational features → richer context for matching
• Relational features:
  ◦ Value of edge or neighboring attribute
  ◦ Set similarity measures:
    ▪ Overlap/Jaccard
    ▪ Average similarity between set members
    ▪ Adamic/Adar: two entities are more similar if they share more items that are overall less frequent
    ▪ SimRank: two entities are similar if they are related to similar objects
    ▪ Katz score: two entities are similar if they are connected by shorter paths
[Figure: Kobe Bryant graph fragment (Black Mamba, LA Lakers, Pau Gasol, 1978, age 35)]
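Three of the set/path measures can be sketched over an undirected graph stored as adjacency sets. The toy graph treats two mentions ("Kobe Bryant", "Kobe B. Bryant") as separate nodes whose shared neighbours suggest they are duplicates; the graph is hypothetical, and Katz is truncated at `max_len` steps with damping `beta` (both illustrative values) instead of summing to infinity.

```python
import math
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def jaccard(g: Graph, x: str, y: str) -> float:
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def adamic_adar(g: Graph, x: str, y: str) -> float:
    # Shared neighbours count more when they are rare (low degree).
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)


def katz(g: Graph, x: str, y: str, beta: float = 0.1, max_len: int = 4) -> float:
    # Sum over lengths l of beta**l * (number of walks of length l from x to y).
    walks = {x: 1}  # walks of the current length, by end node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt: Dict[str, int] = {}
        for node, count in walks.items():
            for nb in g[node]:
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score


g = {"Kobe Bryant": {"LA Lakers", "Pau Gasol", "1978"},
     "Kobe B. Bryant": {"LA Lakers", "1978", "Vanessa L. Bryant"},
     "LA Lakers": {"Kobe Bryant", "Kobe B. Bryant", "Pau Gasol"},
     "Pau Gasol": {"Kobe Bryant", "LA Lakers"},
     "1978": {"Kobe Bryant", "Kobe B. Bryant"},
     "Vanessa L. Bryant": {"Kobe B. Bryant"}}
print(jaccard(g, "Kobe Bryant", "Kobe B. Bryant"))  # 0.5
```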

Relational entity resolution: advanced strategies
• Dependency graph approaches [Dong et al., 05]
• Relational clustering [Bhattacharya & Getoor, 07]
• Probabilistic Relational Models [Pasula et al., 03]
• Markov Logic Networks [Singla & Domingos, 06]
• Probabilistic Soft Logic [Broecheler & Getoor, 10]

LINK PREDICTION

Link prediction
[Figure: Kobe Bryant with teammate Pau Gasol, who plays for the LA Lakers; the LA Lakers' opponent NY Knicks; playInLeague/teamInLeague edges; the edge Kobe Bryant playFor ? is to be predicted]
• Add knowledge from the existing graph
• No external source
• Reasoning within the graph
1. Rule-based methods
2. Probabilistic models
3. Factorization models
4. Embedding models

First Order Inductive Learner
• FOIL learns function-free Horn clauses:
  ◦ given positive/negative examples of a concept
  ◦ and a set of background-knowledge predicates,
  ◦ FOIL inductively generates a logical rule for the concept that covers all + and no − examples
  Example: $teammate(x, y) \wedge playFor(y, z) \Rightarrow playFor(x, z)$ (Kobe Bryant is a teammate of Pau Gasol, who plays for the LA Lakers)
• Computationally expensive: huge search space; large, costly Horn clauses
• Must add constraints → high precision but low recall
• Inductive Logic Programming: deterministic and potentially problematic
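The sketch below shows what happens once FOIL has produced the example rule: applying it by forward chaining over KB triples, and checking FOIL's acceptance criterion (covers all positives, no negatives). It does not implement FOIL's search over candidate clauses. The triples and the negative example are toy data, and both function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_teammate_rule(kb: Set[Triple]) -> Set[Triple]:
    """teammate(x, y) AND playFor(y, z) => playFor(x, z); returns only new triples."""
    plays_for = {(s, o) for s, r, o in kb if r == "playFor"}
    inferred = set()
    for x, r, y in kb:
        if r != "teammate":
            continue
        for y2, z in plays_for:
            if y2 == y and (x, "playFor", z) not in kb:
                inferred.add((x, "playFor", z))
    return inferred


def covers(rule_output: Set[Triple], positives: Set[Triple], negatives: Set[Triple]) -> bool:
    """FOIL's criterion: the rule covers all + examples and no - examples."""
    return positives <= rule_output and not (negatives & rule_output)


kb = {("Kobe Bryant", "teammate", "Pau Gasol"), ("Pau Gasol", "playFor", "LA Lakers")}
new = apply_teammate_rule(kb)
print(new)  # {('Kobe Bryant', 'playFor', 'LA Lakers')}
print(covers(new, {("Kobe Bryant", "playFor", "LA Lakers")},
             {("Kobe Bryant", "playFor", "NY Knicks")}))  # True
```

The deterministic rule fires for every teammate pair, e.g. after a trade, which is one way ILP rules become "potentially problematic".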

Path Ranking Algorithm [Lao et al., 11]
$$S(KB, playFor, LAL) = \sum_{\pi_i \in \text{paths}} \theta_i^{playFor}\, h(\pi_i(KB, LAL))$$
(KB = Kobe Bryant, LAL = LA Lakers)
• Random walks on the graph are used to sample paths
• Paths are weighted with the probability of reaching the target from the source
• Paths are used as ranking experts in a scoring function
[Figure: path π1 via teammate Pau Gasol and playFor, $h(\pi_1(KB, LAL)) = 0.95$; path π2 via playInLeague, teamInLeague and opponent (NY Knicks), $h(\pi_2(KB, LAL)) = 0.2$]
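The scoring function above can be sketched as follows. $h(\pi)$ is computed here exactly, as the probability that a random walk following the path's relation sequence (uniform choice among edges of that relation) ends at the target, rather than estimated by sampling walks. The toy graph, the hypothetical relation name `hasTeam` and the weights `theta` are assumptions; in PRA the weights are learned per relation.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, List[str]]]  # node -> relation -> neighbour nodes


def path_probability(g: Graph, source: str, target: str, path: List[str]) -> float:
    """h(pi(source, target)) for a path given as a sequence of relation names."""
    dist = {source: 1.0}
    for rel in path:
        nxt: Dict[str, float] = {}
        for node, p in dist.items():
            nbrs = g.get(node, {}).get(rel, [])
            for nb in nbrs:
                nxt[nb] = nxt.get(nb, 0.0) + p / len(nbrs)
        dist = nxt  # walks with no outgoing `rel` edge are dropped
    return dist.get(target, 0.0)


def pra_score(g: Graph, source: str, target: str,
              paths: List[List[str]], theta: List[float]) -> float:
    """S(source, r, target) = sum_i theta_i^r * h(pi_i(source, target))."""
    return sum(t * path_probability(g, source, target, p) for p, t in zip(paths, theta))


g = {"Kobe Bryant": {"teammate": ["Pau Gasol"], "playInLeague": ["NBA"]},
     "Pau Gasol": {"playFor": ["LA Lakers"]},
     "NBA": {"hasTeam": ["LA Lakers", "NY Knicks"]}}
paths = [["teammate", "playFor"], ["playInLeague", "hasTeam"]]
print(path_probability(g, "Kobe Bryant", "LA Lakers", paths[1]))  # 0.5
print(pra_score(g, "Kobe Bryant", "LA Lakers", paths, theta=[0.8, 0.2]))
```

The generic league path gives every team in the league the same probability, so a learned weight near zero for it is what makes the teammate path the informative "expert".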

Link prediction with scoring functions
• A scoring function alone does not grant a decision
• Thresholding: determine a threshold θ; (KB, playFor, LAL) is True iff $S(KB, playFor, LAL) > \theta$
• Ranking:
  ◦ The most likely relation between Kobe Bryant and LA Lakers is $rel = \arg\max_{r' \in rels} S(KB, r', LAL)$
  ◦ The most likely team for Kobe Bryant is $obj = \arg\max_{e' \in ents} S(KB, playFor, e')$
• As a prior for extraction models (cf. Knowledge Vault)
• Caveat: scores are not calibrated like probabilities
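The three decision rules above, sketched for any scoring function $S(s, r, o)$. The score table is hypothetical toy data standing in for a model such as PRA. Since scores are not calibrated probabilities, θ would in practice be tuned on held-out data rather than fixed at 0.5.

```python
from typing import Callable, Iterable

Score = Callable[[str, str, str], float]


def is_true(score: Score, s: str, r: str, o: str, theta: float) -> bool:
    """Thresholding: (s, r, o) is accepted iff S(s, r, o) > theta."""
    return score(s, r, o) > theta


def best_relation(score: Score, s: str, o: str, rels: Iterable[str]) -> str:
    """rel = argmax over r' in rels of S(s, r', o)."""
    return max(rels, key=lambda r: score(s, r, o))


def best_object(score: Score, s: str, r: str, ents: Iterable[str]) -> str:
    """obj = argmax over e' in ents of S(s, r, e')."""
    return max(ents, key=lambda e: score(s, r, e))


# Hypothetical scores; unknown triples default to 0.0.
table = {("KB", "playFor", "LAL"): 0.9, ("KB", "playFor", "NYK"): 0.2,
         ("KB", "opponent", "LAL"): 0.1}
S = lambda s, r, o: table.get((s, r, o), 0.0)
print(is_true(S, "KB", "playFor", "LAL", theta=0.5))          # True
print(best_relation(S, "KB", "LAL", ["playFor", "opponent"]))  # playFor
print(best_object(S, "KB", "playFor", ["LAL", "NYK"]))         # LAL
```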

