Recorded Lecture Course Notes: Question-Answering System, Word Vectors, Sentence Inverted Index (Project Assignment Explanation)
Start: Text Representation

Word Representation: each word is represented by its position in a dictionary.
Dictionary: […, 去 (go), 爬山 (hike), 今天 (today), …, 昨天 (yesterday), 跑步 (run)]

Sentence Representation (boolean): each sentence is a 0/1 vector over the dictionary, where a component is 1 if the corresponding word appears in the sentence.
Dictionary: […, 又 (again), 去 (go), 爬山 (hike), 今天 (today), …, 昨天 (yesterday), 跑步 (run)]
Sentences: "今天去爬山" (going hiking today), "昨天跑步" (ran yesterday), "又去爬山又去跑步" (went hiking again and went running again)

Sentence Representation (count): the same dictionary, but each component counts how many times the word occurs in the sentence.
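The boolean and count representations above can be sketched in a few lines. Tokenization is assumed already done, and `w1`/`w6` are hypothetical stand-ins for the two dictionary entries that are elided in the source:

```python
# Boolean and count sentence vectors over a fixed dictionary.
# "w1" and "w6" are hypothetical placeholders for the two dictionary
# entries elided in the slide; the rest come from the example.
vocab = ["w1", "又", "去", "爬山", "今天", "w6", "昨天", "跑步"]

def count_vector(tokens):
    # Each component counts occurrences of that dictionary word.
    return [tokens.count(w) for w in vocab]

def boolean_vector(tokens):
    # Each component is 1 if the dictionary word appears at all.
    return [1 if w in tokens else 0 for w in vocab]

# "又去爬山又去跑步" tokenized, with its subject word stood in by w6:
s3 = ["w6", "又", "去", "爬山", "又", "去", "跑步"]
print(count_vector(s3))    # [0, 2, 2, 1, 0, 1, 0, 1], the slide's S3 count vector
print(boolean_vector(s3))  # [0, 1, 1, 1, 0, 1, 0, 1]
```

The count vector reproduces the slide's S3 representation: 又 and 去 each occur twice, so their components are 2.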

End: Text Representation

Start: Text Similarity

S1: "今天去爬山" = (1, 0, 1, 1, 0, 0, 0, 0)
S2: "昨天跑步" = (0, 0, 0, 0, 0, 1, 1, 1)
S3: "又去爬山又去跑步" = (0, 2, 2, 1, 0, 1, 0, 1)

Sentence Similarity, computing distance (Euclidean distance): d = |s1 - s2|

Sentence Similarity, computing similarity (cosine similarity): sim = (s1 · s2) / (|s1| · |s2|)
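Both measures above are a few lines each; a minimal sketch on the slide's vectors:

```python
import math

def euclidean(s1, s2):
    # d = |s1 - s2|: length of the difference vector.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def cosine(s1, s2):
    # sim = (s1 . s2) / (|s1| * |s2|): cosine of the angle between vectors.
    dot = sum(a * b for a, b in zip(s1, s2))
    n1 = math.sqrt(sum(a * a for a in s1))
    n2 = math.sqrt(sum(b * b for b in s2))
    return dot / (n1 * n2)

S1 = (1, 0, 1, 1, 0, 0, 0, 0)  # "今天去爬山"
S2 = (0, 0, 0, 0, 0, 1, 1, 1)  # "昨天跑步"
S3 = (0, 2, 2, 1, 0, 1, 0, 1)  # "又去爬山又去跑步"
print(euclidean(S1, S2))  # sqrt(6): the sentences share no words
print(cosine(S1, S3))     # 3 / sqrt(3 * 11), roughly 0.52
```

Note that S1 and S2 have cosine similarity 0 (no shared words), while S1 and S3 share 去 and 爬山 and so score higher.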

Sentence Similarity (count vectors over a shared vocabulary):

Sentence 1: "He is going from Beijing to Shanghai"
Sentence 2: "He denied my request, but he actually lied."
Sentence 3: "Mike lost the phone, and phone was in the car"

Sentence 1: (0,0,1,0,0,0,1,1,1,0,1,0,0,0,0,0,0,1,0,1,0)
Sentence 2: (1,0,0,1,0,1,0,0,2,0,0,1,0,0,1,0,1,0,0,0,0)
Sentence 3: (0,1,0,0,1,0,0,0,0,1,0,0,1,1,0,2,0,0,2,0,1)

For example, "he" occurs twice in sentence 2, so its component is 2, while "denied" occurs once.

But a word is not more important just because it occurs more often, and it is not less important just because it occurs less often!
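The 21-dimensional vectors above come from a vocabulary built over all three sentences. A sketch of that construction (the tokenizer and the alphabetical vocabulary ordering are assumptions; the slide does not show them):

```python
import re

sentences = [
    "He is going from Beijing to Shanghai",
    "He denied my request, but he actually lied.",
    "Mike lost the phone, and phone was in the car",
]

def tokenize(text):
    # Naive tokenization: lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())

# Build a sorted vocabulary over the whole corpus, then vectorize by counts.
vocab = sorted({w for s in sentences for w in tokenize(s)})
vectors = [[tokenize(s).count(w) for w in vocab] for s in sentences]

print(len(vocab))                     # 21 distinct words, as on the slide
print(vectors[1][vocab.index("he")])  # "He"/"he" appears twice in sentence 2
```

The vocabulary has exactly 21 distinct words, matching the dimensionality of the slide's vectors.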

End: Text Similarity

Start: tf-idf Text Representation

Tf-idf Representation:

    tfidf(w) = tf(d, w) * log(N / N(w))

- tf(d, w): the term frequency of word w in document d
- N: the total number of documents in the corpus
- N(w): the number of documents in which word w appears

Example corpus:
Document 1: 今天上NLP課程 (today we have the NLP class)
Document 2: 今天的課程有意思 (today's class is interesting)
Document 3: 數(shù)據(jù)課程也有意思 (the data class is also interesting)
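A sketch of the tf-idf formula on the three example documents (the word segmentation is done by hand here and is an assumption; the slide gives only the raw sentences):

```python
import math

# Example corpus from the slide, pre-segmented by hand (an assumption).
docs = [
    ["今天", "上", "NLP", "課程"],
    ["今天", "的", "課程", "有意思"],
    ["數(shù)據(jù)", "課程", "也", "有意思"],
]
N = len(docs)  # total number of documents

def tfidf(word, doc):
    # tfidf(w) = tf(d, w) * log(N / N(w))
    tf = doc.count(word)                     # term frequency in this document
    n_w = sum(1 for d in docs if word in d)  # documents containing the word
    return tf * math.log(N / n_w)

print(tfidf("課程", docs[0]))  # in every document, so idf = log(3/3) = 0
print(tfidf("NLP", docs[0]))   # in only one document, so idf = log 3
```

課程 appears in all three documents, so its tf-idf weight is 0 everywhere: frequent-everywhere words are downweighted, which is exactly the point of idf.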

End: tf-idf Text Representation

Start: Introduction to Word Vectors

Measure Similarity Between Words: which of the following words are semantically most similar? …, 爬山 (hike), 運(yùn)動(dòng) (exercise), 昨天 (yesterday)

Can the One-hot representation express similarity between words?

Each word's representation:
…:    [1, 0, 0, 0, 0, 0, 0]
爬山: [0, 0, 1, 0, 0, 0, 0]
運(yùn)動(dòng): [0, 0, 0, 0, 0, 0, 1]
昨天: [0, 0, 0, 0, 0, 1, 0]

It cannot: the dot product between any two distinct one-hot vectors is 0, and the Euclidean distance between any two is the same, so every pair of words looks equally dissimilar.

Another Issue: Sparsity. Over a large dictionary, almost every component of a one-hot (or boolean/count) vector is 0. Example sentences: 今天打算去爬山 (planning to go hiking today), 昨天做什么了 (what did you do yesterday?), 明天打算去上課 (planning to go to class tomorrow)
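The one-hot limitation can be checked directly; a minimal sketch using the slide's vectors:

```python
# One-hot vectors from the slide (the "…" entry is omitted here).
onehot = {
    "爬山": [0, 0, 1, 0, 0, 0, 0],
    "運(yùn)動(dòng)": [0, 0, 0, 0, 0, 0, 1],
    "昨天": [0, 0, 0, 0, 0, 1, 0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Every pair of distinct one-hot vectors has dot product 0, so cosine
# similarity cannot tell that 爬山 (hike) is semantically closer to
# 運(yùn)動(dòng) (exercise) than to 昨天 (yesterday).
print(dot(onehot["爬山"], onehot["運(yùn)動(dòng)"]))  # 0
print(dot(onehot["爬山"], onehot["昨天"]))  # 0
```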

From One-hot Representation to Distributed Representation

One-Hot Representation:
…:    [1, 0, 0, 0, 0, 0, 0]
爬山: [0, 0, 1, 0, 0, 0, 0]
運(yùn)動(dòng): [0, 0, 0, 0, 0, 0, 1]
昨天: [0, 0, 0, 0, 0, 1, 0]

Distributed Representation:
…:    [0.1, 0.2, 0.4, 0.2]
爬山: [0.2, 0.3, 0.7, 0.1]
運(yùn)動(dòng): [0.2, 0.3, 0.6, 0.2]
昨天: [0.5, 0.9, 0.1, 0.3]

Measure Similarity Between Words: with the distributed vectors, 爬山 (hike) and 運(yùn)動(dòng) (exercise) have nearby vectors while 昨天 (yesterday) is far from both, so semantic similarity between words can now be measured.
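Using the slide's distributed vectors, cosine similarity now ranks word pairs sensibly; a short check:

```python
import math

# Distributed vectors from the slide.
vec = {
    "爬山": [0.2, 0.3, 0.7, 0.1],
    "運(yùn)動(dòng)": [0.2, 0.3, 0.6, 0.2],
    "昨天": [0.5, 0.9, 0.1, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Unlike one-hot, distributed vectors give a graded similarity:
# 爬山 (hike) is far closer to 運(yùn)動(dòng) (exercise) than to 昨天 (yesterday).
print(round(cosine(vec["爬山"], vec["運(yùn)動(dòng)"]), 2))  # 0.99
print(round(cosine(vec["爬山"], vec["昨天"]), 2))  # 0.55
```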

Comparing the Capacities
Q: At most how many distinct words can a 100-dimensional One-Hot representation express? At most 100: one word per dimension.
Q: At most how many distinct words can a 100-dimensional distributed representation express? In principle infinitely many, because each component can take any real value.

Questions
Q: How do we learn the distributed representation (word vector) of each word?

End: Introduction to Word Vectors

Start: Learning Word Vectors

Learn Word Embeddings: given an input corpus (e.g. "今天去爬山 … 昨天運(yùn)動(dòng) … 去爬山"), a word-embedding model learns a distributed vector for each word:

…:    [0.1, 0.2, 0.4, 0.2]
爬山: [0.2, 0.3, 0.7, 0.1]
運(yùn)動(dòng): [0.2, 0.3, 0.6, 0.2]
昨天: [0.5, 0.9, 0.1, 0.3]

Essence of Word Embedding: the learned distributed vectors are the word embeddings; words that occur in similar contexts end up with similar vectors.

From Word Embedding to Sentence Embedding: combine the word vectors of a sentence into a single sentence vector.
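The slide names the word-to-sentence step but not a specific method; one common baseline (an assumption here, not something the slide prescribes) is averaging the word vectors:

```python
# Average-of-word-vectors sentence embedding: a common baseline, assumed
# here since the slide only names the word-to-sentence step.
vec = {
    "爬山": [0.2, 0.3, 0.7, 0.1],
    "運(yùn)動(dòng)": [0.2, 0.3, 0.6, 0.2],
    "昨天": [0.5, 0.9, 0.1, 0.3],
}

def sentence_embedding(tokens):
    # Component-wise mean of the word vectors; words without a vector are skipped.
    known = [vec[t] for t in tokens if t in vec]
    return [sum(c) / len(known) for c in zip(*known)]

print([round(x, 2) for x in sentence_embedding(["昨天", "爬山"])])  # [0.35, 0.6, 0.4, 0.2]
```

Averaging keeps the sentence vector in the same space as the word vectors, so the cosine similarity from earlier slides applies unchanged.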

End: Learning Word Vectors

Start: Drawbacks of the Retrieval-based QA System

Question: "How do you like NLPCamp?"
Knowledge base: <question1, answer1>, <question2, answer2>, …, <question100, answer100>
Run similarity matching between the user's question and every stored question, and return the answer of the most similar one.

Recap: Retrieval-based QA System
How to Reduce Time Complexity? Idea: "layered filtering" (層次過(guò)濾思想). First apply a cheap filter to the 100 pairs to keep only a small candidate set, e.g. <question2, answer2>, <question17, answer17>, …, <question98, answer98>, then run the expensive similarity matching only on those candidates and return the most similar one.
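A sketch of the layered-filtering pipeline: an inverted index cheaply narrows the candidates to questions sharing at least one word with the query, and cosine similarity over count vectors ranks only the survivors. The knowledge-base entries below are made up for illustration, since the slide's actual <question, answer> pairs are not shown:

```python
import math
from collections import Counter, defaultdict

# Toy knowledge base standing in for the slide's 100 <question, answer> pairs.
kb = [
    ("how do you like nlp", "I like it a lot"),
    ("what did you do yesterday", "I went running"),
    ("where is the class", "Room 101"),
]

def tokenize(text):
    return text.lower().split()

# Layer 1: inverted index mapping each word to the questions containing it.
index = defaultdict(set)
for i, (q, _) in enumerate(kb):
    for w in tokenize(q):
        index[w].add(i)

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (dicts).
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def answer(query):
    q_tokens = tokenize(query)
    # Layer 1 (cheap filter): keep only questions sharing a word with the query.
    candidates = set().union(*(index[w] for w in q_tokens if w in index))
    if not candidates:
        return None
    # Layer 2 (expensive match): cosine similarity on the survivors only.
    q_vec = Counter(q_tokens)
    best = max(candidates, key=lambda i: cosine(q_vec, Counter(tokenize(kb[i][0]))))
    return kb[best][1]

print(answer("how do you like the nlp class"))  # I like it a lot
```

With a large knowledge base, the inverted-index lookup touches only the postings of the query's words, so the expensive similarity step runs on a handful of candidates instead of all stored questions.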
