Design: Speech Signal Processing

Melody Recognition Method: Linear Scaling

Linear Scaling

Linear scaling (LS for short), also known as uniform scaling or global scaling, is the most straightforward frame-based method for melody recognition: for frame-based pitch features, it is the most direct and effective way to recognize a melody.

The Steps for Linear Scaling

1. Use interpolation to linearly expand or compress the user's input pitch vector. For example, the scaling factor can range from 0.5 to 2.0; with a step of 0.1, this produces (2.0 - 0.5) / 0.1 + 1 = 16 expanded or compressed versions of the original pitch vector.
2. Compare: compare these 16 time-scaled versions with each song in the database, which yields 16 distances. The minimum of these is defined as the distance between the input pitch vector and that song.
3. Find the minimum distance: for a given input pitch vector, compute its distances to all songs in the database. The song with the minimum distance is the most likely match for the input pitch vector, i.e., the song the user sang.
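The three steps above can be sketched in plain MATLAB (also runnable in Octave). This is a minimal illustration under stated assumptions, not the author's linScalingMex: the toy query and two-song database are hypothetical, each scaled version is compared with the first n points of a song, versions longer than the song are skipped, and the L1 distance is normalized by the number of points. Key transposition and rest handling are left out to keep it short.

```matlab
% Minimal linear scaling (LS) sketch; runs in MATLAB or Octave.
% Hypothetical toy data: a hummed query and a two-song "database".
inputPitch = [60 60 62 62 64 64 65 65];                      % query, in semitones
dbSongs = {[60 60 60 62 62 62 64 64 64 65 65 65 67 67], ...  % song 1
           [67 67 65 65 64 64 62 62 60 60 60 60]};           % song 2

% Step 1: 16 scaling factors from 0.5 to 2.0 in steps of 0.1
sf = linspace(0.5, 2.0, 16);
n0 = numel(inputPitch);

songDist = inf(1, numel(dbSongs));
for s = 1:numel(dbSongs)
    song = dbSongs{s};
    d = inf(1, numel(sf));              % distance for each scaled version
    for k = 1:numel(sf)
        n = round(sf(k) * n0);          % length of the scaled version
        if n < 2 || n > numel(song)
            continue;                   % assumption: skip versions that cannot be compared
        end
        % Linear interpolation stretches or compresses the query to n points
        scaled = interp1(1:n0, inputPitch, linspace(1, n0, n));
        % Step 2: L1 distance to the first n points of the song, normalized by n
        d(k) = sum(abs(scaled - song(1:n))) / n;
    end
    songDist(s) = min(d);               % distance from the query to this song
end

% Step 3: the song with the minimum distance is the most likely match
[~, bestSong] = min(songDist);
fprintf('Most likely song: %d (distance %.3f)\n', bestSong, songDist(bestSong));
```

Here song 1 is the query stretched by a factor of 1.5, so it wins; song 2 starts 7 semitones away from the query and can never get close.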

LS for Melody Recognition

Common terms:
Scaling factor: the stretch or compression ratio
Scaling factor bounds: the range of scaling factors
Resolution: the number of scaled versions (comparison templates)
Step: the increment between successive scaling factors
Distance: the difference between two pitch vectors

Illustration for Linear Scaling

The following plot is a typical example of linear scaling, with scaling-factor bounds of [0.5, 2] and a resolution of 5. The minimum distance is achieved when the scaling factor is 1.5.

[Figure: typical example of linear scaling]

Consider the following issues:
1. Method for interpolation
2. Distance measures
3. Distance normalization
4. Key transposition
5. Rest handling

Rest handling: In order to preserve the timing information, we usually replace each rest with the previous non-rest pitch, for both the input pitch vector and the songs in the database. A typical example is "Row, Row, Row Your Boat".

Practical LS Tips (1): Method for interpolation

Simple linear interpolation should suffice. More advanced interpolation methods may be tried only if they do not make the computation prohibitive.

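Practical LS Tips (2): Distance measures

We can use the L1 norm (the sum of the absolute differences of corresponding elements, also known as taxicab distance or Manhattan distance) or the L2 norm (the square root of the sum of the squared differences of corresponding elements, also known as Euclidean distance) to compute the distance. Both are special cases of the Lp norm, with p = 1 and p = 2 respectively:

$$L_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i} |x_i - y_i|^p \right)^{1/p}$$

In practice we usually only need to know which distance is smaller, so the square root in the L2 norm is often omitted to save computation. Because the square root is monotonically increasing, omitting it does not change which candidate is closest.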

Practical LS Tips (3): Distance normalization

Usually we need to normalize the distance by the length of the vector to eliminate the bias introduced by expansion or compression: the total distance is divided by the number of points, so that scaled versions with different numbers of points can be compared fairly.
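The bias that normalization removes can be seen with a hypothetical song and two versions of different lengths that are both exactly one semitone off at every point: the raw totals differ, the normalized distances do not.

```matlab
% Hypothetical database pitch vector (8 points, in semitones)
song = 60:67;
% Two versions of a query, both exactly 1 semitone above the song at every point
shortVer = song(1:4) + 1;      % e.g. a compressed version: 4 points
longVer  = song(1:8) + 1;      % e.g. an expanded version: 8 points

rawShort = sum(abs(shortVer - song(1:4)));   % total L1 distance: 4
rawLong  = sum(abs(longVer  - song(1:8)));   % total L1 distance: 8

% Dividing by the number of points removes the length bias
normShort = rawShort / numel(shortVer);      % 1
normLong  = rawLong  / numel(longVer);       % 1
```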

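Practical LS Tips (4): Key transposition

Everyone sings in a different key (typically, women sing in a higher key and men in a lower one), so the input must be corrected before it is compared. To achieve invariance to key transposition, we shift the input pitch vector so that it achieves the minimum distance when compared to each song in the database. Different distance measures call for different key-transposition schemes:
1. For the L1 norm, we can shift the input pitch vector to have the same median as that of each song in the database. Strictly, the shift that minimizes the L1 distance is the median of the element-wise differences between the song and the input; matching the medians approximates it.
2. For the L2 norm, we can shift the input pitch vector to have the same mean as that of each song in the database. Because the mean is linear, this shift equals the mean of the element-wise differences, which is exactly the L2-optimal shift.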

Practical LS Tips (5): Rest handling

To preserve the timing information of the notes, rests in both the user's input and the database songs are usually replaced with the previous non-rest pitch.
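A minimal sketch of rest handling. It assumes (the slides do not specify this) that rests are encoded as 0 in the pitch vector, and that any leading rests, which have no previous pitch, take the first non-rest pitch instead.

```matlab
% Hypothetical pitch vector in semitones; 0 marks a rest (assumed encoding).
pitch = [0 60 60 0 0 62 64 0 65];

filled = pitch;
firstIdx = find(filled ~= 0, 1);          % first non-rest frame
if ~isempty(firstIdx)
    % Assumption: leading rests have no previous pitch, so they take the first one
    filled(1:firstIdx-1) = filled(firstIdx);
    for k = firstIdx+1:numel(filled)
        if filled(k) == 0
            filled(k) = filled(k-1);      % carry the previous non-rest pitch forward
        end
    end
end
disp(filled)                              % 60 60 60 60 60 62 64 64 65
```

The same function would be applied to every database song as well as to the query.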

Characteristics of LS for Melody Recognition

They can be summarized as follows:
1. If the user sings or hums at a constant pace, without alternately speeding up and slowing down, LS usually gives satisfactory recognition performance.
2. LS is also very efficient, both in its computation and in its one-shot handling of key transposition, which keeps the computation simple.

Example

```matlab
% inputPitch and dbPitch (pitch vectors in semitones) are assumed to be
% loaded already; they are not defined in this excerpt.
resolution = 21;
sfBounds = [0.5, 1.5];     % Scaling-factor bounds
distanceType = 1;          % L1-norm
[minDist1, scaledPitch, allDist] = linScalingMex(inputPitch, dbPitch, sfBounds(1), sfBounds(2), resolution, distanceType);
axisLimit = [0 370 45 70];
subplot(3,1,1);
plot(1:length(dbPitch), dbPitch, '.-', 1:length(inputPitch), inputPitch, '.-');
title('Database and input pitch vectors'); ylabel('Semitones');
legend('Database pitch', 'Input pitch', 'location', 'SouthEast');
axis(axisLimit);
subplot(3,1,2);
plot(1:length(dbPitch), dbPitch, '.-', 1:length(scaledPitch), scaledPitch, '.-');
legend('Database pitch', 'Scaled pitch', 'location', 'SouthEast');
title('Database and scaled pitch vectors'); ylabel('Semitones');
axis(axisLimit);
subplot(3,1,3);
ratio = linspace(sfBounds(1), sfBounds(2), resolution);   % the scaling factors tried
plot(ratio, allDist, '.-');
xlabel('Scaling factor'); ylabel('Distance'); title('Normalized distance');
```

Results

```matlab
resolution = 21;
sfBounds = [0.5, 1.5];     % Scaling-factor bounds
distanceType = 1;          % L1-norm
[minDist1, scaledPitch1, allDist1] = linScalingMex(inputPitch, dbPitch, sfBounds(1), sfBounds(2), resolution, distanceType);
distanceType = 2;          % L2-norm
[minDist2, scaledPitch2, allDist2] = linScalingMex(inputPitch, dbPitch, sfBounds(1), sfBounds(2), resolution, distanceType);
% To reduce computation, the L2-distance returned by linScalingMex is actually
% the square distance, so we need to take the square root.
allDist2 = sqrt(allDist2);
axisLimit = [0 370 45 70];
subplot(3,1,1);
plot(1:length(dbPitch), dbPitch, '.-', 1:length(inputPitch), inputPitch, '.-');
title('Database and input pitch vectors'); ylabel('Semitones');
legend('Database pitch', 'Input pitch', 'location', 'SouthEast');
```
