![Algorithms for Nearest Neighbor Search-大學(xué)課件-在線_第1頁](http://file4.renrendoc.com/view11/M01/30/21/wKhkGWVnW92AB7giAABlJ43hRyE595.jpg)
![Algorithms for Nearest Neighbor Search-大學(xué)課件-在線_第2頁](http://file4.renrendoc.com/view11/M01/30/21/wKhkGWVnW92AB7giAABlJ43hRyE5952.jpg)
![Algorithms for Nearest Neighbor Search-大學(xué)課件-在線_第3頁](http://file4.renrendoc.com/view11/M01/30/21/wKhkGWVnW92AB7giAABlJ43hRyE5953.jpg)
![Algorithms for Nearest Neighbor Search-大學(xué)課件-在線_第4頁](http://file4.renrendoc.com/view11/M01/30/21/wKhkGWVnW92AB7giAABlJ43hRyE5954.jpg)
![Algorithms for Nearest Neighbor Search-大學(xué)課件-在線_第5頁](http://file4.renrendoc.com/view11/M01/30/21/wKhkGWVnW92AB7giAABlJ43hRyE5955.jpg)
版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡介
Algorithms
for
Nearest
NeighborSearchPiotr
IndykMITNearest
Neighbor
SearchGiven:
a
set
P
of
n
points
in
Rd
Goal:
a
data
structure,
which
given
a
quepoint
q,
finds
the
nearest
neighbor
p
ofin
PpqOutline
of
this
talkVariantsMotivationMain
memory
algorithms:quadtreeskd-treesLocality
Sensitive
HashingSecondary
storage
algorithms:R-tree
(and
its
variants)VA-fileVariants
of
nearest
neighbor
Near
neighbor
(range
search):
find
one/alpoints
in
P
within
distance
r
from
q
Spatial
join:
given
two
sets
P,Q,
find
allpairs
p
in
P,
q
in
Q,
such
that
p
is
withindistance
r
from
q
Approximate
near
neighbor:
find
one/allpoints
p’
in
P,
whose
distance
to
q
is
atmost
(1+e)
times
the
distance
from
q
to
itsnearest
neighborMotivationDepends
on
the
value
of
d:low
d:
graphics,
vision,
GIS,
etchigh
d:similarity
search
in
databases
(text,
imagesfinding
pairs
of
similar
objects
(e.g.,
copyrviolation
detection)useful
subroutine
for
clusteringAlgorithmsMain
memory
(Computational
Geometry)linear
scantree-based:quadtreekd-treehashing-based:
Locality-Sensitive
HashingSecondary
storage
(Databases)R-tree
(and
numerous
variants)Vector
Approximation
File
(VA-file)QuadtreeSimplest
spatial
structure
on
Earth
!Quadtree
ctd.Split
the
space
into
2d
equal
subsquaresRepeat
until
done:only
one
pixel
leftonly
one
point
leftonly
a
few
points
leftVariants:split
only
one
dimension
at
a
timek-d-trees
(in
a
moment)Range
searchNear
neighbor
(range
search):put
the
root
on
the
stackrepeatpop
the
next
node
T
from
the
stackfor
each
child
C
of
T:if
C
is
a
leaf,
examine
point(s)
in
Cif
C
intersects
with
the
ball
of
radius
r
around
q,
add
Cthe
stackNear
neighbor
ctdNearest
neighborStart
range
search
with
r
=Whenever
a
point
is
found,
update
r
Only
investigate
nodes
with
respect
tocurrent
rQuadtree
ctd.Simple
data
structureVersatile,
easy
to
implementSo
why
doesn’t
this
talk
end
here
?Empty
spaces:
if
the
points
form
sparse
cloudit
takes
a
while
to
reach
themSpace
exponential
in
dimensionTime
exponential
in
dimension,
e.g.,
points
othe
hypercubeSpace
issues:
exampleK-d-trees
[Bentley’75]Main
ideas:only
one-dimensional
splitsinstead
of
splitting
in
the
middle,
choose
thsplit
“carefully”
(many
variations)near(est)
neighbor
queries:
as
for
quadtreesAdvantages:no
(or
less)
empty
spacesonly
linear
spaceExponential
query
time
still
possibleExponential
query
timeWhat
does
it
mean
exactly
?Unless
we
do
something
really
stupid,
query
time
ismost
dnTherefore,
the
actual
query
time
isMin[
dn,
exponential(d)
]
This
is
still
quite
bad
though,
when
the
dimensiois
around
20-30
Unfortunately,
it
seems
inevitable
(both
in
theoand
practice)Approximate
nearest
neighbor
Can
do
it
using
(augmented)
k-d
trees,
byinterrupting
search
earlier
[Arya
et
al’Still
exponential
time
(in
the
worst
caseTry
a
different
approach:for
exact
queries,
we
can
use
binary
searchtrees
or
hashingcan
we
adapt
hashing
to
nearest
neighborsearch
?Locality-Sensitive
Hashing[Indyk-Motwani’98]
Hash
functions
are
locality-sensitive,
ia
random
hash
random
function
h,
for
anypair
of
points
p,q
we
have:Pr[h(p)=h(q)]
is
“high”
if
p
is
“close”
tqPr[h(p)=h(q)]
is
“l(fā)ow”
if
p
is”far”
fromqDo
such
functions
exist
?Consider
the
hypercube,
i.e.,pointsfrom{0,1}dHamming
distance
D(p,q)=
#
positions
onwhich
p
and
q
differDefine
hash
function
h
by
choosing
a
set
Iof
k
random
coordinates,
and
settingh(p)
=projection
of
p
onIExampleTake–
d=10,
p=0101110010–
k=2,
I={2,5}Then
h(p)=11h’s
are
locality-sensitivePr[h(p)=h(q)]=(1-D(p,q)/d)kWe
can
vary
the
probability
by
changing
kk=1k=2distancedistancePrPrHow
can
we
use
LSH
?Choose
several
h1..hlInitialize
a
hash
array
for
each
hiStore
each
point
p
in
the
bucket
hi(p)
of
ti-th
hash
array,
i=1...lIn
order
to
answer
query
qfor
each
i=1..l,
retrieve
points
in
a
bucket
hreturn
the
closest
point
foundWhat
does
this
algorithm
do
?
By
proper
choice
of
parameters
k
and
l,
we
canmake,
for
any
p,
the
probability
thathi(p)=hi(q)
for
some
ilook
like
this:Can
control:Position
of
the
slopeHow
steep
it
isdistanceThe
LSH
algorithm
Therefore,
we
can
solve
(approximately)
the
nearneighbor
problem
with
given
parameter
rWorst-case
analysis
guarantees
dn1/(1+e)
query
ti
Practical
evaluation
indicates
much
better
beha[GIM’99,HGI’00,Buh’00,BT’00]Drawbacks:
works
best
for
Hamming
distance
(although
can
be
generalizeto
Euclidean
space)requires
radius
r
to
be
fixed
in
advanceSecondary
storage
Seek
time
same
as
time
needed
to
transferhundreds
of
KBsGrouping
the
data
is
crucialDifferent
approach
required:in
main
memory,
any
reduction
in
the
numberof
inspected
points
was
goodon
disk,
this
is
not
the
case
!Disk-based
algorithmsR-tree
[Guttman’84]departing
point
for
many
variationsover
600
citations
!
(according
to
CiteSeer)“optimistic”
approach:
try
to
answer
queries
inlogarithmic
timeVector
Approximation
File
[WSB’98]“pessimistic”
approach:
if
we
need
to
scan
the
whdata
set,
we
better
do
it
fastLSH
works
also
on
diskR-tree
“Bottom-up”
approach
(k-d-tree
was“top-down”)
:Start
with
a
set
of
points/rectanglesPartition
the
set
into
groups
of
small
cardinFor
each
group,
find
minimum
rectanglecontaining
objects
from
this
groupRepeatR-tree
ctd.R-tree
ctd.Advantages:Supports
near(est)
neighbor
search
(similarbefore)Works
for
points
and
rectanglesAvoids
empty
spacesMany
variants:
X-tree,
SS-tree,
SR-tree
etcWorks
well
for
low
dimensionsNot
so
great
for
high
dimensionsVA-file
[Weber,
Schek,Blott’98]Approach:In
high-dimensional
spaces,
all
tree-basedindexing
structures
examine
large
fraction
ofleavesIf
we
need
to
visit
so
many
nodes
anyway,
it
isbetter
to
scan
the
whole
data
set
and
avoidperforming
seeks
altogether1
seek
=
transfer
of
few
hundred
KBVA-file
ctd.
Natural
question:
how
to
speed-up
linearscan
?Answer:
use
approximationUse
only
i
bits
per
dimension
(and
speed-up
thscan
by
a
factor
of
32/i)Identify
all
points
which
could
be
returned
aan
answerVerify
the
points
using
original
data
setTime
to
sum
up“Curse
of
dimensionality”
is
indeed
a
curse
In
main
memory,
we
can
perform
sublinear-timesearch
using
trees
or
hashing
In
secondary
storage,
linear
scan
is
p
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 懷化2024年湖南中坡國家森林公園管理處選調(diào)2人筆試歷年參考題庫附帶答案詳解
- 2025年模切燙印兩用機(jī)項(xiàng)目可行性研究報(bào)告
- 2025至2031年中國微調(diào)雙燈行業(yè)投資前景及策略咨詢研究報(bào)告
- 2025至2031年中國安全控制芯片行業(yè)投資前景及策略咨詢研究報(bào)告
- 2025年塑料四輥壓延機(jī)項(xiàng)目可行性研究報(bào)告
- 2025至2031年中國乳膠漆調(diào)色機(jī)行業(yè)投資前景及策略咨詢研究報(bào)告
- 2025年人體工學(xué)型電腦臺(tái)項(xiàng)目可行性研究報(bào)告
- 2025至2030年鼓式制動(dòng)器項(xiàng)目投資價(jià)值分析報(bào)告
- 2025至2030年鋼板清洗劑項(xiàng)目投資價(jià)值分析報(bào)告
- 2025至2030年中國金屬自攻螺釘數(shù)據(jù)監(jiān)測研究報(bào)告
- 2023年上海市秋考語文真題試卷含答案(整理版)
- 2023年心理咨詢師之心理咨詢師基礎(chǔ)知識(shí)考試題庫附完整答案【有一套】
- 路緣石安裝一級(jí)安全交底
- 一級(jí)建造師繼續(xù)教育最全題庫及答案(新)
- LS/T 1226-2022糧庫智能通風(fēng)控制系統(tǒng)
- 直線加速器專項(xiàng)施工方案
- 聯(lián)苯二氯芐生產(chǎn)工藝及產(chǎn)排污分析
- 儲(chǔ)能設(shè)備項(xiàng)目采購供應(yīng)質(zhì)量管理方案
- 美國房地產(chǎn)市場特征、框架與周期演變
- 光伏發(fā)電工程施工組織設(shè)計(jì)施工工程光伏發(fā)電工程光伏發(fā)電施工組織設(shè)計(jì)
- 民政局離婚協(xié)議書模板(4篇)
評(píng)論
0/150
提交評(píng)論