Computer Vision: Computational Theory and Algorithmic Foundations (lecture slides)

Uploaded: 2022-02-21. Format: PPT. Pages: 72.


Document summary

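Lecture topics and dates
- Background and geometric foundations of computer vision (2/13, week 1)
- Geometric camera calibration (3/6, week 4)
- Rigid-body motion and pose estimation (3/27, week 7)
- Pose estimation (II) (or the correspondence problem) (4/17, week 10)
- Applications (5/8, week 13)

Requirements
- Attend the 5 lectures, ask questions actively, and join the discussion (each lecture leaves about 15-20 minutes for questions and discussion).
- Complete at least one of the 3 experiments (program + report). (The computer lab location will be settled within the first two weeks; I will announce it then.)
- Write one "academic" paper related to the experiment.
- Final grade:
  Undergraduates: 60% (experiment) + 40% (paper)
  Graduate students: 40% (experiment) + 60% (paper)

Outline
- What is CV? When did it start to develop? What does it study? Which disciplines and fields is it related to? Some open problems in CV and application prospects
- Geometric foundations; probabilistic foundations
- Some related resources

Definitions of CV (1)
- "Today, the study of extracting 3-D information from video images and building a 3-D model of the scene, called computer vision or image understanding, is one of the research areas that attract the most attention all over the world." From K. Kanatani, "Statistical Optimization for Geometric Computation: Theory and Practice", 1996.

Definitions of CV (2)
- "Vision refers not only to the sensing of light signals; it also covers the whole process of acquiring, transmitting, processing, storing, and understanding visual information. After signal processing theory and computers appeared, people tried to capture images of the environment with cameras, convert them into digital signals, and carry out the whole process of visual information processing by computer. In this way a new discipline took shape: computer vision."
- "The research goal of computer vision is to give computers the ability to perceive three-dimensional environment information from two-dimensional images."
  From "Computer Vision: Computational Theory and Algorithmic Foundations", Ma Songde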
and Zhang Zhengyou, 1998.
- "Computer vision is a very active area of current computer science research. The discipline aims to give computers and robots visual capabilities comparable to those of humans. Researchers around the world began studying computer vision in the early 1960s, but most of the important advances in the underlying basic research were made after the 1980s." From "http:/ ...

- ... image transformation, image restoration, image enhancement, thresholding, region labelling, and shape characterization.
- "Tried to identify and classify objects in images by techniques of pattern recognition, which had been developed for the purpose of recognizing 2-D characters and symbols by feature extraction and statistical decision making by learning."
- "Many pattern recognition researchers believed that the paradigm of pattern recognition would also lead to intelligent vision systems that could understand 3-D scenes."
- "However, they soon realized the crucial fact that 3-D objects look very different from viewpoint to viewpoint beyond the capability of 2-D feature-based learning; 3-D meanings of 2-D images cannot be understood unless some a priori knowledge about the scene is given. Thus, knowledge came to play an essential role."
- "This type of knowledge-based high-level reasoning is called the top-down (or goal-driven) approach."
- "In a sense, this approach corresponds to the psychological view toward human perception that humans understand the environment by unconsciously matching the vast amount of knowledge accumulated from experience in the process of growth."
- "This view can be compared to what is known as the Gestalt psychology, which regards human perception as integration of the environment and experience."
- "Thus, the problem of how to represent and organize such knowledge became a major concern, and many symbolic schemes were derived. Establishing such symbolic representations is one of the central themes of artificial intelligence, and machine vision was regarded as problem solving by artificial intelligence."
- "However, the inherent difficulty of this approach was soon realized: the amount of necessary knowledge, most of which has the form of 'if ... then ... else ...', is limitless, heavily depending on the domain of each application ('office scene', 'outdoor scene', etc.) and constantly changing (e.g., today, many telephones are no longer black and do not have dials). However large the amount of knowledge is, exceptions are bound to appear, and computation time blows up exponentially as the amount of knowledge increases."
- "Many combinatorial techniques were proposed so as to find plausible interpretation efficiently without doing exhaustive search. Such techniques include various types of heuristic search as well as special techniques such as constraint propagation and probabilistic relaxation."
- "Realizing that such computational problems are inevitable as long as knowledge is directly matched with features extracted from raw images, researchers began to pay attention to 'physical/optical laws' governing 3-D scenes. In analyzing 2-D images, such laws can provide clues to the 3-D shapes and positions of objects."
- "For example, the surface gradients of objects can be estimated by analyzing shading intensities (shape from shading). The orientation of a surface in the scene can also be estimated by analyzing the perspective distortion of a texture on it (shape from texture). If objects are moving in the scene (or the camera is moving relative to the objects), the 3-D shapes of the objects and their 3-D motions (or the camera motion) can be computed (shape from motion or structure from motion)."
- "Although such analyses require appropriate assumptions about surface reflectance, illumination, perspective distortion, and rigid motion,
  they do not depend on specific application domains; they are called constraints in contrast to knowledge for the top-down approach."
- "This approach is in line with the psychological view toward human vision that human perception occurs automatically when visual signals trigger computation in the brain and that this computational functionality is innate, acquired in the process of evolution."
- "This view was asserted by J. J. Gibson, who had a great influence on not only psychologists but also machine vision researchers."
- "Thus, a new paradigm was established. First, primitive features are extracted from raw images by edge detection and image segmentation, resulting in primal sketches; next, approximate shapes and surface orientations are estimated by applying available constraints (shading, texture, motion, stereo, etc.), resulting in 2.5-D sketches; then, appropriate 3-D models (e.g., generalized cylinders) are fitted to such data, resulting in a numerical and symbolic representation of the scene; finally, high-level inference is made from such representations. This is called the bottom-up (or data-driven) approach, which is also known as the Marr paradigm
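  after David Marr, who strongly endorsed this approach."

Marr's computational framework for vision
- Treating vision as an information-processing system, Marr held that the study of a visual system should be divided into three levels: the computational theory level, the representation and algorithm level, and the hardware implementation level.
- The computational theory level must state the computational goal and strategy of each part of the system: what each part takes as input and produces as output, and which transformation or constraint relates the two.
- The representation and algorithm level should specify how the inputs, outputs, and internal information of each part are represented, together with the algorithms that achieve the goals set by the computational theory.
- The hardware implementation level must answer "how can the above algorithms be implemented in hardware?"

- "A major drawback of this approach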
  is its susceptibility to noise. Computation solely based on physical/optical constraints is likely to produce meaningless interpretations in the presence of noise. This is because 3-D reconstruction from 2-D data is a typical inverse problem, for which solutions are known to be generally unstable with respect to noise."
- "In order to cope with this inherent ill-posedness, many optimization techniques were devised so as to force the solution to have required properties. Such techniques are generally called regularization. Other types of optimization include a stochastic relaxation technique called simulated annealing, which was constructed by analogy with statistical mechanics, and the use of neural networks, which gave rise to a new view toward human cognition called connectionism."
- "Today, many attempts are being made to enhance the reliability of image data. One approach
  is to actively control the motion of the camera so that the resulting 3-D interpretation becomes stable (active vision). Another approach is using multiple sensors (stereo, range sensing, etc.) and fusing the data (sensor fusion)."
- "In order to fuse data, the reliability of individual data must be
  evaluated in quantitative terms so that reliable data contribute more than unreliable data."
- "Some researchers are attempting to use only minimum information that is enough to achieve a specific goal such as object avoidance (qualitative vision, purposive vision, etc.)."
- For detailed information,
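  read "intro_KKanatani.doc".

Related fields
- Mathematics, physics
- Brain science (or neurophysiology)
- Psychology, cognitive science, AI, ...
- "The development of computer vision has benefited from the study of animal visual systems in neurophysiology, psychology, and cognitive science, but computer vision has developed an independent body of computational theory and algorithms, and it does not deliberately try to imitate biological visual systems."

Relationships among related disciplines and courses
[Figure: diagram relating digital image processing, computer vision, pattern recognition, machine vision, and computer graphics to background courses (linear algebra, set theory, high-level language programming, data structures, advanced algebra, optimization methods, signals and systems, computational geometry) and to special topics in computer vision (such as image and visual computing). Legend: order of study; the amount of overlap indicates the degree of relatedness.]

Overview (1)
Geometric foundations of computer vision
Camera models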
- Single camera (pinhole model / perspective transformation)
- Two cameras (epipolar geometry: fundamental/essential matrix)
- Three or more cameras (multi-view geometry)
Motion estimation
- The correspondence problem
- Optical flow computation
- Rigid-body motion parameter estimation (minimal projective reconstruction)
  2-view, 7 points in correspondence (Faugeras)
  3-view, 6 points in correspondence (Quan Long)
  3-view, 8 points with one missing in one of the three views (Quan Long)
Geometric reconstruction
- Stereo vision
- Shape from X (shading/motion/texture/contour/focus/de-focus/...)

Overview (2)
Physical foundations of computer vision
Cameras and the imaging process
- Viewpoint, light sources, light rays in space, light rays at surfaces
- Shading, shadows
Optics and color (light/color)
- Radiometry, irradiance, ...
Surface properties of objects
- Diffuse (isotropic) surfaces: the Lambertian surface
- BRDF (bi-directional reflectance distribution function)

Overview (3)
Image-model foundations of computer vision
Camera models and calibration
- Intrinsic parameters, extrinsic parameters
Image features
- Edges, corners, contours, texture, shape
Image-sequence features (motion)
- Corresponding points, optical flow

Overview (4)
Signal-processing levels of computer vision
- Low-level vision
  Single image: filtering / edge detection / texture
  Multiple images: geometry / stereo / affine or perspective structure from motion
- Mid-level vision
  Clustering for segmentation; fitting lines, curves, and contours
  Probabilistic methods for clustering, segmentation, and fitting
  Tracking
- High-level vision
  Matching
  Pattern classification / aspect graph recognition
- Applications
  Range data / image database retrieval / image-based rendering

Overview (5)
Mathematical foundations of computer vision
- Projective and affine geometry, differential geometry
- Probability, statistics, and stochastic processes
- Numerical computation and optimization methods
- Machine learning
Basic analysis tools and mathematical models in computer vision
- Signal processing approach: FFT, filtering, wavelets, ...
- Subspace approach: PCA, LDA, ICA, ...
- Bayesian inference approach: EM, Condensation / sequential importance sampling (SIS), Markov chain Monte Carlo (MCMC), ...
- Machine learning approach: SVM / kernel machines, Boosting / AdaBoost, k-NN / regression, ...
- HMM, BN/DBN (dynamic Bayesian network), ...
- Gibbs, MRF, ...

Overview (6)
Characteristics of computer vision problems
- The intrinsic dimensionality of high-dimensional data is very low, which makes modeling possible. High-dimensional image/video data lie in a very low-dimensional manifold.
- Non-uniqueness of solutions: inverse problems that lack constraints
- Optimization problems

Some open problems in CV and application prospects
- A basic vision system: image -> [feature detection] -> low-level features -> [Shape from X] -> position and shape -> [recognition] -> object description
- Research concerns both individual modules and whole systems.
- Existing problems and some emerging ideas, such as "the task of a visual information processing system", "the question of modularity", "local versus global features", "object modeling", and so on.
- Three-dimensional computer vision will have extremely broad
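  application prospects, for example: human-computer interaction; multimedia technology, databases, and image communication; production automation; medicine; autonomous navigation; 3-D scene modeling and visualization.

Outline
- What is CV? When did it start to develop? What does it study? Which disciplines and fields is it related to? Some open problems in CV and application prospects
- Geometric foundations; probabilistic foundations
- Some related resources

A brief introduction to projective geometry
- Euclidean geometry: rotations and translations are Euclidean transformations. The geometry that studies the properties left unchanged by Euclidean transformations (Euclidean properties) is Euclidean geometry. Parallelism, length, and angle are examples of Euclidean properties.
- Projective geometry: image formation in a camera is a projection (a perspective, or central, projection). It does not preserve Euclidean properties; for example, parallel lines no longer stay parallel. The geometry that studies the properties of projective space left unchanged by projective transformations (projective properties) is projective geometry.

Elements at infinity
- Parallel lines meet at a point at infinity.
- Parallel planes meet in a line at infinity.
- A line has exactly one point at infinity.
- All lines in a family of parallel lines share one point at infinity.
- In a plane, all the points at infinity form a line, called the line at infinity of that plane.
- In 3-dimensional space, all the points at infinity form a plane, called the plane at infinity of that space.

Projective space
- Add the elements at infinity to n-dimensional Euclidean space and make no distinction between finite elements and elements at infinity; together they form n-dimensional projective space.
- 1-dimensional projective space is a projective line: a Euclidean line together with its point at infinity.
- 2-dimensional projective space is a projective plane: a Euclidean plane together with its line at infinity.
- 3-dimensional projective space is 3-dimensional Euclidean space together with the plane at infinity.

Homogeneous coordinates
- Once a coordinate system is set up in Euclidean space, points and coordinates are in one-to-one correspondence. Points at infinity, however, have no such coordinates; homogeneous coordinates are introduced so that they can be described as well.
- In n-dimensional Euclidean space with rectangular coordinates, let a point have coordinates $(m_1, \ldots, m_n)$. Any $n+1$ numbers $x_1, \ldots, x_n, x_0$ satisfying $x_0 \neq 0$ and $x_i / x_0 = m_i$ $(i = 1, \ldots, n)$ are called homogeneous coordinates $(x_1, \ldots, x_n, x_0)$ of the point, and $(m_1, \ldots, m_n)$ are called its non-homogeneous coordinates.
- For numbers $x_1, \ldots, x_n$ that are not all zero, $(x_1, \ldots, x_n, 0)$ are called homogeneous coordinates of a point at infinity.
- Example: let an ordinary point on the Euclidean line have coordinate $x$. Any two numbers with $x_1 / x_0 = x$ give homogeneous coordinates $(x_1, x_0)$ of the point, and $x$ is its non-homogeneous coordinate. For any $x_1 \neq 0$, $(x_1, 0)$ are homogeneous coordinates of the point at infinity.

Projective parameters

Cross ratio

Projective transformations

Duality in the projective plane
- "Point" and "line" are called dual elements of the projective plane.
- "Drawing a line through a point" and "taking a point on a line" are called dual constructions.
- Take a proposition in the projective plane that is built from points, lines, and their incidence and order relations. Replacing each element by its dual element and each construction by its dual construction yields another proposition; the two are called plane-dual propositions.
- Principle of duality: in the projective plane, if a proposition holds, then its dual proposition also holds.

Harmonic relation
- If the cross ratio of the point pairs $(P_1, P_2)$ and $(P_3, P_4)$ is $-1$, that is, $(P_1, P_2; P_3, P_4) = -1$, then $(P_1, P_2)$ and $(P_3, P_4)$ are said to be harmonic.
- The point pairs $(P_1, P_2)$ and $(P_3, P_4)$ are harmonic if and only if $(\lambda_1 + \lambda_2)(\lambda_3 + \lambda_4) = 2(\lambda_1 \lambda_2 + \lambda_3 \lambda_4)$, where $\lambda_i$ is the projective parameter of $P_i$ $(i = 1, \ldots, 4)$.

Harmonic relations in the complete quadrangle (quadrilateral)

Conics

The absolute conic

Poles and polars
- For a conic $C$ and a point $A$ (as a vector), the line determined by $L = CA$ (in line coordinates) is called, for the point $A$ with respect to the conic, ...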
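
The homogeneous-coordinate definitions, the harmonic condition, and the relation L = CA on the slides above can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available. The function names are illustrative and do not come from the slides. The cross-ratio formula uses one common convention in terms of projective parameters, which is an assumption because the slide's own definition did not survive extraction. The conic C is assumed to be given as a symmetric 3x3 matrix, with the scale coordinate x0 stored last, as in (x1, ..., xn, x0).

```python
import numpy as np


def to_homogeneous(m):
    """Map non-homogeneous (m1, ..., mn) to (m1, ..., mn, 1), i.e. x0 = 1 stored last."""
    return np.append(np.asarray(m, dtype=float), 1.0)


def from_homogeneous(x, tol=1e-12):
    """Recover m_i = x_i / x_0; fails for points at infinity (x_0 = 0)."""
    x = np.asarray(x, dtype=float)
    if abs(x[-1]) < tol:
        raise ValueError("a point at infinity has no non-homogeneous coordinates")
    return x[:-1] / x[-1]


def cross_ratio(l1, l2, l3, l4):
    """Cross ratio (P1, P2; P3, P4) from projective parameters (assumed convention)."""
    return ((l3 - l1) * (l4 - l2)) / ((l3 - l2) * (l4 - l1))


def is_harmonic(l1, l2, l3, l4, tol=1e-9):
    """Harmonic condition from the slide: (l1 + l2)(l3 + l4) = 2(l1*l2 + l3*l4)."""
    return abs((l1 + l2) * (l3 + l4) - 2.0 * (l1 * l2 + l3 * l4)) < tol


def polar_line(C, A):
    """Line coordinates L = C A for a symmetric 3x3 conic matrix C and a point A."""
    return np.asarray(C, dtype=float) @ np.asarray(A, dtype=float)


# Any non-zero multiple of a homogeneous vector represents the same point.
p = to_homogeneous([3.0, 4.0])
print(from_homogeneous(2.5 * p))

# The parameters -1, 1 and 2, 1/2 form a harmonic set: the cross ratio is -1.
print(cross_ratio(-1.0, 1.0, 2.0, 0.5), is_harmonic(-1.0, 1.0, 2.0, 0.5))

# Unit circle x^2 + y^2 - 1 = 0 as a conic matrix; the line L = C A of the
# point (2, 0) has coordinates (2, 0, -1), i.e. the line x = 1/2.
unit_circle = np.diag([1.0, 1.0, -1.0])
print(polar_line(unit_circle, to_homogeneous([2.0, 0.0])))
```

Storing the scale coordinate last matches the (x1, ..., xn, x0) ordering used on the slide; libraries that put the scale first would need the indices swapped.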
