A New Local PCA SOM Algorithm

Dong Huang, Zhang Yi and Xiaorong Pu
Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China.
E-mail: donnyhuang, zhangyi, .

(This work was supported by the National Science Foundation of China under Grant 60471055 and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017. Preprint submitted to Elsevier Science, 28 September 2007.)

Abstract

This paper proposes a local PCA-SOM algorithm. The new competition measure is computationally efficient and implicitly incorporates the Mahalanobis distance and the reconstruction error. In contrast to previous models, no matrix inversion or PCA decomposition is needed for each data input. Moreover, the local data distribution is stored completely in the covariance matrix instead of in a pre-defined number of principal components, so no prior information about the optimal principal subspace is required. Experiments on both synthetic data and a pattern learning task are carried out to show the performance of the proposed method.

Key words: Neural networks; unsupervised learning; local principal component analysis; self-organizing mapping

1 Introduction

Principal component analysis (PCA) has been widely used in dimension reduction of multi-variate data, data compression, pattern recognition and statistical analysis. A PCA algorithm is designed to approximate the original high-dimensional pattern space by a low-dimensional subspace spanned by the principal eigenvectors of the data covariance matrix. In this way, the data distribution can be represented and reconstructed by the principal eigenvectors and their corresponding eigenvalues. However, PCA is globally linear and inefficient for data distributions with non-linear dependencies [1]. Several extensions have been suggested to overcome this problem. They fall into two main categories: global nonlinear approaches and locally linear descriptions. The former includes principal curves [2] and kernel PCA [3]. We focus on the latter approaches, in which the data distribution is represented by a collection of local PCA units [7,8].

When describing a data distribution with a mixture model, the question arises what kind of units the mixture should contain. In Gaussian mixture models [7], the local iso-density surface of a uniform Gaussian unit is a sphere, whereas a local PCA unit corresponds to a multivariate Gaussian with an ellipsoidal iso-density surface. Despite its greater complexity, a local PCA unit is preferable to a spherical unit for the following reasons. An ellipsoid can describe a local structure for which many spheres would be needed. Furthermore, data distributions are usually constrained locally to subspaces with fewer dimensions than the space of the training data, so there are directions in which the distribution has locally zero variance (or almost zero variance because of noise). An ellipsoidal unit can extend its minor components into these additional noise dimensions, but the computational cost increases over-proportionally with the number of principal components needed.

In the recent NGAS-PCA model [4], the local PCA model is based on the soft competition scheme of the neural gas algorithm and on RRLSA [10] online PCA learning in each local unit.
The novelty of NGAS-PCA is that its distance measure for neuron competition combines a normalized Mahalanobis distance and the squared reconstruction error. Note that storing the principal basis vectors for every unit involves rather complicated updating procedures, as in NGAS-PCA and ASSOM [9]. Another recent model, PCA-SOM [5], proposes an alternative: the local information is stored in the covariance matrix. This is derived directly from statistical theory and has great advantages over the ASSOM model in both computational burden and reliability of the result. However, PCA-SOM uses only the reconstruction error in the principal subspace for its neuron competition. For this reason, PCA-SOM still needs a PCA decomposition, or an update with a predefined number of principal components, each time a training sample is presented.

In this paper, we propose a new computationally efficient local PCA algorithm that combines the advantages of NGAS-PCA and PCA-SOM. Each unit is associated with its mean vector and covariance matrix. The new competition measure implicitly incorporates the reconstruction error and the distance between the input data and the unit center. In the proposed algorithm, the extra updating step of the principal subspaces is eliminated from the data distribution learning process. In addition, the local distribution of the input data is stored completely in the covariance matrix instead of in a predefined number of principal components. One potential application of the proposed model is nonlinear pattern learning and recalling [11]. In most models of associative memory, memories are stored as attractive fixed points at discrete locations in state space. Discrete attractors may not be appropriate for patterns with continuous variability [13], for example the images of a three-dimensional object seen from different viewpoints [12]. After the training process, the data distribution is represented by a collection of local linear units, and no prior information about the optimal principal subspace is needed for the pattern representation.

The rest of this paper is organized as follows. In Section 2, the proposed competition measure and the neuron updating method are formulated. Section 3 presents the simulation results and discussion. Section 4 briefly reviews the NGAS-PCA and PCA-SOM algorithms and discusses their relations to the proposed method in detail. Finally, the paper is concluded in Section 5.

2 The Local PCA Model

In this section, we present the detailed formulation of the local PCA model. The following notation is used throughout the paper. The input data set is $X = \{x_i \,|\, x_i \in \mathbb{R}^d,\ i = 1, \dots, M\}$, where d is the data dimension. Each local unit j ($j = 1, \dots, N$) has a center $c_j$ and a covariance matrix $C_j$. The principal subspace of $C_j$ is characterized by $W^{(p)} = [w_1, \dots, w_p]$ and $\Lambda^{(p)} = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$, where $w_i$ is an eigenvector and $\lambda_i$ the corresponding eigenvalue ($i = 1, \dots, p$). Throughout the paper the eigenvalues are arranged in descending order, so p is the number of underlying principal components and the remaining $(d - p)$ eigenvectors span the underlying minor subspace.
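As an illustration of this bookkeeping, the following minimal Python sketch (our own, not part of the paper; all names are hypothetical) stores exactly the per-unit quantities listed above and recovers $W^{(p)}$ and $\Lambda^{(p)}$ by an eigendecomposition only when they are needed:

```python
# Hypothetical sketch: per-unit state of a local PCA unit as described in Section 2.
# Only the center c_j and covariance C_j are stored; the principal subspace is
# recovered on demand rather than maintained incrementally.
import numpy as np

class LocalPCAUnit:
    def __init__(self, center, covariance):
        self.c = np.asarray(center, dtype=float)      # unit center c_j, shape (d,)
        self.C = np.asarray(covariance, dtype=float)  # covariance matrix C_j, shape (d, d)

    def principal_subspace(self, p):
        """Return W^(p) (d x p eigenvectors) and Lambda^(p) (top-p eigenvalues) of C_j."""
        eigvals, eigvecs = np.linalg.eigh(self.C)     # eigh returns eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]             # reorder to descending, as in the paper
        return eigvecs[:, order[:p]], eigvals[order[:p]]
```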
2.1 The Proposed Competition Measure

We seek to formulate a local PCA algorithm that simplifies both the neuron competition and the neuron updating. As in PCA-SOM, each unit in our model is associated with its center and its covariance matrix, so the data distribution is represented by a mixture of ellipsoidal local units. This is closely related to anisotropic k-means clustering and kernel density estimation, where we seek to minimize the Mahalanobis distance

\mathrm{Mahalanobis}(x) = x^T C^{-1} x,

or to maximize the normalized Gaussian density function

\phi(x) = \frac{1}{(2\pi)^{d/2}\,|C|^{1/2}} \exp\left(-\frac{1}{2} x^T C^{-1} x\right), \qquad \int \phi(x)\,dx = 1,    (1)

where $|C| \neq 0$. However, due to the matrix inversion $C^{-1}$, the computational complexity of the Mahalanobis distance is $O(d!)$. We consider an alternative neuron competition measure:

NR(x, C) = \frac{x^T C x}{\mathrm{tr}(C)\,(x^T x)^2},    (2)

where the size of the hyper-ellipsoid is normalized by $\mathrm{tr}(C) = \sum_{i=1}^{d} \lambda_i$. The new measure is called the normalized Rayleigh ratio $NR(x, C)$; it is the normalized scattering intensity of the data points. The computational burden of $NR(x, C)$ is reduced to $O(d^2)$. The definition of $NR(x, C)$ is based on the assumption that $x^T x \neq 0$. The following analysis starts from the Rayleigh quotient.

Given a Hermitian matrix C and a nonzero vector x, the Rayleigh quotient $R(x, C)$ is defined by

R(x, C) = \frac{x^T C x}{x^T x}.

Rayleigh-Ritz theorem: if the eigenvalues of a Hermitian (symmetric) matrix C are ordered as $\lambda_{\max} = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{d-1} \geq \lambda_d = \lambda_{\min}$, then

\lambda_{\min} \leq \frac{x^T C x}{x^T x} \leq \lambda_{\max},    (3)

where the equalities hold when x is an eigenvector of C. The gradient and Hessian matrix of $R(x, C)$ can be computed as

\nabla R(x) = \frac{2}{x^T x}\left(C - R(x)\,I\right) x,    (4)

H_R(x) = \frac{2}{x^T x}\left[C - x\,(\nabla R)^T - (\nabla R)\,x^T - R(x)\,I\right],    (5)

where I is the identity matrix.

For the j-th local PCA neuron unit with center $c_j$ and covariance matrix $C_j$, the normalized Rayleigh ratio $NR(\xi, C_j)$ is written as

NR(\xi, C_j) = \frac{\xi^T C_j \xi}{\mathrm{tr}(C_j)\,(\xi^T \xi)^2},    (6)

where $\xi = x - c_j$. If $NR(\xi, C_j)$ is used as the competition measure in the probabilistic SOM training process, the winner is $v = \arg\max_j NR(x - c_j(t), C_j(t))$. The final result of the training process is obtained when $NR(x - c_j(t), C_j)$ $(j = 1, \dots, N)$ is maximized for $x \in \{x \,|\, E(x) = c_j,\ E(\xi\xi^T) = C_j\}$. To analyze the maximum property of NR, the neuron index j is omitted in the following.

It follows from (4) and (5) that $\nabla R(\xi) = 0$ only at the critical points $\xi = w_i$ $(i = 1, \dots, d)$, and at these critical points

H_R(w_i)\, w_j = \begin{cases} 0, & j = i, \\ (\lambda_j - \lambda_i)\, w_j, & j \neq i. \end{cases}

In real applications the covariance matrix C is positive definite, i.e. $\mathrm{rank}(C) = d$. Note that for $x \neq 0$ there are no local minima or maxima except those on the edge of the domain, i.e. the maxima at $\xi = w_d$ and the minima at $\xi = w_1$. This shows that the Rayleigh ratio $\xi^T C \xi / (\xi^T \xi)$ is polarized only by the directions of the input vectors.

Taking the logarithm of $NR(\xi, C)$ and computing the gradient gives

\frac{\partial \ln NR(\xi, C)}{\partial \xi} = \frac{\partial}{\partial \xi}\left[\ln(\xi^T C \xi) - 2\ln(\xi^T \xi)\right] = \frac{2C\xi}{\xi^T C \xi} - \frac{4\xi}{\xi^T \xi} = \frac{2\left[(\xi^T \xi)\,C\xi - 2(\xi^T C \xi)\,\xi\right]}{(\xi^T C \xi)(\xi^T \xi)}.    (7)

Without loss of generality, we assume $\xi = x - c$ lies in the subspace spanned by the eigenvectors of the covariance matrix C,

\frac{\xi}{\|\xi\|} = \alpha_1 w_1 + \cdots + \alpha_d w_d, \qquad \sum_{i=1}^{d} \alpha_i^2 = 1.

The bracketed term in (7) is then written as

(\xi^T \xi)\,C\xi - 2(\xi^T C \xi)\,\xi = \|\xi\|^3\left(\sum_{i=1}^{d} \lambda_i \alpha_i w_i - 2\Big(\sum_{j=1}^{d} \lambda_j \alpha_j^2\Big)\sum_{i=1}^{d} \alpha_i w_i\right) = \|\xi\|^3 \sum_{i=1}^{d} \alpha_i\left(\lambda_i - 2\sum_{j=1}^{d} \lambda_j \alpha_j^2\right) w_i.

Since $\{w_i,\ i = 1, \dots, d\}$ is a basis of $\mathbb{R}^d$, $\partial \ln NR / \partial \xi = 0$ only if

\alpha_i\left(\lambda_i - 2\sum_{j=1}^{d} \lambda_j \alpha_j^2\right) = 0

for all $i = 1, \dots, d$. However, this is not possible (note that $\|\xi\| \neq 0$). Thus $NR(\xi, C)$ has no local maxima except on the edge of the domain. From the Rayleigh-Ritz theorem (3), we have

\frac{1}{\lambda_{\max}} \leq \frac{\xi^T C^{-1} \xi}{\xi^T \xi} \leq \frac{1}{\lambda_{\min}}

and

\frac{\lambda_{\min}}{\mathrm{tr}(C)\,\xi^T \xi} \leq \frac{\xi^T C \xi}{\mathrm{tr}(C)\,(\xi^T \xi)^2} \leq \frac{\lambda_{\max}}{\mathrm{tr}(C)\,\xi^T \xi}.

In the probabilistic framework of SOM training, the iterative algorithm uses the input data $(x \in \{x \,|\, E(x) = c,\ E(\xi\xi^T) = C\})$ over multiple passes. We consider the property of the training results in the sense of mathematical expectation,

E\left[\min_{\|\xi\| \neq 0} \frac{\mathrm{tr}(C)\,(\xi^T \xi)^2}{\xi^T C \xi}\right] = E\left[\min_{\|\xi\| \neq 0} \mathrm{tr}(C)\,\xi^T C^{-1} \xi\right] = \frac{E\left(\mathrm{tr}(C)\,\xi^T \xi\right)}{\lambda_{\max}} = \frac{\left(\sum_{i=1}^{d} \lambda_i\right)^2}{\lambda_{\max}}.    (8)

It can be observed from (8) that maximizing the normalized Rayleigh ratio (2) through the SOM training process is equivalent to minimizing the Mahalanobis distance $x^T C^{-1} x$, with a constraint on the size of the hyper-ellipsoid, $\mathrm{tr}(C) = \sum_{i=1}^{d} \lambda_i$.
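To make the competition step concrete, here is a small sketch (ours, not the authors' code; the helper names are hypothetical) of the normalized Rayleigh ratio of Eqs. (2)/(6) and of winner selection. The Mahalanobis distance is included only for comparison, since it requires exactly the matrix inversion that the new measure avoids.

```python
# Sketch of the competition measure of Section 2.1 (hypothetical helper names).
import numpy as np

def nr(x, c, C, eps=1e-12):
    """Normalized Rayleigh ratio NR(xi, C) with xi = x - c, Eqs. (2)/(6).
    Costs O(d^2): one matrix-vector product, no inversion or eigendecomposition."""
    xi = x - c
    num = xi @ C @ xi
    den = np.trace(C) * (xi @ xi) ** 2
    return (num + eps) / (den + eps)      # eps-regularized form, cf. Section 2.2

def mahalanobis(x, c, C):
    """Mahalanobis distance (x - c)^T C^{-1} (x - c); shown for comparison only."""
    xi = x - c
    return xi @ np.linalg.solve(C, xi)

def winner(x, centers, covariances):
    """Competition step: v = argmax_j NR(x - c_j, C_j)."""
    scores = [nr(x, c, C) for c, C in zip(centers, covariances)]
    return int(np.argmax(scores))
```

The winner() helper is simply the arg max used in the update step of Section 2.2.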
2.2 Updating of the Neurons

The neuron centers and covariance matrices are updated similarly to PCA-SOM. In practical situations, the normalized Rayleigh ratio (6) is regularized as

NR(\xi, C_j) = \frac{\xi^T C_j \xi + \varepsilon}{\mathrm{tr}(C_j)\,(\xi^T \xi)^2 + \varepsilon}

in case $\|\xi\| = 0$ occurs, where $\xi = x - c_j(t)$ and $\varepsilon > 0$ is a small constant. When an input data point x is randomly selected from the data set X, the winning neuron v is chosen as $v = \arg\max_j NR(x - c_j(t), C_j(t))$ $(j = 1, \dots, N)$. Then all the units $(j = 1, \dots, N)$ are updated as follows:

c_j(t+1) = c_j(t) + \alpha(t)\,h_{v,j}(t)\,\big(x - c_j(t)\big),    (9)

\hat{C}_j(t+1) = \big(x - c_j(t+1)\big)\big(x - c_j(t+1)\big)^T,    (10)

C_j(t+1) = C_j(t) + \alpha(t)\,h_{v,j}(t)\,\big[\hat{C}_j(t+1) - C_j(t)\big],    (11)

where $\alpha(t)$ is the learning rate and $h_{v,j}(t)$ is the neighborhood function of the j-th neuron given the winning neuron v. The training process stops when the changes of $c_j$ and $C_j$ can be neglected. In the proposed method, the amplitude of the neuron updates (9)-(11) does not depend on the iteration number, but on how well the neural network fits the input data [6]. To achieve this, the learning rate $\alpha(t)$ is calculated as

\alpha(t) = \frac{\|x(t) - c_v(t)\|^2}{r(t)},

where

r(t) = \max\big\{\|x(t) - c_v(t)\|^2,\ r(t-1)\big\}, \qquad r(0) = \|x(0) - c_v(0)\|^2.

The neighborhood function depends on $NR(c_j(t) - c_v(t), C_v(t))$ and is calculated as

h_{v,j}(t) = \exp\left(-\frac{NR(c_j(t) - c_v(t),\, C_v(t))}{\sigma}\right),    (12)

where $\sigma$ is a constant that scales the size of the neighborhood. In our experiments in Section 3, it is estimated as 1/N of the total variance of the data distribution.

The proposed method is an iterative deterministic algorithm, similar to the neural networks for global PCA and SOM. It updates the neurons and converges iteratively using the sequentially presented training data over multiple passes. New data can be incorporated by adding them to the previous training set and updating the outputs. The iterations continue until convergence, which is reached when the changes of the neuron centers and covariance matrices can be neglected.
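The update rules above can be read as one online SOM-style step per sample. The following sketch is our reading of Eqs. (9)-(12), not the authors' implementation; it reuses the hypothetical nr() helper from the previous sketch, and the function and variable names are ours.

```python
# Sketch of one training iteration (Eqs. (9)-(12)); structure and names are ours.
# centers and covs are lists of NumPy arrays, updated in place; nr() is the
# competition measure sketched in Section 2.1.
import numpy as np

def train_step(x, centers, covs, r_prev, sigma):
    """Update all units for one randomly drawn sample x; returns the new r(t)."""
    # competition: winning neuron by the normalized Rayleigh ratio
    scores = [nr(x, c, C) for c, C in zip(centers, covs)]
    v = int(np.argmax(scores))

    # learning rate alpha(t) = ||x - c_v||^2 / r(t), with r(t) = max(||x - c_v||^2, r(t-1))
    d2 = float(np.sum((x - centers[v]) ** 2))
    r = max(d2, r_prev)
    alpha = d2 / r if r > 0 else 0.0

    for j in range(len(centers)):
        # neighborhood strength h_{v,j}(t), Eq. (12); sigma scales the neighborhood size
        h = np.exp(-nr(centers[j], centers[v], covs[v]) / sigma)
        # Eq. (9): move the center toward the sample
        centers[j] = centers[j] + alpha * h * (x - centers[j])
        # Eq. (10): instantaneous covariance estimate around the updated center
        outer = np.outer(x - centers[j], x - centers[j])
        # Eq. (11): blend it into the stored covariance
        covs[j] = covs[j] + alpha * h * (outer - covs[j])
    return r
```

Note that, apart from the unit states themselves, only the scalar r(t) has to be carried between iterations; no per-unit principal subspace is maintained.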
the prior probability or mixing parameter 14 of each unit is estimated frompj = mj /m , where mj is the number of training patterns assigned to unit jand m is the total number of training data. the probability density function精品論文n9of a gaussian mixture model with n units is p(x) =mx pj pj (x). the log-j=1mlikelihood per pattern l is computed as l = 1x log p(xi ). for a given datai=1set, l decrease from the unordered to the final well-ordered approximation ofthe data distribution by the local pca algorithms.in the two-dimensional example, the data distribution is approximated byngas-pca, pca-som and the proposed method. fig. 1 shows the the train- ing process of the proposed method (n = 7) at different time steps t. in fig.2, the train process of ngas-pca (p = d = 2) and pca-som (p = 1) areshown, where the initial neuron states are the same to that of fig. 1. it can been seen that ngas-pca can also approximate the training data by the local linear units. but the “dead units” may occur. in pca-som, minimizing the reconstruction error alone results in data assignment only related to the principal directions and is not suitable for the present task. as is shown in fig. 2 (d)-(f ), the structure of the data distribution can not be extracted.(a) t=0, l = 4.7465(b) t=1000, l = 3.6898 (c) t=4000, l = 3.1522(d) t=8000, l = 2.8863(e) t=12000, l = 2.5091(f ) t=20000, l = 2.4001fig. 1. snapshots of the training process of the proposed method at different time steps t for a two dimensional distribution with 700 points (n = 7).to further evaluate the performances, ngas-pca (p = d = 2) and the pro- posed model are trained on the two dimensional distribution (fig.1) startingfrom 100 random initial states. (number of neurons n = 7). fig. 3 showsthe comparison of the training results by the log-likelihood of density estima-精品論文(a)t=2000, l = 3.4678 (b) t=10000, l =
