
Classification number:

Master's Thesis

Title: 基于規(guī)則化軌道算法和有效分布式算法的高維稀疏精度矩陣估計的研究
(Research on Regularization Paths Algorithm and Efficient Distribution Algorithm for High-Dimensional Sparse Precision Matrices Estimation)

Discipline/Major: Statistics
Research direction: High-dimensional statistical inference
Author: Wang Guanpeng (王冠鵬)
Advisor and title: Associate Professor Huang Xudong (黃旭東)
Thesis submission date: June 2018
Degree conferral date:
Office of the Academic Degree Evaluation Committee, Anhui Normal University

Declaration of Originality

I hereby declare that this thesis presents research work carried out by myself under the guidance of my advisor, together with the results obtained. Any contribution made to this research by colleagues who worked with me has been explicitly acknowledged in the thesis. Except where specially noted and acknowledged in the text, the thesis contains no research results previously published or written by others.

Author's signature: [signed]    Date: June 2018

Authorization for the Use of the Thesis

The author of this thesis fully understands the regulations of Anhui Normal University concerning the retention and use of degree theses: the university has the right to retain the thesis and to submit copies and an electronic version to relevant state departments or institutions, and to allow the thesis to be consulted and borrowed. The author authorizes Anhui Normal University to include all or part of the thesis in relevant databases for retrieval, and to preserve and compile the thesis by photocopying, reduced-format printing, scanning or other means of reproduction. For a thesis subject to confidentiality, this authorization applies after declassification.

Author's signature: [signed]    Date: June 2018
Advisor's signature: [signed]    Date: June 2018
Telephone:    Postal code:
Author's destination after obtaining the degree:
Employer:    Mailing address:

Research on Regularization Paths Algorithm and Efficient Distribution Algorithm for High-Dimensional Sparse Precision Matrices Estimation

Wang Guanpeng (王冠鵬)
Master's Thesis, Anhui Normal University
June 2018

This thesis has been reviewed by all members of the defense committee and confirmed to meet the quality requirements of Anhui Normal University for a master's thesis.

Defense committee (with affiliation and title):
Chair: Cui Hengjian (崔恒建), Capital Normal University, Professor
Members: Zhu Dongjin (祝東進(jìn)), Anhui Normal University, Professor
Ren Yong (任永), Anhui Normal University, Professor
Shen Guangjun (申廣君), Anhui Normal University, Professor
Xu Lin (徐林), Anhui Normal University, Professor
Advisor: Huang Xudong (黃旭東)

Research on Regularization Paths Algorithm and Efficient Distribution Algorithm for High-Dimensional Sparse Precision Matrices Estimation

Abstract

This paper studies the estimation of high-dimensional sparse precision matrices in the setting of large dimension $p$ and small sample size $n$.

In the first part, we focus on computing regularization paths, that is, solving the optimization problem over the full range of regularization parameters. Building on a precision matrix estimator defined as the minimizer of the lasso-penalized loss under a positive-definiteness constraint, we compute the regularization paths of positive-definite precision matrix estimates through a one-step approximation of the Alternating Direction Method of Multipliers (ADMM), which quickly outlines a sequence of sparse solutions at a fine resolution. We demonstrate the effectiveness and computational savings of the proposed algorithm through a detailed analysis of simulated examples.

In the second part, a distributed estimator of high-dimensional sparse precision matrices is proposed for transelliptical graphical models, based on the debiased lasso-penalized D-trace loss and hard thresholding, in the setting where samples are distributed across different machines. At a certain level of sparsity, this method not only selects the non-zero elements of the sparse precision matrix correctly, but also attains error rates comparable to estimation in a non-distributed setting. Numerical results further demonstrate that the method is effective compared with other methods.

Keywords: High-dimensional precision matrices; Sparsity; Alternating direction method of multipliers; Regularization parameters; Efficient distributed estimation; Threshold parameter

基于規(guī)則化軌道算法和有效分布式算法的高維稀疏精度矩陣估計的研究

摘要 (Abstract in Chinese)

This thesis studies the estimation of high-dimensional sparse precision matrices when the dimension $p$ is large and the sample size $n$ is small. In the first part, we focus on computing regularization paths, i.e., solving the corresponding optimization problem over the full range of regularization parameters. Building on the lasso-based optimal precision matrix estimator under a positive-definiteness constraint, we compute the regularization path of the positive-definite precision matrix through a one-step ADMM approximation, which quickly sketches a sequence of sparse solutions at a fine resolution. A detailed analysis of simulated examples further demonstrates the effectiveness and computational advantages of the proposed algorithm. In the second part, for transelliptical graphical models, we propose a distributed estimator of high-dimensional sparse precision matrices when the data are distributed across different machines, based on the debiased D-trace loss with lasso penalty and a thresholding method. At a certain sparsity level, this method not only selects the non-zero elements of the sparse precision matrix correctly, but its error rate is also comparable to non-distributed estimation. Numerical results further show that the method is effective in comparison with other methods.

Keywords: high-dimensional precision matrices; sparsity; alternating direction method of multipliers; regularization parameters; efficient distributed estimation; threshold parameter

Contents

Abstract  i
中文摘要  ii
Chapter 1 Introduction  1
  1.1 Background  1
  1.2 Notation  4
Chapter 2 ADMM Algorithmic Regularization Paths for High-dimensional Sparse Precision Matrix  6
  2.1 Model setting  6
  2.2 ADMM algorithm with warm starts for sparse precision matrix  8
  2.3 Algorithmic regularization path for sparse precision matrix  10
  2.4 Numerical studies  11
Chapter 3 Efficient Distributed Estimation of High-dimensional Sparse Precision Matrix for Transelliptical Graphical Models  17
  3.1 Transelliptical graphical models  17
  3.2 The Kendall's τ statistic  18
  3.3 Rank-based regularization method  18
  3.4 Distributed algorithms  19
  3.5 Main results  20
  3.6 Numerical studies  32
References  35
Accepted paper during the master  37
Acknowledgements  38

Chapter 1 Introduction

1.1 Background

Massive amounts of high-dimensional data are becoming increasingly important in theoretical and applied research in biology, medical imaging, genomics, econometrics, climate studies, machine learning and other fields. In high-dimensional

data, the number of variables $p$ can be much larger than the sample size $n$, and it is of both practical and theoretical importance to estimate a sparse precision matrix. Suppose that we have $n$ independent and identically distributed $p$-dimensional random variables; let $\Sigma$ be the population covariance matrix and let $\Theta = \Sigma^{-1}$ denote the corresponding precision matrix, also called the concentration matrix. It is no longer feasible to invert the sample covariance matrix to estimate the precision matrix in the high-dimensional setting, and many researchers have discussed the estimation of sparse precision matrices when the number of parameters is larger than the sample size; see (29,3). Over the past two decades, a series of seminal works has estimated the sparse precision matrix $\Theta$ in graphical models to describe the conditional dependence structure of the observed variables (see 15,26). In particular, if a $p$-dimensional random vector $X \sim N_p(\mu, \Sigma)$, then $\Theta_{ij} = 0$ if and only if $X_i$ and $X_j$ are conditionally independent given the remaining variables. Due to its importance, many works in statistics and machine learning have studied the estimation of sparse precision matrices in the high-dimensional regime where the

number of variables $p$ is much larger than the sample size $n$, such as (18,22,29,3,4,13). Estimating a sparse precision matrix $\Theta$ in Gaussian graphical models is of both practical and theoretical importance. However, in order to relax the restrictive normality assumption, Liu et al. (16) proposed, on the basis of the elliptical distribution, a new distribution family named the transelliptical graphical model. At the same time, the precision matrix $\Theta$ remains viable for describing conditional dependence under the transelliptical distribution, which gives its estimation practical significance.

By inverting the Karush-Kuhn-Tucker (KKT) conditions of the lasso, van de Geer et al. (24) constructed confidence intervals and inference methods for single or low-dimensional components of a large parameter vector in high-dimensional models based on desparsifying the lasso. Following this idea, Janková and van de Geer (14) proposed a de-sparsified estimator of the sparse precision matrix based on the graphical lasso, together with its theoretical properties, for low-dimensional statistical inference in high-dimensional Gaussian graphical models. Because the graphical lasso estimator corresponds to $\ell_1$-regularized maximum likelihood when the data are generated from a Gaussian graphical model (23), Zhang and Zou (30) proposed a new loss function, called the D-trace loss, within a constrained empirical loss minimization framework for estimating high-dimensional sparse precision matrices. The D-trace loss has several merits: it is not only much simpler than the graphical lasso loss, but also allows more direct theoretical analysis and offers obvious computational advantages. Building on it, Huang and Li (13) proposed a method more effective than that of Janková and van de Geer (14) for constructing confidence intervals and statistical inference, based on a debiased estimator via the lasso-penalized D-trace loss.

However, the methods mentioned above estimate on a single machine.

Information technology has forever changed the data collection process; due to size limitations, or because data are collected and stored independently, massive amounts of high-dimensional or irregular data are distributed over multiple machines. Collecting all the big datasets onto one machine while guaranteeing effective communication between all the machines involved is a difficult issue. Recently, some research has begun to focus on efficient distributed estimation and inference for precision matrices through various distributed algorithms. For instance, Arroyo and Hou (1) proposed a distributed estimator of the inverse covariance matrix for Gaussian graphical models, which are a special case of the transelliptical graphical models in (16). Next, Xu et al. (28) considered a broader distributed estimator of the precision matrix for transelliptical graphical models. Most importantly, they achieved a distributed estimator whose performance is comparable to that in a non-distributed setting (see 28).

The rise of big data, and the subsequent explosion of statistical machine learning methods used to analyze it, has generated great enthusiasm among statisticians; large-scale optimization programs are then needed to estimate sparse models.

Consider the following positive-definiteness-constrained optimization problem:
$$(\hat{\Theta}, \hat{Z}) = \arg\min_{\Theta \in S(p)} \; \frac{1}{2}\operatorname{tr}(\Theta^2 \hat{\Sigma}) - \operatorname{tr}(\Theta) + \lambda_n \lVert Z \rVert_{1,\text{off}}, \quad \text{subject to } \Theta = Z \text{ and } Z \succeq \epsilon I, \qquad (1.1)$$
where $\epsilon > 0$ is arbitrarily small and $Z \succeq \epsilon I$ means that $Z - \epsilon I$ is positive semi-definite. However, solving the optimization problem (1.1) directly is often challenging. Recently, an operator splitting method, the Alternating Direction Method of Multipliers (ADMM), has become more and more prevalent, because it converts the problem into a sequence of simpler optimization problems, each involving only the smooth loss or the non-smooth penalty. ADMM has two merits: (1) it handles decoupled constraints and regularizers individually, rather than in conjunction; (2) it makes mixture-type penalties easy to handle.

Previously, one could break the objective into a sum of simple objectives; here, however, the issue is different: the composition of the regularizer with a linear mapping complicates matters. Zhang and Zou (30) proposed sparse precision matrix estimation via the lasso-penalized D-trace loss and considered the following positively-constrained optimization problem:
$$\hat{\Theta} = \arg\min_{\Theta \succeq \epsilon I} \; \frac{1}{2}\langle \Theta^2, \hat{\Sigma} \rangle - \operatorname{tr}(\Theta) + \lambda_n \lVert \Theta \rVert_{1}, \qquad (1.2)$$
which can be solved by a much simpler alternating direction method. Introducing a copy $\Theta_0$ of $\Theta$ with the equality constraint $\Theta = \Theta_0$, we consider the augmented Lagrangian
$$L(\Theta, \Theta_0, \Lambda) = \frac{1}{2}\langle \Theta^2, \hat{\Sigma} \rangle - \operatorname{tr}(\Theta) + \lambda_n \lVert \Theta_0 \rVert_{1} + \langle \Lambda, \Theta - \Theta_0 \rangle + \frac{\rho}{2} \lVert \Theta - \Theta_0 \rVert_F^2.$$
We update $(\Theta, \Theta_0, \Lambda)$ according to the following three steps:
$$\Theta^{k+1} = \arg\min_{\Theta} L(\Theta, \Theta_0^k, \Lambda^k),$$
$$\Theta_0^{k+1} = \arg\min_{\Theta_0} L(\Theta^{k+1}, \Theta_0, \Lambda^k),$$
$$\Lambda^{k+1} = \Lambda^k + \rho(\Theta^{k+1} - \Theta_0^{k+1}).$$
ADMM has been used to decouple fused or non-separable types of penalties in many statistical learning problems, such as total variation and overlapping group lasso penalties (see 25,9), joint graphical model selection (17,20), and a convex formulation of clustering (5).
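To make the three-step update concrete, here is a minimal single-iteration sketch in Python. It is our illustration, not Zhang and Zou's implementation: the function name and the use of scipy's Sylvester solver are our choices. It exploits the fact that the first-order condition of the $\Theta$-step is a Sylvester equation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def admm_dtrace_step(Sigma, Theta0, Lam, rho=1.0, lam_n=0.1):
    """One pass of the three-step update for (1.2) (illustrative sketch).

    Theta-step: the first-order condition of the augmented Lagrangian,
      (Sigma/2 + (rho/2) I) Theta + Theta (Sigma/2 + (rho/2) I)
          = I - Lam + rho * Theta0,
    is a Sylvester equation and is solved directly here.
    """
    p = Sigma.shape[0]
    A = 0.5 * (Sigma + rho * np.eye(p))
    Theta = solve_sylvester(A, A, np.eye(p) - Lam + rho * Theta0)
    # Theta0-step: entrywise soft-thresholding of Theta + Lam/rho
    # at level lam_n/rho (full l1 penalty, as in (1.2))
    M = Theta + Lam / rho
    Theta0_new = np.sign(M) * np.maximum(np.abs(M) - lam_n / rho, 0.0)
    # Dual update
    Lam_new = Lam + rho * (Theta - Theta0_new)
    return Theta, Theta0_new, Lam_new
```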

By breaking the problem into smaller pieces, this method may take more iterations than solving (1.1) directly, but it often runs in less total time since the subproblems are typically easy to solve.

In Chapter 2, we suggest computing the regularization paths of $\hat{\Theta}(\lambda)$ through a slight modification of Zhang and Zou's (30) ADMM algorithm. To choose an optimal $\lambda$ which yields the best estimate $\hat{\Theta}(\lambda)$ in a certain sense, Zhang and Zou (30) suggested inspecting a sequence of sparse solutions over the full range of regularization parameters:
$$\{\hat{\Theta}(\lambda_j) : 0 < \lambda_1 < \cdots < \lambda_M\}.$$

1.2 Notation

We write $a_n \gtrsim b_n$ if there exists a constant $C > 0$ independent of $n$ such that $a_n \ge C b_n$ for all $n$, and we write $a_n = o(b_n)$ if $\lim_{n\to\infty} a_n / b_n = 0$. Also, we write $a_n \lesssim b_n$ if $a_n \le C b_n$, and $a_n \asymp b_n$ if both $a_n \lesssim b_n$ and $a_n \gtrsim b_n$ hold. We use $A_{j*}$ and $A_{*j}$ to denote the $j$-th row and $j$-th column of a matrix $A$. For matrix norms, we write $\lVert A \rVert_1 = \max_{\lVert u \rVert_1 = 1} \lVert A u \rVert_1$, $\lVert A \rVert_{\infty,\infty} = \max_{i,j} |A_{ij}|$ and $\lVert A \rVert_F = (\sum_{i,j} A_{ij}^2)^{1/2}$ to denote the $\ell_1$, $(\infty,\infty)$ and Frobenius norms of a matrix $A$, respectively. We use $S_+^p = \{A \in \mathbb{R}^{p \times p} \mid A = A^T, A \succ 0\}$ to denote the set of positive definite $p \times p$ matrices. For a precision matrix $\Theta$, let $S = S(\Theta) = \{(i,j) \in V \times V \mid \Theta_{ij} \ne 0\}$ be the set of all non-zero entries of $\Theta$, where $V = \{1, 2, \ldots, p\}$. The cardinality of $S$ is denoted by $s$, and $S^c = S^c(\Theta)$ is defined as the complement of $S(\Theta)$ in $V \times V$. For a sequence of random variables $X_n$, we write $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{d} X$ to denote that $X_n$ converges to $X$ in probability and in distribution, respectively. Finally, $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution, $\Phi^{-1}(\cdot)$ denotes its inverse function, and $\chi_p^2$ denotes a chi-square distribution with $p$ degrees of freedom.
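For concreteness, the three matrix norms above can be computed as follows (a minimal numpy sketch; the helper name is ours):

```python
import numpy as np

def matrix_norms(A):
    """The three matrix norms used in the notation above."""
    ell1 = np.abs(A).sum(axis=0).max()   # ||A||_1: max absolute column sum
    sup  = np.abs(A).max()               # ||A||_{inf,inf}: max absolute entry
    fro  = np.sqrt((A ** 2).sum())       # ||A||_F: Frobenius norm
    return ell1, sup, fro
```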

Chapter 2 ADMM Algorithmic Regularization Paths for High-dimensional Sparse Precision Matrix

2.1 Model setting

In this section, we use Zhang and Zou's (30) ADMM splitting method as the foundation upon which to develop a new approximation to the sequence of sparse solutions $\hat{\Theta}(\lambda)$, which allows us to quickly explore the space of sparse solutions at a fine resolution, with a negligible sacrifice of estimation precision as long as the step size, defined as $\max_j(\lambda_{j+1} - \lambda_j)$, is sufficiently small. The objective function formulated in (1.2) by Zhang and Zou (30) consists of a differentiable loss function, $\frac{1}{2}\langle \Theta^2, \hat{\Sigma} \rangle - \operatorname{tr}(\Theta)$, a non-differentiable penalty, $\lambda_n \lVert Z \rVert_{1,\text{off}}$, and a constraint ($Z \succeq \epsilon I$) which ensures that $\hat{\Theta}(\lambda)$ is positive-definite. In the constraint, $\epsilon$ is a user-specified parameter which is sufficiently small, say $\epsilon = 10^{-6}$. The loss function is convex in $\Theta$. The regularization parameter $\lambda > 0$ controls the trade-off between the loss function and the penalty. Following Zhang and Zou (30), we split the smooth loss function from the non-smooth penalty through a copy parameter matrix $Z$ and add an equality constraint forcing $\Theta = Z$, which converts (1.2) into
$$(\hat{\Theta}, \hat{Z}) = \arg\min_{\Theta \in S(p)} \; \frac{1}{2}\langle \Theta^2, \hat{\Sigma} \rangle - \operatorname{tr}(\Theta) + \lambda_n \lVert Z \rVert_{1,\text{off}}, \quad \text{subject to } \Theta = Z \text{ and } Z \succeq \epsilon I. \qquad (2.1)$$
Similar operator splitting strategies are also used in many other learning problems, such as total variation (25), joint graphical model selection (19,20), a convex formulation of clustering (5), etc. With a scaled dual variable $U$ of the same dimension as $\Theta$ and $Z$, and an algorithm tuning parameter $\rho$, the associated augmented Lagrangian of (2.1) is
$$\mathcal{L}(\Theta, Z, U) = \frac{1}{2}\langle \Theta^2, \hat{\Sigma} \rangle - \operatorname{tr}(\Theta) + \lambda_n \lVert Z \rVert_{1,\text{off}} + \frac{\rho}{2} \lVert \Theta - Z + U \rVert_F^2. \qquad (2.2)$$

Let $(\Theta^k, Z^k, U^k)$ be the solution at step $k$, for $k = 0, 1, 2, \ldots$ We update $(\Theta, Z, U)$ according to:

$\Theta$-subproblem:
$$\Theta^{k+1} := \arg\min_{\Theta \in S(p)} \mathcal{L}(\Theta, Z^k, U^k), \quad \text{subject to } \Theta \succeq \epsilon I. \qquad (2.3)$$

$Z$-subproblem:
$$Z^{k+1} := \arg\min_{Z} \mathcal{L}(\Theta^{k+1}, Z, U^k). \qquad (2.4)$$

Dual update:
$$U^{k+1} := U^k + \Theta^{k+1} - Z^{k+1}.$$

We solve these subproblems, together with the dual update, iteratively until convergence. The benefit of this splitting is that both (2.3) and (2.4) have closed-form solutions. The $\Theta$-subproblem can be rewritten as
$$\Theta^{k+1} := \arg\min_{\Theta \in S(p)} \; \frac{1}{2}\langle \Theta^2, \hat{\Sigma} + \rho I \rangle - \langle \Theta, I + \rho(Z^k - U^k) \rangle. \qquad (2.5)$$
Let $G(A, B)$ denote the solution to the optimization problem
$$\arg\min_{\Theta \in S(p)} \; \frac{1}{2}\langle \Theta^2, A \rangle - \langle \Theta, B \rangle, \quad A \succ 0. \qquad (2.6)$$
Then we can write
$$\Theta^{k+1} = G(\hat{\Sigma} + \rho I, \; I + \rho(Z^k - U^k)). \qquad (2.7)$$
The explicit solution to (2.6) is given in the following proposition.

Proposition 1. Given any $p$-dimensional symmetric positive-definite matrix $A$ and any $p$-dimensional matrix $B$, let $A = V_A \operatorname{diag}(a_1, \ldots, a_p) V_A^T$ be the eigenvalue decomposition of $A$, with ordered eigenvalues $a_1 \ge \cdots \ge a_p$. Define
$$G(A, B) = \arg\min_{\Theta \in S(p)} \; \frac{1}{2}\langle \Theta^2, A \rangle - \langle B, \Theta \rangle.$$
Then
$$G(A, B) = V_A \{(V_A^T B V_A) \circ C\} V_A^T,$$
where $\circ$ denotes the Hadamard product of matrices and $C_{ij} = 2/(a_i + a_j)$.
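Proposition 1 translates directly into a few lines of numpy. The sketch below is ours (the thesis provides no code); it assumes $B$ is symmetric, as it is in the update (2.7), so that the stationarity condition $A\Theta + \Theta A = 2B$ holds exactly.

```python
import numpy as np

def G(A, B):
    """Closed-form minimizer of 0.5*<Theta^2, A> - <Theta, B> (Proposition 1).

    With A = V diag(a) V^T, the stationarity condition A Theta + Theta A = 2B
    gives Theta = V ((V^T B V) o C) V^T, where C_ij = 2 / (a_i + a_j).
    """
    a, V = np.linalg.eigh(A)                 # eigen-decomposition of A
    C = 2.0 / (a[:, None] + a[None, :])      # Hadamard factor C_ij = 2/(a_i+a_j)
    return V @ ((V.T @ B @ V) * C) @ V.T
```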

To update $Z^{k+1}$, from (2.4) we write
$$Z^{k+1} = \arg\min_{Z} \; \lambda_n \lVert Z \rVert_{1,\text{off}} + \frac{\rho}{2} \lVert Z - (\Theta^{k+1} + U^k) \rVert_F^2.$$
Let $S(A, \lambda)$ denote the solution to the optimization problem
$$\arg\min_{Z} \; \lambda \lVert Z \rVert_{1,\text{off}} + \frac{1}{2} \lVert Z - A \rVert_F^2.$$
Then we can write
$$Z^{k+1} = S\!\left(\Theta^{k+1} + U^k, \; \frac{\lambda_n}{\rho}\right), \qquad (2.8)$$
where the operator $S$ is defined entrywise by
$$S(A, \lambda)_{ij} = \begin{cases} A_{ij}, & i = j, \\ \operatorname{sign}(A_{ij}) \max(|A_{ij}| - \lambda, \, 0), & i \ne j. \end{cases}$$
In other words, $S(A, \lambda)$ is equivalent to
$$S(A, \lambda)_{ij} = A_{ij} I(i = j) + \operatorname{sign}(A_{ij}) \max(|A_{ij}| - \lambda, 0) I(i \ne j).$$
For a symmetric matrix $M$ we define the matrix operator $M_+$ as follows: let the eigenvalue decomposition of $M$ be $V_M \operatorname{diag}(\lambda_1, \ldots, \lambda_p) V_M^T$; then
$$M_+ = V_M \operatorname{diag}(\max(\lambda_1, \epsilon), \ldots, \max(\lambda_p, \epsilon)) V_M^T.$$
We now have all the pieces needed to work out the alternating direction method for solving (2.1).
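Both operators are equally short in numpy; a minimal sketch with our own function names (psd_floor corresponds to the $M_+$ operator):

```python
import numpy as np

def soft_threshold_offdiag(A, lam):
    """Operator S(A, lam): soft-threshold the off-diagonal entries of A
    at level lam and leave the diagonal untouched."""
    Z = np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)
    np.fill_diagonal(Z, np.diag(A))
    return Z

def psd_floor(M, eps=1e-6):
    """Operator M_+: floor the eigenvalues of a symmetric M at eps,
    yielding a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.maximum(vals, eps)) @ vecs.T
```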

2.2 ADMM algorithm with warm starts for sparse precision matrix

First, we introduce a new extension of the ADMM algorithm for approximating regularization paths, and motivate our study of the sequence of active sets it produces. Algorithm 1 (ADMM with warm starts) is an alternative algorithm for fitting regularization paths. It begins with a small $\lambda$, corresponding to a dense model, obtains the solution by running the ADMM algorithm, and then uses the previous solution $\hat{\Theta}(\lambda_{j-1})$ and dual variable $U(\lambda_{j-1})$ to initialize the ADMM algorithm for $\lambda_j$.

Considering the sequence of active sets depicted by this algorithm, some noteworthy features should be discussed. The ADMM tuning parameter $\rho$ does not appear in this algorithm: we have removed it as a parameter by fixing $\rho = 1$ throughout. Fixing $\rho = 1$ stands in contrast with the burgeoning literature on how to dynamically update $\rho$ for ADMM algorithms (2) and on adaptive procedures that change $\rho$ to speed up convergence (10); however, changing the algorithm tuning parameter is not beneficial to achieving a path-like algorithm using warm starts. Consider the $Z$-subproblem, which is solved by soft-thresholding at the level $\lambda_n/\rho$: if $\rho$ changes within the algorithm, the sparsity level of $Z$ changes dramatically, eliminating the advantage of warm starts in achieving smooth transitions in sparsity levels. The ADMM with warm starts takes the following form:

Algorithm 1: ADMM with warm starts to estimate a high-dimensional sparse precision matrix
1. Initialize: $\Theta = \epsilon I$, $U = 0$, and $M$ log-spaced values $\lambda_1 < \lambda_2 < \cdots < \lambda_M$.
2. For $j = 1, \ldots, M$ (warm-started from the solution at $\lambda_{j-1}$):
   while $r^k, s^k >$ tolerance do
     $Z^k(\lambda_j) = S(\Theta^{k-1}(\lambda_j) + U^{k-1}(\lambda_j), \lambda_j)$;
     $\Theta^k(\lambda_j) = G(\hat{\Sigma} + \rho I, \; I + \rho(Z^k(\lambda_j) - U^{k-1}(\lambda_j)))$;
     $U^k(\lambda_j) = U^{k-1}(\lambda_j) + \Theta^k(\lambda_j) - Z^k(\lambda_j)$;
     $r^k = \Theta^k(\lambda_j) - Z^k(\lambda_j)$ and $s^k = Z^k(\lambda_j) - Z^{k-1}(\lambda_j)$;
     $k \leftarrow k + 1$;
   end while
3. Output $\{\hat{\Theta}(\lambda_j), j = 1, \ldots, M\}$ as the regularization path.
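Assembling the pieces, the following sketch implements Algorithm 1, reusing G and soft_threshold_offdiag from the sketches above; the stopping rule, iteration cap and function name are our assumptions rather than the thesis's code.

```python
import numpy as np
# Reuses G(...) and soft_threshold_offdiag(...) from the sketches above.

def admm_warm_start_path(Sigma_hat, lambdas, rho=1.0, eps=1e-6,
                         tol=1e-4, max_iter=500):
    """Sketch of Algorithm 1: ADMM with warm starts over log-spaced
    lambda values, ordered small to large (dense to sparse)."""
    p = Sigma_hat.shape[0]
    Theta = eps * np.eye(p)          # warm-start state, carried across lambdas
    Z = Theta.copy()
    U = np.zeros((p, p))
    A = Sigma_hat + rho * np.eye(p)  # fixed left argument of G in (2.7)
    path = []
    for lam in lambdas:
        for _ in range(max_iter):
            Z_new = soft_threshold_offdiag(Theta + U, lam / rho)   # (2.8)
            Theta = G(A, np.eye(p) + rho * (Z_new - U))            # (2.7)
            U = U + Theta - Z_new
            r = np.linalg.norm(Theta - Z_new, 'fro')   # primal residual
            s = np.linalg.norm(Z_new - Z, 'fro')       # dual residual (scaled)
            Z = Z_new
            if r < tol and s < tol:
                break
        path.append(Z.copy())        # sparse iterate Z is reported on the path
    return path
```

For instance, `admm_warm_start_path(Sigma_hat, np.logspace(-2, 0, 50))` traces fifty solutions from dense to sparse, each warm-started from its predecessor.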

In Algorithm 1, convergence is measured by the primal and dual residuals (2). The resulting solutions go from dense to sparse, or equivalently, the regularization parameter $\lambda$ goes from small to large. The ADMM algorithm with warm starts can certainly go in the reverse direction, from sparse to dense. However, going from sparse to dense may introduce additional yet unnecessary discontinuities in the $Z$-subproblem, and consequently would require more iterations for convergence. Therefore, we advocate running the ADMM algorithm with warm starts from dense to sparse, as described in Algorithm 1.
