




Optimal Control of a Doubly-Fed Induction Wind Generator System Using Non-Squares Estimators

Federal University of Maranhao, State University of Maranhao
Embedded Systems and Intelligent Control Laboratory
Biological Information Processing Laboratory
Sao Luis-MA, Brazil
Email: prof.queiroz.jonathan

Abstract—Because of the random nature of primary energy sources, the control of wind and solar energy systems demands methods and techniques suited to a high degree of environmental uncertainty. Reinforcement learning (RL) and approximate dynamic programming (ADP) furnish the key ideas and the mathematical formulations to develop optimal control methods and strategies for alternative energy systems. We propose an online design method to establish control strategies for the main unit of a wind energy system, the doubly fed induction generator (DFIG). The proposed methodology is based on the development of online algorithms for the approximate solution of the Hamilton-Jacobi-Bellman (HJB) equation through a family of non-squares approximators for the critic-adaptive solution of the Discrete Algebraic Riccati Equation (DARE) associated with the Discrete Linear Quadratic Regulator (DLQR) problem.

Index Terms—Heuristic Dynamic Programming, Dynamic Programming, Convergence, Discrete Linear Quadratic Regulator, Digital Control, DFIG wind turbines, Doubly Fed Induction Generator.

I. INTRODUCTION
Electricity is fundamental to the development of countries and to the quality of life [1]. As more countries develop, it becomes necessary to further increase energy production [2]. At the same time, the environment must be preserved by using natural resources responsibly. Therefore, in addition to expanding power generation capacity by improving the use of conventional sources, it is also necessary to develop technologies for the use of new energy sources, the so-called alternative energy sources [3].

Great effort is being devoted today to the development of alternative energy systems, such as solar and wind plants, which transform primary energy sources into electrical energy [4]. These natural resources are subject to uncertainties provoked by environmental changes in temperature and pressure and by human-induced changes in the environment. Due to these uncertainties, control systems must be robust enough to deal with random situations during normal operation. Mainly to minimize, and sometimes to avoid, the unwanted effects of such uncertainties, we present the first insights into an online optimal control design method based on reinforcement learning and on the approximate solution of the HJB equation, oriented to handle random and nonlinear processes [5].

Thorough studies in ADP have been conducted in [6], [7], where new ideas are proposed and the trends of ADP for this decade are discussed. The state of the art on the approximate solution of the Hamilton-Jacobi-Bellman (HJB) equation associated with the discrete algebraic Riccati equation [8] can be found in [9], [10]. In references [11], [12], efficient reinforcement learning, temporal-difference learning, and function approximation are discussed in the context of least squares for solving the HJB equation.

Reports on applications of DLQR control in the DFIG show the importance of this device and the technological improvements promoted by optimal control strategies. Reference [13], on decentralized nonlinear control of wind turbines via the DFIG (Doubly Fed Induction Generator), presents a linear quadratic regulator (LQR) design method to improve the transient stability of power systems and enhance system damping. An optimal control strategy based on the LQR for the DFIG is presented in [14]; the strategies are designed to solve transient stability problems, and the gain adjustment of the linear quadratic controllers is performed via deviations of the weighting matrix values. In reference [15], the authors present an optimal control strategy for reactive power, where the DFIG acts as a reactive power source of the wind farm and a genetic algorithm is developed to optimize the control strategy, together with an optimal tracking of the secondary voltage control for the DFIG; this control is based on the regulation margin of the grid buses, selected intelligently according to the voltage violation condition.

In this paper, we present a novel method and an online algorithm to design control strategies for the DFIG in wind plants. We present an adaptive recursive algorithm inspired by RLS, which is based upon a family of non-squares error loss functions [16], [17], [18]; that is, differently from the RLS algorithm, it takes up not just one power but the sum of even error powers, aiming at estimating the solution of the HJB-Riccati equation via HDP. Such an algorithm will herein be referred to as the Recursive Least Non-Squares (RLNS) algorithm. We will show through computational experiments how the novel estimator leads to better performance in terms of convergence speed when compared with the standard RLS estimator for approximating the HJB-Riccati equation solution via HDP.
II. CONTROL POLICY AND BELLMAN EQUATION

The generalized formulation of the control policy and the Bellman equation is presented here, showing the elements necessary for the development of HDP, such as the Markovian decision process and the dynamic programming framework that enable the characterization of the problem.

Let $\mathcal{M} = (X, U, f, c)$ be a Markovian decision process (MDP), where $X$ is the state space, $U$ is the control action space, $f : X \times U \to X$, $f(x_k, u_k) = x_{k+1}$, is the deterministic state transition function, and $c : X \times U \to \mathbb{R}$, $c(x_k, u_k) = c_k$, is the utility function that establishes the cost $c_k$ of the transition from $x_k$ to $x_{k+1}$; this transition is guided (forced) by the control action $u_k$. For each state $x_k \in X$ there is a subset $U(x_k) \subseteq U$ of admissible actions, and $u_k$ is an element of this subset.

The control policy is given by the mapping $h : X \to U$ that produces the action $u_k$ to be taken at time $k$. For a given control policy $h(x_k) = u_k$, the value function $V^h : X \to \mathbb{R}$ is given by

$$V^h(x_k) = \sum_{i=k}^{\infty} \gamma^{\,i-k}\, c(x_i, h(x_i)), \qquad (1)$$

where $\gamma$ is the discount factor, defined in $0 \le \gamma \le 1$. For each $x_k \in X$, $V^h$ must satisfy the equation

$$V^h(x_k) = c(x_k, h(x_k)) + \gamma V^h(x_{k+1}), \qquad (2)$$

which is called the Bellman equation.

The purpose of the MDP is to establish a control or decision policy $h^*$ that is optimal in the sense that it promotes the smallest possible cost, i.e., that satisfies the inequality

$$V^{h^*}(x_k) \le V^{h_j}(x_k) \qquad (3)$$

for each $x_k \in X$ and all policies $h_j$. According to the Bellman optimality principle [9], the optimal value $V^*$ is given by

$$V^*(x_k) = \min_{u_k \in U(x_k)} \left[ c(x_k, u_k) + \gamma V^*(x_{k+1}) \right], \qquad (4)$$

and the optimal control policy $h^*$ is given by

$$h^*(x_k) = \arg\min_{u_k \in U(x_k)} \left[ c(x_k, u_k) + \gamma V^*(x_{k+1}) \right]. \qquad (5)$$
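As a concrete illustration of Eq. (2), the sketch below evaluates a fixed policy on a hypothetical three-state deterministic MDP by iterating the Bellman recursion until it reaches its fixed point; the transition map and costs are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch: evaluate a fixed policy h on a toy deterministic MDP
# (X, U, f, c) via the Bellman recursion of Eq. (2). The 3-state chain and
# its costs are hypothetical, chosen only to make the fixed point visible.
n_states = 3
gamma = 0.9                     # discount factor, 0 <= gamma <= 1
f = {0: 1, 1: 2, 2: 2}          # x_{k+1} = f(x_k, h(x_k)), policy folded in
c = {0: 4.0, 1: 2.0, 2: 1.0}    # utility c(x_k, h(x_k))

V = np.zeros(n_states)          # initial guess for V^h
for _ in range(200):            # iterate V(x) <- c(x) + gamma * V(f(x))
    V = np.array([c[x] + gamma * V[f[x]] for x in range(n_states)])

print(V)                        # converges to the unique fixed point of Eq. (2)
```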
III. BELLMAN EQUATIONS FOR THE DLQR CONTROL SYSTEM

The discrete linear quadratic regulator is characterized here in the context of the Markovian decision process for HDP design purposes. The parameterizations of the value function $V$, the utility function $c$, and the state and control policy mappings $f$ and $h$ presented in Section II are formulated to determine online the gains of the DLQR control system. The Bellman equations are presented in the Lyapunov and Riccati forms, which are linear and nonlinear in the unknown parameters, respectively.

A. DLQR Parameterization

The models $f$ of the dynamic system and $h$ of the control policy are linear mappings represented by combiners of the states and inputs. The state $f(x_k, u_k)$ and decision policy $h(x_k)$ parameterizations are given by

$$f(x_k, u_k) = A x_k + B u_k, \qquad (6)$$

$$h(x_k) = -K x_k, \qquad (7)$$

where $A \in \mathbb{R}^{n \times n}$, $n$ is the system order, $B \in \mathbb{R}^{n \times n_e}$, $n_e$ is the number of system inputs, and $K \in \mathbb{R}^{n_e \times n}$ is the gain matrix of the state feedback. It is assumed that the pair $(A, B)$ is stabilizable, that is, there is a matrix $K$ that guarantees that the closed-loop system

$$x_{k+1} = (A - BK)\, x_k \qquad (8)$$

is asymptotically stable.

The utility function $c$ associated with the system (6)-(7) has the quadratic form

$$c(x_k, u_k) = x_k^T Q\, x_k + u_k^T R\, u_k, \qquad (9)$$

where the weighting matrices $Q \in \mathbb{R}^{n \times n} \ge 0$ and $R \in \mathbb{R}^{n_e \times n_e} > 0$ are symmetric.

Replacing the parameterization of the utility function, Eq. (9), and of the decision (control) policy, Eq. (7), into Eq. (1), one obtains the parameterized DLQR cost function. The main purpose of DLQR control is to select a control policy $K$ that minimizes the cost function

$$V^K(x_k) = \sum_{i=k}^{\infty} \left( x_i^T Q\, x_i + u_i^T R\, u_i \right). \qquad (10)$$

The optimal solution of the DLQR, according to [20], admits the quadratic form

$$V^K(x_k) = x_k^T P\, x_k \qquad (11)$$

for some symmetric matrix $P \in \mathbb{R}^{n \times n} > 0$; Eqs. (10) and (11) yield the same solutions. Thus, the parameterized functions of the DLQR are classified into the parameters $A$ and $B$ of the dynamic system (environment), the parameter $K$ of the control law (control policy), and the parameters $Q$ and $R$ of the instantaneous cost (utility function). The cost solution is a quadratic form in the state, parameterized by $P$, which is used to represent the cost.
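To make this parameterization concrete, the sketch below simulates the closed loop of Eq. (8) on a hypothetical second-order plant and checks that the accumulated cost of Eq. (10) matches the quadratic form of Eq. (11). The matrices $A$, $B$, $Q$, $R$ and the gain $K$ are illustrative assumptions; $P$ is obtained from the Lyapunov equation derived in the next subsection.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical second-order plant and stabilizing gain, chosen only to
# illustrate the DLQR parameterization; they are not taken from the paper.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K = np.array([[0.1, 0.5]])

Ac = A - B @ K                                    # closed-loop matrix of Eq. (8)
assert np.all(np.abs(np.linalg.eigvals(Ac)) < 1)  # asymptotic stability check

# P solves the policy-evaluation Lyapunov equation (14), derived next:
# (A - BK)^T P (A - BK) - P + Q + K^T R K = 0
P = solve_discrete_lyapunov(Ac.T, Q + K.T @ R @ K)

x0 = np.array([1.0, -1.0])
x, cost = x0.copy(), 0.0
for _ in range(500):                              # accumulate the cost of Eq. (10)
    u = -K @ x                                    # control policy of Eq. (7)
    cost += x @ Q @ x + u @ R @ u
    x = Ac @ x

print(cost, x0 @ P @ x0)                          # both agree, confirming Eq. (11)
```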
B. Bellman-DLQR Formulation

After algebraic manipulations with Eqs. (10) and (11), the Bellman equation (2) for the DLQR is given by

$$x_k^T P\, x_k = x_k^T Q\, x_k + u_k^T R\, u_k + x_{k+1}^T P\, x_{k+1}. \qquad (12)$$

Equation (12), in terms of the feedback gain of the parameterization of Eq. (7) and of the dynamics of the closed-loop system of Eq. (8), is expressed by

$$x_k^T P\, x_k = x_k^T \left( Q + K^T R K \right) x_k + x_k^T (A - BK)^T P (A - BK)\, x_k. \qquad (13)$$

Since Eq. (13) must be satisfied for all states $x_k$, one has a linear equation in $P$ that is given by

$$(A - BK)^T P (A - BK) - P + Q + K^T R K = 0. \qquad (14)$$

If the gain $K$ is fixed, Eq. (14) is known as the Lyapunov equation. Given a stabilizing gain $K$, the solution of this equation provides $P = P^T > 0$ such that $V^K(x_k) = x_k^T P x_k$ is the cost due to policy $K$, that is,

$$V^K(x_k) = x_k^T P\, x_k = \sum_{i=k}^{\infty} x_i^T \left( Q + K^T R K \right) x_i. \qquad (15)$$

By writing the Bellman equation (12) as

$$x_k^T P\, x_k = x_k^T Q\, x_k + u_k^T R\, u_k + (A x_k + B u_k)^T P\, (A x_k + B u_k), \qquad (16)$$

the differentiation with respect to $u_k$ is performed to impose a decision policy (control law) $u_k$ that minimizes the cost function. Thus, the optimal policy should satisfy

$$u_k = -(R + B^T P B)^{-1} B^T P A\, x_k, \qquad (17)$$

with the optimal feedback gain

$$K = (R + B^T P B)^{-1} B^T P A. \qquad (18)$$

Replacing Eq. (17) into Eq. (16), one obtains the discrete-time HJB equation, or the Bellman optimality equation, for the DLQR parameterization:

$$P = A^T P A - A^T P B \left( R + B^T P B \right)^{-1} B^T P A + Q. \qquad (19)$$

This equation is also known as the discrete algebraic Riccati equation (DARE), and is also referred to in this text as the HJB-Riccati equation.
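A minimal sketch of how Eq. (19) can be solved numerically: starting from any $P_0 \ge 0$, the right-hand side of the DARE is iterated to its fixed point and the optimal gain of Eq. (18) is recovered. The plant is the same hypothetical example used above, and SciPy's solver is used only as a cross-check.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Fixed-point iteration on the HJB-Riccati equation (19) and recovery of
# the optimal feedback gain of Eq. (18), on the hypothetical plant above.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.eye(2)                   # any P_0 >= 0 initializes the iteration
for _ in range(300):            # P <- A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q
    G = np.linalg.inv(R + B.T @ P @ B)
    P = A.T @ P @ A - A.T @ P @ B @ G @ B.T @ P @ A + Q

K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A        # gain of Eq. (18)
print(np.allclose(P, solve_discrete_are(A, B, Q, R)))   # cross-check: True
```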
IV. HJB APPROACH VIA RLNS

RLS methods have been commonly used for parametric estimation in HDP schemes to approximate the value function of the current control policy. However, in order to obtain better estimates in terms of convergence speed when compared to standard RLS methods, a loss function based on an exponentially weighted sum of even powers of the estimation error is proposed here. Thus, the problem is characterized and formulated as a parameter estimation problem via RLNS.
A. Critic Scheme

Supervised learning can be introduced in Eq. (2) by using an iterative scheme in which $V$ is approximated by parameterized models $\hat{V}(x, \theta)$. In particular, these models have as a starting point the parametric structure

$$\hat{V}(x, \theta) = \theta^T \varphi(x), \qquad (20)$$

where $\varphi(x) = (\varphi_1(x)\;\; \varphi_2(x)\;\; \ldots\;\; \varphi_n(x))^T$ is the basis function vector and $\theta = (\theta_1\;\; \theta_2\;\; \ldots\;\; \theta_n)^T$ is the parameter vector of the approximation. It is noticed that the right-hand side of Eq. (2) is the desired value $d(\cdot)$ for the parameter estimation, that is,

$$d(x_k, \theta) = c(x_k, h(x_k)) + \gamma\, \theta^T \varphi(x_{k+1}). \qquad (21)$$

The vector $\theta$ should be chosen to minimize the loss function

$$J_{N,M}(\theta) = \sum_{k=1}^{N} \lambda^{N-k} \sum_{j=1}^{M} e_k^{2j}, \qquad (22)$$

where $e_k$ is the error between the estimated value $\hat{V}(x, \theta)$ and the measured value $d(\cdot)$,

$$e_k = d(x_k, \theta) - \theta^T \varphi(x_k), \qquad (23)$$

$0 < \lambda \le 1$ is the forgetting factor, and $M$ and $N$ are positive integers. Another characteristic of the elements of this family of functions is that, for a fixed value of $M$, we can determine intervals $[-d, d]$ over which these curves have a slope greater than that of the quadratic curve on the same interval. This feature can be seen in Figure 1, which plots the graphs of the functions $(e^2 + e^4 + e^6)$, $(e^2 + e^4)$, and $e^2$.

Figure 1. Graphs of the functions $(e^2 + e^4 + e^6)$, $(e^2 + e^4)$, and $e^2$.
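The curves of Figure 1 are straightforward to reproduce; the sketch below plots the inner sum of the loss of Eq. (22) for $M = 1, 2, 3$ (with $N = 1$ and $\lambda = 1$, so only the inner sum matters), showing the steeper slope of the non-squares losses away from the origin.

```python
import numpy as np
import matplotlib.pyplot as plt

# Reproduce the comparison of Figure 1: the non-squares losses of Eq. (22)
# grow faster than e^2 alone, which is the source of the larger correction
# steps taken by the RLNS estimator.
e = np.linspace(-1.5, 1.5, 400)
for M, label in [(1, r"$e^2$"), (2, r"$e^2+e^4$"), (3, r"$e^2+e^4+e^6$")]:
    loss = sum(e ** (2 * j) for j in range(1, M + 1))
    plt.plot(e, loss, label=label)
plt.xlabel("error $e$")
plt.ylabel("loss")
plt.legend()
plt.show()
```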
The function $J_{N,M}$ is a criterion applied to the error that can provide a higher convergence speed as the power order in the summation increases.

Assuming that the function $J_{N,M}(\cdot)$ is differentiable, the optimal parameter vector may be found through the gradient of $J_{N,M}(\cdot)$ with respect to $\theta$, which is given by

$$\nabla_\theta J_{N,M}(\theta) = -2 \sum_{k=1}^{N} \lambda^{N-k}\, a_k\, e_k\, \varphi_k, \qquad (24)$$

with the error-dependent weights

$$a_k = \sum_{j=1}^{M} j\, e_k^{2(j-1)}, \qquad (25)$$

where $\varphi_k = \varphi(x_k)$ is the basis function vector of Eq. (21) at the $k$-th time step. Thus, equating the gradient of Eq. (24) to zero, so that the loss of Eq. (22) reaches its minimum value with the weights of Eq. (25), defines the optimal weight vector $\theta_N$ through the equation written in matrix form,

$$\Omega_{N,M}\, \theta_N = \rho_{N,M}, \qquad \Omega_{N,M} = \sum_{k=1}^{N} \lambda^{N-k} a_k\, \varphi_k \varphi_k^T, \qquad \rho_{N,M} = \sum_{k=1}^{N} \lambda^{N-k} a_k\, \varphi_k d_k. \qquad (26)$$

Isolating the term corresponding to $k = N$ in Eq. (26), one obtains

$$\Omega_{N,M} = \lambda\, \Omega_{N-1,M} + a_N\, \varphi_N \varphi_N^T. \qquad (27)$$

Similarly, Eq. (26) can be written as

$$\rho_{N,M} = \lambda\, \rho_{N-1,M} + a_N\, \varphi_N d_N. \qquad (28)$$

We note that the autocorrelation matrix and the cross-correlation vector are now explicit functions of the instantaneous error, differently from conventional algorithms in adaptive filtering. This means that we gain more control over the learning dynamics, modifying the shape of the performance surface without affecting the final solution. The deductions of Eqs. (27) and (28) were carried out in order to ease the calculation of the matrix inverse that the solution of Eq. (26) requires. Since the matrix $\Omega_{N,M}$ is positive definite, the matrix inversion lemma is applied to Eq. (27) and, after algebraic manipulations and variable replacements in Eqs. (27)-(28), the recursive estimation of $\theta$ is split into three equations: the gain vector $g(\cdot)$, the parameter vector $\theta(\cdot)$, and the inverse of the matrix $\Omega(\cdot)$, which is denoted by $S(\cdot)$. The RLNS gain vector is given by

$$g_N = \frac{a_N\, S_{N-1}\, \varphi_N}{\lambda + a_N\, \varphi_N^T S_{N-1}\, \varphi_N}. \qquad (29)$$

The inverse of the autocorrelation matrix $\Omega_{N,M}$ is given by

$$S_N = \lambda^{-1} \left( S_{N-1} - g_N\, \varphi_N^T S_{N-1} \right). \qquad (30)$$

The parameter estimation $\theta_N$ is given by

$$\theta_N = \theta_{N-1} + g_N \left( d_N - \varphi_N^T \theta_{N-1} \right), \qquad (31)$$

where the $a_N$ structure and the instantaneous error are

$$a_N = \sum_{j=1}^{M} j\, e_N^{2(j-1)}, \qquad e_N = d_N - \varphi_N^T \theta_{N-1}. \qquad (32)$$
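The recursion of Eqs. (29)-(32) differs from standard RLS only in the error-dependent weight $a_N$, so it can be sketched as a small variation of the usual RLS update. The following is a minimal sketch under that reading; the regression problem used to exercise it is an illustrative assumption, not the HDP critic itself.

```python
import numpy as np

# Minimal sketch of one RLNS update, Eqs. (29)-(32): a weighted RLS step
# whose gain is scaled by a_N = sum_{j=1}^{M} j e_N^{2(j-1)}, the weight
# induced by the non-squares loss of Eq. (22).
def rlns_update(theta, S, phi, d, lam=0.98, M=3):
    """One recursion: theta is the parameter vector, S the inverse of Omega."""
    e = d - phi @ theta                                       # error, Eq. (32)
    a = sum(j * e ** (2 * (j - 1)) for j in range(1, M + 1))  # weight, Eq. (32)
    S_phi = S @ phi
    g = a * S_phi / (lam + a * phi @ S_phi)                   # gain, Eq. (29)
    S = (S - np.outer(g, phi @ S)) / lam                      # inverse, Eq. (30)
    theta = theta + g * e                                     # update, Eq. (31)
    return theta, S

# Illustrative use: recover a hypothetical linear model d = phi^T theta_true.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
theta, S = np.zeros(3), 100.0 * np.eye(3)
for _ in range(200):
    phi = rng.standard_normal(3)
    d = phi @ theta_true + 0.01 * rng.standard_normal()
    theta, S = rlns_update(theta, S, phi, d)
print(theta)  # close to theta_true
```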
B. Actor Scheme

The parameterization of the control policy $g(x) = g(x, w)$ by iterative determination of its parameters, according to the Bellman optimality equation, together with the value function estimate $\hat{V}(x, \theta)$, allows the determination of the optimal policy, which is given as

$$g^*(x_k) = \arg\min_{u_k} \left[ c(x_k, u_k) + \gamma\, \hat{V}(f(x_k, u_k), \theta) \right]. \qquad (33)$$

The optimal solution is obtained by solving the gradient equation for $u_k$, which is given by

$$\nabla_{u_k} \left[ c(x_k, u_k) + \gamma\, \hat{V}(f(x_k, u_k), \theta) \right] = 0. \qquad (34)$$

Clearly, the derivatives of the parameterized models $f$, $c$, $\hat{V}$, and $g$ are required to determine the gradient $\nabla \hat{V}$. Thus, the optimum of Eq. (33) should satisfy Eq. (34), which for the DLQR parameterization reduces to the control law of Eq. (17).
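A minimal sketch of the actor step under the DLQR parameterization: given the critic's current estimate of $P$ (assembled from the parameter vector $\theta$), the gradient condition of Eq. (34) has the closed-form solution of Eq. (18). The plant matrices repeat the hypothetical example of Section III, and the stand-in $P$ is an assumption for illustration only.

```python
import numpy as np

# Actor update for the DLQR parameterization: solving the gradient condition
# of Eq. (34) in closed form gives the optimal feedback gain of Eq. (18).
def actor_update(A, B, R, P):
    """Greedy policy improvement: K = (R + B^T P B)^{-1} B^T P A."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Illustrative use with the hypothetical plant of Section III:
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
P = np.eye(2)                 # stand-in for the critic's current cost matrix,
K = actor_update(A, B, R, P)  # which HDP assembles from the RLNS estimate theta
print(K)
```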