UCT算法的實現(xiàn),,,.doc_第1頁
UCT算法的實現(xiàn),,,.doc_第2頁
UCT算法的實現(xiàn),,,.doc_第3頁
UCT算法的實現(xiàn),,,.doc_第4頁
UCT算法的實現(xiàn),,,.doc_第5頁
已閱讀5頁,還剩3頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)

文檔簡介

UCT算法的實現(xiàn)在網(wǎng)上,實在很難找到一篇全面一點的介紹mc和uct的論文,即使是E文的東西,也因為專業(yè)性實在太強,導致很難讀懂(only for me),偶爾看到crazystone 網(wǎng)站上的一點東西,只好免強啃下來,E文水平實在太差,感興趣的朋友只好對照E文啃了!UCTfor Monte Carlo computer goA Monte Carlo (MC) go program plays random games and easily evaluates the terminal position after two passes using Chinese rules. A MC program searches for moves that have high win rates, calculated from playing out at least a few hundred random games.一個mc圍棋程序隨機地下棋,而且根據(jù)中國規(guī)則可以很容易地對雙方結(jié)束后的態(tài)勢進行估值。Mc程序經(jīng)過計算無數(shù)個棋局搜索那個取得最高勝率的走法。A basic MC program would simply collect statistics for all children of the root node, but MC evaluation of these moves will not converge to the best move. It is therefore necessary to do a deeper search.一個最基本的mc程序會簡單地為第一個根節(jié)點統(tǒng)計數(shù)值。但是這些著法的mc估值不會趨向于一個最佳著法,因此需要做更深的搜索。The UCT-method (which stands for Upper Confidence bounds applied to Trees) is a very natural extension to MC-search, where for each played game the first moves are selected by searching a tree which is grown in memory, and as soon as a terminal node is found a new move/child is added to the tree and the rest of the game is played randomly. The evaluation of the finished random game is then used to update the statistics of all moves in the tree that were part of that game.UCT算法(樹圖置信)是對mc搜索自然的擴展,對每一盤棋,通過搜索一個在內(nèi)存中生長的樹來確定最初的著法選擇。當一個枝端節(jié)點出現(xiàn)時,再生長一個新的枝節(jié)點,余下的棋自由走下去。利用棋局最后的估值更新對局樹中所有走法的統(tǒng)計數(shù)據(jù),而這些走法僅是棋局的一部分。The UCT-method solves the problem of selecting moves in the tree such that important moves are searched more often than moves that appears to be bad. One way of doing this would be to select the node with the highest winrate 50% of the time and select a random move otherwise.UCT算法解決了一個問題,就是在樹中選擇走法,重要的走法比那些看起來差點的走法更多地被。一個辦法就是選擇最高勝率高過50的,或者隨機走法。UCT also selects the best move most of the time but also explores other moves in a more sophisticated way. It does this by adding a number to the win rate for each candidate move that goes down every time the node has been visited. But this number also goes up a little every time the parent was visited and some other move is selected. This means that the win rate + the number will grow for unexplored moves so that at some point the sum is higher than for all moves that have higher winrates. If the move works, then the winrate goes up and it may soon be selected again. If the move failed to win then the winrate goes down as well as the added number and the move may have to wait for a long time before the added number has grown large enough. A move can also be selected if all other moves are refuted so that the winrates for all competitors goes down.UCT很 多的時候不僅選擇最好的走法,而且還探索更加通常的一些走法。這樣做是通過對每個被訪問的低勢候選走法的勝率增加一個數(shù)來實現(xiàn)的。但是這個數(shù)每次在父節(jié)點 被訪問或是其它走法被選擇時會同時升高一點。這就是當勝率加上為每個未被找到的走法增長的值,因此在有些點,匯總值要高于那些有較高勝率的點。如果這一步 可行,那么勝率增加,很快被重新選擇,如果這一步導致失敗,那么勝率變小,這個增加值和走法會在增加數(shù)值增長到足夠大之前再等一段時間。一個走法能夠被選 擇,如果所有其它走法被拒絕,因而各方所有的勝率都降底。In summary: starting from the root UCT selects a path of moves through the tree by computing a value for each candidate move based on the winrate and how many times the move has been played as well as how many times the position has been visited. If there are children to a node that has not been visited yet then one of those moves are selected at random. Since this algorithm does not assume any knowledge about go it is natural to explore every move at least once.總之,從根節(jié)點開始,UCT通過為每個基于勝率和走棋次數(shù)(即這個節(jié)點被訪問的次數(shù))的候選走法去計算一個值用來從樹中選擇一個走法路徑。如果有些子節(jié)點還未被訪問,那么這些節(jié)點會被隨機選擇。故而這個算法不需預設(shè)的任何圍棋知識,它自然能夠至少一次遍歷到每種走法。The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with勝率和每個候選走法n的數(shù)值通過如下公式計算。UCTValue(parent,n)=winrate+sqrt(ln(parent.visits)/(5*n.nodevisits)and the candidate move n with highest UCTvalue is selected to be played next.具有最高UCTvalue值的候選的走法n被用于下一步棋。With UCT one can strike a good balance between searching the best move so far and exploring alternative moves. I think the innovation of UCT (to me at least) is the logarithm of the number of visits to the parent node for the UCT-Value. This value goes up quickly early but then it becomes flat, but never stops rising. It goes to infinity although very very slowly. All moves will be searched no matter how bad winrates they have, so that UCT-search given close to infinite time and memory will find any winning move no matter how misleading the winrates are initially.利 用UCT,你可以在進一步的搜索最好走法和找到不相上下的走法之間得到一個好的平衡。我想UCT的創(chuàng)新 (至少對我)是就UCT-Value值而言即是父節(jié)點訪問數(shù)量的對數(shù)。這個值早期很快增大,隨即增長平緩,但從不停止增大。它會很慢很慢地趨向無窮大。不 管勝率有多差,所有的走法都會被搜索得到。因此只要時間和內(nèi)存無窮大,UCT搜索將會找到任何可勝的走法,不管最初勝率是如何的難以確定。Implementation details實現(xiàn)的細節(jié)There are at least two ways to implement the tree:至少有兩種方法可以實現(xiàn)這個樹1. Build a tree in memory Pro: Easy to program Con: You quickly run out of memory using this method.1、在內(nèi)存中建立一個樹。優(yōu)點:編程容易,缺點:利用這種方法你很快就會用完所有內(nèi)存。2. Transposition Table: Pro: Identical positions reached with different move orders will not be searched twice Con: Depending on implementation details one might have problems with the correctness of t

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
  • 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論