RankNet for evaluation functions of the game of Go.

Authors :: Mandai, Yusaku; Kaneko, Tomoyuki
Source :: International Computer Games Association Journal. 2019, Vol. 41 Issue 2, p78-91. 14p.
Publication Year :: 2019
Abstract: In this paper, we present a new algorithm for learning evaluation functions of the game of Go. Recently, AlphaGo Zero and AlphaZero have shown that accurate evaluation functions can be constructed by using deep neural networks. Such training, however, requires an enormous amount of computational resources that are not available to most researchers. One of the next challenges in this domain is constructing accurate evaluation functions with fewer computational resources. To tackle this problem, we apply the RankNet algorithm to training an AlphaGo Zero style unified policy and value network in a learning-to-rank fashion. Using pairwise RankNet training increases the potential number of training examples and reduces the number of game records required. Our modified RankNet algorithm optimizes both value and policy losses, and this joint training stabilizes learning. Experimental results showed that neural networks trained by our algorithm achieved higher playing strength than those trained by other methods, especially when the dataset sizes were relatively limited. [ABSTRACT FROM AUTHOR]
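
Note: the abstract does not spell out the authors' modified loss. As a rough illustration of the pairwise idea only, the sketch below implements the standard RankNet pairwise logistic loss, not the paper's joint policy and value objective. The function names, the toy scores, and the convention that a lower rank means a better position are all assumptions made for this example. It also shows why pairwise training can multiply the number of training examples: n ranked positions yield up to n(n-1)/2 pairs.

```python
import math
from itertools import combinations


def ranknet_pair_loss(s_preferred, s_other):
    """Standard RankNet loss for one pair: -log(sigmoid(s_preferred - s_other))."""
    diff = s_preferred - s_other
    # log(1 + exp(-diff)), computed in a numerically stable way.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def pairwise_loss(scores, ranks):
    """Mean RankNet loss over all pairs whose ranks differ.

    Assumption for this sketch: a lower rank value means a better item.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if ranks[i] == ranks[j]:
            continue  # ties carry no ordering information
        better, worse = (i, j) if ranks[i] < ranks[j] else (j, i)
        total += ranknet_pair_loss(scores[better], scores[worse])
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    ranks = [0, 1, 2]  # hypothetical ordering of three positions
    print(pairwise_loss([3.0, 2.0, 1.0], ranks))  # scores agree with ranks
    print(pairwise_loss([1.0, 2.0, 3.0], ranks))  # scores contradict ranks
```

Scores that agree with the ranking give a smaller loss than scores that contradict it, which is the signal a network would be trained on.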