Temporal Difference Learning of Position Evaluation in the Game of Go

The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self-play alone. A network with fewer than 500 weights learned within 3,000 games of 9x9 Go a position evaluation function that enables a primitive one-ply search to defeat a commercial Go program at a low playing level.
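As a rough illustration of the general idea, the sketch below shows a TD(0) update for a position evaluator and a one-ply move choice based on it. It is not the authors' method: the paper trains networks whose architecture mirrors the board layout, whereas this sketch assumes a plain linear evaluator over an abstract feature vector. The function names, the feature encoding, and the parameters `alpha` and `gamma` are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear position evaluation: dot product of weights and features.

    Assumption: a linear evaluator stands in for the paper's networks.
    """
    return sum(w * x for w, x in zip(weights, features))


def td0_update(
    weights: Sequence[float],
    features: Sequence[float],
    next_features: Optional[Sequence[float]],
    reward: float,
    alpha: float = 0.1,   # learning rate (hypothetical default)
    gamma: float = 1.0,   # discount factor (hypothetical default)
) -> List[float]:
    """Return the weights after one TD(0) step.

    next_features is None when the game has ended, in which case the
    value of the successor position is taken to be zero.
    """
    value = evaluate(weights, features)
    next_value = 0.0 if next_features is None else evaluate(weights, next_features)
    # TD error: difference between the bootstrapped target and the current estimate.
    delta = reward + gamma * next_value - value
    # For a linear evaluator the gradient of the value w.r.t. the weights
    # is simply the feature vector.
    return [w + alpha * delta * x for w, x in zip(weights, features)]


def one_ply_choice(
    weights: Sequence[float],
    successor_features: Sequence[Sequence[float]],
) -> int:
    """Index of the successor position with the highest evaluation."""
    return max(
        range(len(successor_features)),
        key=lambda i: evaluate(weights, successor_features[i]),
    )
```

In use, one would call `td0_update` once per move while replaying a game, then pick moves with `one_ply_choice` over the feature vectors of all positions reachable in one move. How positions are turned into features is left open here.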

Nicol N. Schraudolph, Peter Dayan, Terrence J. Sejnowski

Conventional Go Programs | Go Program | NIPS 1993 | NIPS 2007 | Position Evaluation |

» Temporal Difference Learning Versus CoEvolution for Acquiring Othello Position Evaluation

» Learning Opening Strategy in the Game of Go

» Reinforcement Learning of Local Shape in the Game of Go

» Coevolutionary Temporal Difference Learning for Small-Board Go

» Learning to play Tetris applying reinforcement learning methods

» Learning to Play Chess Using Temporal Differences

» Feature Construction for Reinforcement Learning in Hearts

» Why did TD-Gammon Work?

Post Info

Added	02 Nov 2010
Updated	02 Nov 2010
Type	Conference
Year	1993
Where	NIPS
Authors	Nicol N. Schraudolph, Peter Dayan, Terrence J. Sejnowski
