NIPS preprint available
Nici Schraudolph
schraudo at salk.edu
Mon Jan 17 23:53:20 EST 1994
Temporal Difference Learning of Position Evaluation in the Game of Go
---------------------------------------------------------------------
Nicol N. Schraudolph Peter Dayan Terrence J. Sejnowski
Computational Neurobiology Laboratory
The Salk Institute for Biological Studies
San Diego, CA 92186-5800
Abstract:
The game of Go has a high branching factor that defeats the tree search
approach used in computer chess, and long-range spatiotemporal
interactions that make position evaluation extremely difficult. Development
of conventional Go programs is hampered by their knowledge-intensive
nature. We demonstrate a viable alternative by training networks to
evaluate Go positions via temporal difference (TD) learning.
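As a rough illustration of what a TD update for position evaluation involves, the sketch below applies TD(lambda), one common form of TD learning, to a linear evaluator over the sequence of positions from a single game. It is not the network described in the paper. The linear evaluator, the feature encoding, the step size `alpha` and the trace decay `lam` are all assumptions chosen for illustration, and only the final game outcome is used as reinforcement.

```python
from typing import List, Sequence


def td_lambda_episode(weights: List[float],
                      positions: Sequence[Sequence[float]],
                      outcome: float,
                      alpha: float = 0.01,
                      lam: float = 0.7) -> List[float]:
    """Apply online TD(lambda) updates along one game for a linear evaluator.

    positions: feature vectors for successive positions (hypothetical encoding,
               e.g. +1 Black stone, -1 White stone, 0 empty per intersection).
    outcome:   reinforcement received only at the end of the game.
    Returns the updated weight vector; the input list is not modified.
    """
    w = list(weights)
    trace = [0.0] * len(w)  # eligibility trace, one entry per weight

    def value(x: Sequence[float]) -> float:
        # Linear position evaluation V(x) = w . x
        return sum(wi * xi for wi, xi in zip(w, x))

    for t, x in enumerate(positions):
        # For a linear evaluator the gradient of V w.r.t. w is x itself.
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        # Bootstrap from the next position's value, or use the final outcome.
        target = value(positions[t + 1]) if t + 1 < len(positions) else outcome
        delta = target - value(x)  # temporal difference error
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w
```

Repeating such episodes pulls each position's value toward that of its successor, and ultimately toward the observed outcome, without anyone labelling individual positions.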
Our approach is based on network architectures that reflect the spatial
organization of both input and reinforcement signals on the Go board,
and training protocols that provide exposure to competent (though
unlabelled) play. These techniques yield far better performance than
undifferentiated networks trained by self-play alone. Within 3,000 games
of 9x9 Go, a network with fewer than 500 weights learned a position
evaluation function that enables a primitive one-ply search to defeat
a commercial Go program at a low playing level.
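A one-ply search simply tries every legal move, scores each resulting position with the evaluation function, and plays the best one. The sketch below assumes caller-supplied `play` and `evaluate` functions (hypothetical names) and an evaluator that scores positions from Black's point of view. The 3x3 board, `WEIGHTS`, `place_black` and `evaluate` at the end are toy stand-ins, not Go rules or trained weights.

```python
from typing import Callable, Iterable, Optional, TypeVar

P = TypeVar("P")  # position type (hypothetical; any board representation)
M = TypeVar("M")  # move type


def one_ply_search(position: P,
                   legal_moves: Iterable[M],
                   play: Callable[[P, M], P],
                   evaluate: Callable[[P], float],
                   player: int) -> Optional[M]:
    """Return the legal move whose successor position evaluates best for `player`.

    Assumes evaluate() scores positions from Black's point of view, so
    Black (player = +1) maximises it and White (player = -1) minimises it.
    Returns None if there are no legal moves.
    """
    best_move: Optional[M] = None
    best_score = float("-inf")
    for move in legal_moves:
        # Look exactly one move ahead: the opponent's replies are not considered.
        score = player * evaluate(play(position, move))
        if score > best_score:
            best_move, best_score = move, score
    return best_move


# Toy usage on a hypothetical 3x3 board with no capture rules.
WEIGHTS = (1, 2, 1, 2, 4, 2, 1, 2, 1)  # hypothetical centre-favouring weights
EMPTY = (0,) * 9                       # +1 Black stone, -1 White stone, 0 empty


def place_black(board, idx):
    """Put a Black stone on `idx` (no legality or capture checks)."""
    return board[:idx] + (1,) + board[idx + 1:]


def evaluate(board):
    """Linear score from Black's point of view."""
    return sum(w * c for w, c in zip(WEIGHTS, board))


best_black = one_ply_search(EMPTY, range(9), place_black, evaluate, player=1)
```

Because the search looks only one move ahead, all of its playing strength has to come from the evaluation function, which is why the quality of the learned evaluator matters so much here.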
--------
A preprint of the above paper is available by anonymous ftp from salk.edu
(192.31.153.101), file pub/schraudo/nips93.ps.Z. (If you do not have ftp
access to the Internet, send the message "help" to ftpmail at decwrl.dec.com
for information on ftp-by-email service.)