NIPS preprint available

Mon Jan 17 23:53:20 EST 1994

Temporal Difference Learning of Position Evaluation in the Game of Go
---------------------------------------------------------------------

    Nicol N. Schraudolph    Peter Dayan    Terrence J. Sejnowski

              Computational Neurobiology Laboratory
            The Salk Institute for Biological Studies
                     San Diego, CA 92186-5800

Abstract:

The game of Go has a high branching factor that defeats the tree search
approach used in computer chess, and long-range spatiotemporal interactions
that make position evaluation extremely difficult.  Development of
conventional Go programs is hampered by their knowledge-intensive
nature.  We demonstrate a viable alternative by training networks to
evaluate Go positions via temporal difference (TD) learning.

Our approach is based on network architectures that reflect the spatial
organization of both input and reinforcement signals on the Go board,
and training protocols that provide exposure to competent (though
unlabelled) play.  These techniques yield far better performance than
undifferentiated networks trained by self-play alone.  A network with
fewer than 500 weights learned within 3,000 games of 9x9 Go a position
evaluation function that enables a primitive one-ply search to defeat
a commercial Go program at a low playing level.
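
For readers unfamiliar with the two ideas named in the abstract, the
following is a minimal, generic sketch of a TD(0) update and of one-ply
move selection.  It is not the architecture described in the paper: the
linear tanh evaluator, the feature vectors, the learning rate, and all
function names are hypothetical choices made for illustration, and the
sketch ignores whose turn it is to move.

```python
import math

def evaluate(weights, features):
    """Predicted game outcome in (-1, 1) for one position (hypothetical linear model)."""
    return math.tanh(sum(w * x for w, x in zip(weights, features)))

def td0_update(weights, features, target, lr=0.1):
    """Take one gradient step moving evaluate(features) toward `target`."""
    v = evaluate(weights, features)
    delta = target - v                 # temporal difference error
    slope = 1.0 - v * v                # derivative of tanh at the current output
    return [w + lr * delta * slope * x for w, x in zip(weights, features)]

def train_on_game(weights, positions, outcome, lr=0.1):
    """Apply TD(0) along one game.

    Each position is trained toward the evaluation of its successor;
    the last position is trained toward the actual game outcome.
    """
    for t, feats in enumerate(positions):
        if t + 1 < len(positions):
            target = evaluate(weights, positions[t + 1])
        else:
            target = outcome
        weights = td0_update(weights, feats, target, lr)
    return weights

def one_ply_move(weights, candidates):
    """Index of the successor position with the highest evaluation (one-ply search)."""
    return max(range(len(candidates)), key=lambda i: evaluate(weights, candidates[i]))
```

Calling train_on_game repeatedly on the same two-position game shows the
outcome signal first reaching the final position and then propagating
back to the earlier one, which is the behaviour TD learning relies on.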

--------

A preprint of the above paper is available by anonymous ftp from salk.edu
(192.31.153.101), file pub/schraudo/nips93.ps.Z.  (If you do not have ftp
access to the Internet, send the message "help" to ftpmail at decwrl.dec.com
for information on ftp-by-email service.)