A Trivial but Fast Reinforcement Controller

Tue May 24 19:46:28 EDT 1994

The following paper is available via anonymous ftp:

=========================================================================
File: moodyTresp94.reinforce.ps.Z

To appear in Neural Computation, vol. 6, 1994.
-------------------------------------------------------------------------

A Trivial but Fast Reinforcement Controller

John Moody and Volker Tresp

Abstract:

We compare simulation results for the classic Barto-Sutton-Anderson
pole balancer (which uses the Michie and Chambers "boxes"
representation) with results for a reinforcement learning controller
which employs a quadratic representation for both the adaptive
critic element (ACE) and the associative search element (ASE).  We
find that this simple controller learns to balance the pole after
a median of only 2 failures.  This corresponds to a speed-up factor
of over 7000 in simulated physical time relative to the boxes
controller.  Moreover, the
quality of the control, as measured by the residual kinetic energy
of the cart/pole system after learning, is substantially better
for the quadratic ACE/ASE controller.
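As a rough illustration of what a "quadratic representation" for the
adaptive critic could look like, here is a minimal Python sketch.  It is
one possible reading of the abstract, not the authors' code.  It assumes
the four Barto-Sutton-Anderson state variables (cart position, cart
velocity, pole angle, pole angular velocity), takes "quadratic" to mean
all monomials of degree at most two (15 features for a 4-dimensional
state), and uses a plain TD(0) update.  The names quadratic_features and
QuadraticCritic, and the values of alpha and gamma, are hypothetical.
The TD error it returns plays the role of the internal reinforcement
that an ACE passes to the ASE.

```python
from itertools import combinations_with_replacement


def quadratic_features(state):
    """Map a state vector to [1, s_i, s_i * s_j for i <= j].

    Assumption: "quadratic representation" is read here as all
    monomials of degree <= 2 in the state variables.
    """
    feats = [1.0]
    feats.extend(float(s) for s in state)
    feats.extend(float(a * b) for a, b in combinations_with_replacement(state, 2))
    return feats


class QuadraticCritic:
    """Hypothetical critic V(s) = w . phi(s), linear in quadratic features, trained by TD(0)."""

    def __init__(self, n_state, alpha=0.1, gamma=0.95):
        # 1 bias + n linear terms + n(n+1)/2 products (squares included)
        n_features = 1 + n_state + n_state * (n_state + 1) // 2
        self.w = [0.0] * n_features
        self.alpha = alpha  # learning rate (illustrative value)
        self.gamma = gamma  # discount factor (illustrative value)

    def value(self, state):
        return sum(w * f for w, f in zip(self.w, quadratic_features(state)))

    def td_update(self, state, reward, next_state, done):
        """Apply one TD(0) step and return the TD error.

        The TD error is the internal reinforcement signal that an
        ASE-style actor would use to adjust its own weights.
        """
        target = reward if done else reward + self.gamma * self.value(next_state)
        delta = target - self.value(state)
        phi = quadratic_features(state)
        self.w = [w + self.alpha * delta * f for w, f in zip(self.w, phi)]
        return delta


# Example: a failure signal (pole fell) credited to an upright, centred state.
critic = QuadraticCritic(n_state=4)
s = [0.0, 0.0, 0.0, 0.0]  # x, x_dot, theta, theta_dot
delta = critic.td_update(s, reward=-1.0, next_state=s, done=True)
print(len(quadratic_features(s)), delta, critic.value(s))  # prints: 15 -1.0 -0.1
```

A matching ASE could use the same 15 features with a stochastic sign
output, adjusting its weights in proportion to the returned TD error;
that part is omitted from this sketch.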

=========================================================================

Retrieval instructions are:

unix> ftp neural.cse.ogi.edu
login: anonymous
password: name@email.address

ftp> cd pub/neural
ftp> cd papers
ftp> get INDEX
ftp> binary
ftp> get moodyTresp94.reinforce.ps.Z
ftp> quit

unix> uncompress *.Z
unix> lpr *.ps