A Trivial but Fast Reinforcement Controller
John Moody
moody at chianti.cse.ogi.edu
Tue May 24 19:46:28 EDT 1994
The following paper is available via anonymous ftp:
=========================================================================
File: moodyTresp94.reinforce.ps.Z
To appear in Neural Computation, vol. 6, 1994.
-------------------------------------------------------------------------
A Trivial but Fast Reinforcement Controller
John Moody and Volker Tresp
Abstract:
We compare simulation results for the classic Barto-Sutton-Anderson
pole balancer (which uses the Michie and Chambers ``boxes''
representation) with results for a reinforcement learning controller
which employs a quadratic representation for both the adaptive
critic element (ACE) and the associative search element (ASE). We
find that this simple controller learns to balance the pole after
a median of only 2 failures. This corresponds to a relative speed-up
factor of over 7000 in simulated physical time. Moreover, the
quality of the control, as measured by the residual kinetic energy
of the cart/pole system after learning, is substantially better
for the quadratic ACE/ASE controller.
=========================================================================
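As a rough illustration of what a "quadratic representation" could look
like, the sketch below expands a cart/pole state into constant, linear,
and pairwise-product terms and feeds them to linear critic and actor
outputs. The announcement does not give the paper's exact feature set,
state ordering, or learning rule, so the names quadratic_features and
linear_output, the four-component state, and the zero-initialized weights
are all assumptions made for illustration only.

```python
from itertools import combinations_with_replacement


def quadratic_features(state):
    """Expand a state into [1, s_i, s_i * s_j for i <= j].

    Assumption: the announcement only says "quadratic"; this particular
    choice of terms is a guess at one plausible form.
    """
    s = [float(v) for v in state]
    feats = [1.0]                 # constant (bias) term
    feats.extend(s)               # linear terms
    feats.extend(a * b for a, b in combinations_with_replacement(s, 2))
    return feats


def linear_output(weights, feats):
    """Weighted sum of the features (used here for both critic and actor)."""
    return sum(w * f for w, f in zip(weights, feats))


# Hypothetical state: (cart position, cart velocity, pole angle, pole angular velocity)
state = (0.1, 0.0, -0.05, 0.2)
phi = quadratic_features(state)

# Untrained weights; a real controller would adapt these from reinforcement.
critic_weights = [0.0] * len(phi)
actor_weights = [0.0] * len(phi)

predicted_value = linear_output(critic_weights, phi)    # critic (ACE-like) output
push_right = linear_output(actor_weights, phi) >= 0.0   # actor (ASE-like) decision
```

With four state variables this expansion has 15 terms (1 constant, 4
linear, 10 quadratic), so each element would carry only 15 weights
under these assumptions.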
Retrieval instructions are:
unix> ftp neural.cse.ogi.edu
login: anonymous
password: name@email.address
ftp> cd pub/neural
ftp> cd papers
ftp> get INDEX
ftp> binary
ftp> get moodyTresp94.reinforce.ps.Z
ftp> quit
unix> uncompress *.Z
unix> lpr *.ps