paper placed in neuroprose

bradtke@envy.cs.umass.edu
Thu Jan 28 11:20:59 EST 1993


	The paper "Learning to Act using Real-Time Dynamic Programming" has
been placed in the Neuroprose Archives.  It is a revised version of
COINS TR 91-57, "Real-time learning and control using asynchronous dynamic
programming", and has been submitted to the AI Journal special issue on
Computational Theories of Interaction and Agency.  The new version has
replaced the old version in the archives.  The presentation has been
cleaned up throughout, several errors have been corrected, and the
experiments have been greatly expanded.  Note that this new version uses a
somewhat different experimental problem definition than the old version.


      -----------------------------------------------------

	     Learning to Act using Real-Time Dynamic Programming

            Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh
                     Department of Computer Science
               University of Massachusetts, Amherst MA 01003


Learning methods based on dynamic programming (DP) are receiving increasing
attention in artificial intelligence. Researchers have argued that DP
provides the appropriate basis for compiling planning results into reactive
strategies for real-time control, as well as for learning such strategies
when the system being controlled is incompletely known. We introduce an
algorithm based on DP, which we call Real-Time DP (RTDP), by which an
embedded system can improve its performance with experience.  RTDP
generalizes Korf's Learning-Real-Time-A* algorithm to problems involving
uncertainty. We invoke results from the theory of asynchronous DP to prove
that RTDP achieves optimal behavior in several different classes of
problems.  We also use the theory of asynchronous DP to illuminate aspects
of other DP-based reinforcement learning methods such as Watkins'
Q-Learning algorithm.  A secondary aim of this article is to provide a
bridge between AI research on real-time planning and learning and relevant
concepts and algorithms from control theory.
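
For readers who want a concrete picture before fetching the paper, here is a
minimal sketch of trial-based RTDP as the idea is commonly described: at each
step the agent backs up the value of the state it currently occupies, acts
greedily with respect to the updated values, and moves on.  The abstract does
not spell out these details, so the toy chain problem, the function names
(build_chain, rtdp), and all parameters below are assumptions made for
illustration and are not taken from the paper.

```python
import random


def build_chain(n_states=5):
    """Toy stochastic shortest-path problem (hypothetical, not from the paper).

    States 0..n_states-1 lie on a line; the last state is the goal.
    Each non-goal state offers two actions, stored as lists of
    (probability, next_state) pairs:
      "walk": advance one state with prob. 0.9, otherwise stay put
      "leap": advance two states with prob. 0.5, otherwise stay put
    """
    goal = n_states - 1
    model = {}
    for s in range(n_states):
        if s == goal:
            model[s] = {}  # the goal is absorbing and cost-free
            continue
        model[s] = {
            "walk": [(0.9, min(s + 1, goal)), (0.1, s)],
            "leap": [(0.5, min(s + 2, goal)), (0.5, s)],
        }
    return model, goal


def rtdp(model, goal, start, n_trials=500, max_steps=200, step_cost=1.0, seed=0):
    """Run repeated trials from `start`, backing up only the visited states."""
    rng = random.Random(seed)
    # Zero never overestimates a positive cost-to-go, so it is a safe start.
    values = {s: 0.0 for s in model}

    def q_value(s, a):
        # Expected cost of taking action a in state s, then following `values`.
        return step_cost + sum(p * values[s2] for p, s2 in model[s][a])

    for _ in range(n_trials):
        s = start
        for _ in range(max_steps):
            if s == goal:
                break
            # Bellman backup at the current state only.
            q = {a: q_value(s, a) for a in model[s]}
            best = min(q, key=q.get)
            values[s] = q[best]
            # Act greedily and sample the successor from the model.
            outcomes = model[s][best]
            s = rng.choices([s2 for _, s2 in outcomes],
                            weights=[p for p, _ in outcomes])[0]
    return values
```

Calling `model, goal = build_chain()` and then `rtdp(model, goal, start=0)`
returns a dictionary of estimated costs-to-go.  In this sketch only the states
the greedy agent keeps visiting are driven to their optimal values; estimates
for states off that path may remain underestimates.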


      -----------------------------------------------------
                        FTP INSTRUCTIONS

Either use "Getps barto.realtime-dp.ps.Z", or do the following:

     unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52)
     Name: anonymous
     Password: neuron
     ftp> cd pub/neuroprose
     ftp> binary
     ftp> get barto.realtime-dp.ps.Z
     ftp> quit
     unix> uncompress barto.realtime-dp.ps
     unix> lpr -s barto.realtime-dp.ps (or however you print PostScript)



Steve Bradtke

