Technical Report Available

Andy Barto barto at envy.cs.umass.edu
Mon Sep 9 13:17:05 EDT 1991


The following technical report is available:


                 Real-Time Learning and Control
            using Asynchronous Dynamic Programming

     Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh
		
                 Department of Computer Science
         University of Massachusetts, Amherst, MA 01003

                     Technical Report 91-57

Abstract---Learning methods based on dynamic programming (DP) are 
receiving increasing attention in artificial intelligence. Researchers 
have argued that DP provides the appropriate basis for compiling planning
results into reactive strategies for real-time control, as well as for
learning such strategies when the system being controlled is incompletely
known. We extend the existing theory of DP-based learning algorithms by
bringing to bear on their analysis a collection of relevant mathematical
results from the theory of asynchronous DP.  We present convergence
results for a class of DP-based algorithms for real-time learning and
control which generalizes Korf's Learning-Real-Time-A* (LRTA*) algorithm
to problems involving uncertainty.  We also discuss Watkins' Q-Learning
algorithm in light of asynchronous DP, as well as some of the methods
included in Sutton's Dyna architecture.  We provide an account that 
is more complete than currently available of what is formally known, and
what is not formally known, about the behavior of DP-based learning
algorithms. A secondary aim is to provide a bridge between AI research on
real-time planning and learning and relevant concepts and algorithms from
control theory.
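
As background on the term used above: in asynchronous DP, state values are
backed up one state at a time, in any order, with each backup immediately
using the latest values of the other states, rather than in synchronized
sweeps over the whole state set. The Python sketch below illustrates only
that general idea and is not taken from the report. The toy
shortest-path problem, the names `transitions`, `backup`, and
`async_value_iteration`, and the random choice of which state to back up
are all assumptions made for illustration.

```python
import random

# Hypothetical toy problem (not from the report): states 0..3, where state 3
# is an absorbing, cost-free goal. transitions[s][a] lists the outcomes of
# action a in state s as (probability, next_state, cost) triples.
GOAL = 3
transitions = {
    0: {"step": [(0.8, 1, 1.0), (0.2, 0, 1.0)]},
    1: {"step": [(0.8, 2, 1.0), (0.2, 1, 1.0)],
        "jump": [(0.5, 3, 2.0), (0.5, 0, 2.0)]},
    2: {"step": [(0.8, 3, 1.0), (0.2, 2, 1.0)]},
}


def backup(values, s):
    """Bellman backup for one state: minimum expected cost-to-go over actions."""
    return min(
        sum(p * (cost + values[s2]) for p, s2, cost in outcomes)
        for outcomes in transitions[s].values()
    )


def async_value_iteration(num_backups=2000, seed=0):
    """Back up one randomly chosen state at a time, updating values in place."""
    rng = random.Random(seed)
    values = {s: 0.0 for s in range(GOAL + 1)}  # the goal's value stays 0
    states = list(transitions)                  # non-goal states only
    for _ in range(num_backups):
        s = rng.choice(states)         # arbitrary order; every state keeps being picked
        values[s] = backup(values, s)  # in place: later backups see this value at once
    return values


if __name__ == "__main__":
    for state, value in sorted(async_value_iteration().items()):
        print(state, round(value, 4))
```

Choosing the state at random is only one admissible ordering; a real-time
variant would instead back up the states actually visited by the controller.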

--------------------------------------------------------------------

This TR has been placed in the Neuroprose directory (courtesy of Jordan
Pollack) in compressed form. The file name is "barto.realtime-dp.ps.Z".
Instructions for retrieving it from that archive are given below.
WARNING: This paper is SIXTY-EIGHT pages long.

If you are unable to retrieve or print it and therefore wish to receive a
hardcopy, please send mail to the following address:

Connie Smith
Department of Computer Science
University of Massachusetts
Amherst, MA 01003

Smith at cs.umass.edu

*****PLEASE DO NOT REPLY TO THIS MESSAGE*****

NOTE: This is the paper on which my talk at the Machine Learning
Workshop, July 1991, was based. If you requested a copy at that time,
it is already in the mail.

Thanks,
Andy Barto
--------------------------------------------------------------------

Here is how to ftp this paper:

unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> binary
ftp> get barto.realtime-dp.ps.Z
ftp> quit
unix> uncompress barto.realtime-dp.ps.Z
unix> lpr barto.realtime-dp.ps







