Thesis on neuroprose archives

Steve Bradtke bradtke at picard.gteds.gte.com
Tue Sep 13 08:44:40 EDT 1994



ftp://archive.cis.ohio-state.edu/pub/neuroprose/thesis/bradtke.thesis.ps.Z


FTP-host: archive.cis.ohio-state.edu
FTP-file: pub/neuroprose/thesis/bradtke.thesis.ps.Z


		Incremental Dynamic Programming for 
		 On-Line Adaptive Optimal Control

			   (133 pages)

		   CMPSCI Technical Report 94-62

		        Steven J. Bradtke
		    Computer Science Department
		    University of Massachusetts
		        Amherst, MA 01003

		       bradtke at cs.umass.edu


			     Abstract

	Reinforcement learning algorithms based on the principles of
Dynamic Programming (DP) have recently received a great deal of attention,
both empirical and theoretical.  These algorithms have been referred to
generically as Incremental Dynamic Programming (IDP) algorithms.  IDP
algorithms are intended for use in situations where the information or
computational resources needed by traditional dynamic programming
algorithms are not available.  IDP algorithms attempt to find a global
solution to a DP problem by incrementally improving local constraint
satisfaction properties as experience is gained through interaction with
the environment.  This class of algorithms is not new, going back at least
as far as Samuel's adaptive checkers-playing programs, but the links to DP
have only been noted and understood very recently.
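
(For concreteness: the best-known algorithm in this class is Watkins'
Q-learning, whose one-step backup incrementally enforces the Bellman
optimality constraint at whichever state-action pair was just visited.  The
minimal tabular sketch below is only an illustration, not an algorithm taken
from the thesis; the environment interface env.reset()/env.actions()/env.step(),
the learning rate alpha, and the discount gamma are assumed for the example.)

    import random
    from collections import defaultdict

    def q_learning_episode(env, Q, alpha=0.1, gamma=0.95, epsilon=0.1):
        """One episode of tabular Q-learning.  Each backup nudges Q(s, a)
        toward the local Bellman optimality constraint
            Q(s, a) = E[ r + gamma * max_b Q(s', b) ]
        using only experience gathered on-line, with no full model sweep."""
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection from the current estimates
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda b: Q[(s, b)])
            s_next, r, done = env.step(a)
            # one-step incremental backup toward the sampled Bellman target
            target = r if done else r + gamma * max(Q[(s_next, b)]
                                                    for b in env.actions(s_next))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
        return Q

    # Usage: Q = defaultdict(float); call q_learning_episode(some_env, Q) repeatedly.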

	This dissertation expands the theoretical and empirical
understanding of IDP algorithms and increases their domain of practical
application.  We address a number of issues concerning the use of IDP
algorithms for on-line adaptive optimal control.  We present a new
algorithm, Real-Time Dynamic Programming, that generalizes Korf's Learning
Real-Time A* to a stochastic domain, and show that it has computational
advantages over conventional DP approaches to such problems.  We then
describe several new IDP algorithms based on the theory of Least Squares
function approximation.  Finally, we begin the extension of IDP theory to
continuous domains by considering the problem of Linear Quadratic
Regulation.  We present an algorithm based on Policy Iteration and Watkins'
Q-functions and prove convergence of the algorithm (under the appropriate
conditions) to the optimal policy.  This is the first result proving
convergence of a DP-based reinforcement learning algorithm to the optimal
policy for any continuous domain.  We also demonstrate that IDP algorithms
cannot be applied blindly to problems from continuous domains, even such
simple domains as Linear Quadratic Regulation.
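
(As a rough illustration of the Linear Quadratic Regulation result: policy
iteration over quadratic Q-functions alternates policy evaluation, which
computes the matrix H of the Q-function Q_K(x,u) = [x;u]' H [x;u] for the
current feedback gain, with policy improvement, which minimizes that
Q-function over the control.  The sketch below is a model-based version for
illustration only.  The algorithm analyzed in the thesis estimates H from
on-line data by least squares without knowing A and B; the matrices, gains,
and the u = K x convention here are assumptions rather than the thesis's
notation.)

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def lqr_q_policy_iteration(A, B, Q_cost, R, K0, n_iters=20):
        """Policy iteration with quadratic Q-functions for the LQR problem
        x_{t+1} = A x + B u, stage cost x'Qx + u'Ru, control law u = K x.
        Each iteration:
          (1) policy evaluation: find H with Q_K(x, u) = [x; u]' H [x; u];
          (2) policy improvement: minimize over u, so K <- -inv(H_uu) H_ux.
        Requires a stabilizing initial gain K0."""
        K = K0
        for _ in range(n_iters):
            # Policy evaluation: V_K(x) = x' P x solves a discrete Lyapunov equation.
            Acl = A + B @ K
            P = solve_discrete_lyapunov(Acl.T, Q_cost + K.T @ R @ K)
            # Assemble the relevant blocks of the Q-function matrix H from P.
            # (The data-driven algorithm instead estimates H by least squares.)
            H_ux = B.T @ P @ A
            H_uu = R + B.T @ P @ B
            # Policy improvement: greedy gain with respect to the current Q-function.
            K = -np.linalg.solve(H_uu, H_ux)
        return K

    # Illustrative use on a double integrator with a stabilizing initial gain.
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    K = lqr_q_policy_iteration(A, B, np.eye(2), np.eye(1), K0=np.array([[-0.5, -1.0]]))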


Instructions for ftp retrieval of this paper are given below.  
Please do not reply directly to this message.

FTP INSTRUCTIONS:

unix> ftp archive.cis.ohio-state.edu (or 128.146.8.52)
    Name: anonymous
    Password: <your e-mail address>
    ftp> cd pub/neuroprose/thesis
    ftp> binary
    ftp> get bradtke.thesis.ps.Z
    ftp> quit
unix> uncompress bradtke.thesis.ps.Z

Thanks to Jordan Pollack for maintaining this archive. 

Steve Bradtke

=======================================================================
Steve Bradtke	(813) 978-6285		GTE Data Services
					DC F4M
Internet:				One E. Telecom Parkway
bradtke@[138.83.42.66]@gte.com		Temple Terrace, FL 33637
bradtke at cs.umass.edu					
=======================================================================


