Tech report available

Tommi Jaakkola tommi at psyche.mit.edu
Wed Jul 21 18:26:38 EDT 1993



The following paper is now available on the neuroprose
archive as "jaakkola.convergence.ps.Z".


	On the Convergence of Stochastic Iterative Dynamic 
		    Programming Algorithms

		          Tommi Jaakkola
		        Michael I. Jordan
	    Department of Brain and Cognitive Sciences
                Massachusetts Institute of Technology

	   	         Satinder P. Singh
	           Department of Computer Science
               University of Massachusetts at Amherst

   Recent developments in the area of reinforcement learning have
   yielded a number of new algorithms for the prediction and control
   of Markovian environments.  These algorithms, including the
   TD($\lambda$) algorithm of Sutton (1988) and the Q-learning
   algorithm of Watkins (1989), can be motivated heuristically as 
   approximations to dynamic programming (DP).  In this paper we 
   provide a rigorous proof of convergence of these DP-based learning 
   algorithms by relating them to the powerful techniques of stochastic 
   approximation theory via a new convergence theorem.  The theorem 
   establishes a general class of convergent algorithms to which 
   both TD($\lambda$) and Q-learning belong.
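   For readers less familiar with the algorithms named above, the
   following is a minimal sketch (not taken from the paper) of the
   tabular Q-learning backup, which is one member of the class of
   stochastic DP-based updates the convergence theorem addresses.
   The environment size, step size, and usage values are illustrative
   assumptions only.

   import numpy as np

   def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
       # One stochastic DP-style backup: move Q(s, a) toward the
       # sampled one-step return r + gamma * max_a' Q(s', a').
       td_target = r + gamma * np.max(Q[s_next])
       Q[s, a] += alpha * (td_target - Q[s, a])
       return Q

   # Example usage with an arbitrary 5-state, 2-action table.
   Q = np.zeros((5, 2))
   Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3,
                         alpha=0.1, gamma=0.9)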
