Preprint announcement
Rich Sutton
rich at gte.com
Thu May 3 12:12:19 EDT 1990
How could a connectionist network _plan_ a sequence of actions before
doing them? The following preprint describes one answer.
---------------
INTEGRATED ARCHITECTURES FOR LEARNING, PLANNING, AND REACTING
BASED ON APPROXIMATING DYNAMIC PROGRAMMING
Richard S. Sutton
GTE Labs
Abstract
This paper extends previous work with Dyna, a class of architectures for
intelligent systems based on approximating dynamic programming methods.
Dyna architectures integrate trial-and-error (reinforcement) learning
and execution-time planning into a single process operating alternately
on the world and on a learned model of the world. In this paper, I
present and show results for two Dyna architectures. The Dyna-PI
architecture is based on dynamic programming's policy iteration method
and can be related to existing AI ideas such as evaluation functions and
universal plans (reactive systems). Using a navigation task, results
are shown for a simple Dyna-PI system which simultaneously learns by
trial and error, learns a world model, and plans optimal routes using
the evolving world model. The Dyna-Q architecture is based on Watkins's
Q-learning, a new kind of reinforcement learning. Dyna-Q uses a less
familiar set of data structures than does Dyna-PI, but is arguably
simpler to implement and use. We show that Dyna-Q architectures are
easy to adapt for use in changing environments.
---------------
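As a rough illustration of the idea in the abstract, here is a minimal tabular sketch of a Dyna-Q style loop: each real step updates the Q-values, records the transition in a learned model, and then replays a few transitions drawn from that model as planning. It is not the code from the paper. The names dyna_q and corridor_step, the six-state corridor task, and all parameter values are assumptions chosen only to keep the example small, and the model is assumed to be deterministic.

```python
import random
from collections import defaultdict


def corridor_step(state, action, length=6):
    """Hypothetical toy world: a corridor whose last cell is the goal."""
    nxt = min(max(state + action, 0), length - 1)
    done = nxt == length - 1
    return (1.0 if done else 0.0), nxt, done


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0, max_steps=1000):
    """Tabular Dyna-Q sketch; all parameter values are illustrative."""
    rng = random.Random(seed)
    q = defaultdict(float)   # q[(state, action)] -> estimated return
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def greedy(state):
        # Pick a highest-valued action, breaking ties at random.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def backup(s, a, r, s2, done):
        # One-step Q-learning backup toward r + gamma * max_b Q(s2, b).
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            explore = rng.random() < epsilon
            a = rng.choice(actions) if explore else greedy(s)
            r, s2, done = step(s, a)
            backup(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # update the learned world model
            for _ in range(planning_steps):  # plan on simulated experience
                ps, pa = rng.choice(list(model))
                backup(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return q


q = dyna_q(corridor_step, start=0, actions=[-1, +1])
policy = {s: max([-1, +1], key=lambda a: q[(s, a)]) for s in range(5)}
```

Setting planning_steps to 0 reduces the loop to plain one-step Q-learning, which makes it easy to compare learning with and without the model.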
This paper will appear in the proceedings of the Seventh International
Conference on Machine Learning, to be held in June 1990.
For copies, send a request with your US MAIL address to: clc2 at gte.com