Stanford Adaptive Network Colloq: RICHARD SUTTON, Dec 4.

Tue Nov 21 10:43:41 EST 1989

        Stanford University Interdisciplinary Colloquium Series:
                 Adaptive Networks and their Applications

                      December 4th (Monday, 3:45pm):

                           Room 380-380C

********************************************************************************
   DYNA: AN INTEGRATED ARCHITECTURE FOR LEARNING, PLANNING, AND REACTING

			     Richard S. Sutton
		       GTE Laboratories Incorporated
********************************************************************************

				 Abstract

    How should a robot decide what to do?  The traditional answer in AI has
been that it should deduce its best action in light of its current goals
and world model, i.e., that it should _plan_.  However, it is now widely
recognized that planning's computational complexity makes it infeasible for
rapid decision making and that its dependence on a complete and accurate
world model also greatly limits its applicability.  An alternative is to do
the planning in advance and compile it into a set of rapid _reactions_, or
situation-action rules, which are then used for real-time decision making.
Yet a third approach is to _learn_ a good set of reactions by trial and
error; this has the advantage that it eliminates all dependence on a world
model.  In this talk I present _Dyna_, a simple architecture integrating
and permitting tradeoffs among all three approaches.

    Dyna is based on the old idea that planning is like trial-and-error
learning from hypothetical experience.  The theory of Dyna is based on the
classical optimization technique of _dynamic_programming_, and on dynamic
programming's relationship to reinforcement learning, to
temporal-difference learning, and to AI methods for planning and search.
In this talk, I summarize Dyna theory and present Dyna systems that learn
from trial and error while they simultaneously learn a world model and use
it to plan optimal action sequences.  This work is an integration and
extension of prior work by Barto, Watkins, and Whitehead.
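
    As a rough illustration of the idea that planning is trial-and-error
learning from hypothetical experience, the loop might be sketched as below.
This is a minimal sketch under stated assumptions, not the systems presented
in the talk: the tabular one-step Q-learning backup, the deterministic
lookup-table world model, the five-state corridor task, and all names and
parameter values (dyna_q, corridor_step, planning_steps, alpha, gamma,
epsilon) are hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def dyna_q(step, actions, start, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Illustrative Dyna-style loop (assumed details, see lead-in)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated value
    model = {}              # (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Pick a highest-valued action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def backup(s, a, r, s2, done):
        # One-step Q-learning backup, used for real and hypothetical steps.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            # React: epsilon-greedy choice from the current value table.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            backup(s, a, r, s2, done)      # learn from real experience
            model[(s, a)] = (r, s2, done)  # learn the world model
            for _ in range(planning_steps):
                # Plan: replay a remembered step as hypothetical experience.
                ps, pa = rng.choice(list(model))
                backup(ps, pa, *model[(ps, pa)])
            s = s2
    return q, greedy


def corridor_step(state, action):
    """Hypothetical task: states 0..4 in a row, reward 1.0 on reaching 4."""
    nxt = min(max(state + action, 0), 4)
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done


q, greedy = dyna_q(corridor_step, actions=[-1, 1], start=0)
policy = [greedy(s) for s in range(4)]  # learned action in states 0..3
```

    In this sketch, setting planning_steps to 0 leaves pure trial-and-error
learning, while larger values shift more of the work onto the learned model,
which is one way to picture the tradeoff among the three approaches.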

===========================================================================
                            GENERAL INFO:

Location: Room 380-380C, which can be reached through the lower level
 between the Psychology and Mathematical Sciences buildings. 
Level: Technically oriented for persons working in related areas.
Mailing lists: To be added to the network mailing list, netmail to
 netlist at psych.stanford.edu with "addme" as your subject header.

Additional information: Contact Mark Gluck (gluck at psych.stanford.edu).