Technical Report Announcement: Reinforcement Learning with Temporal Abstraction

Mon Jul 20 11:54:13 EDT 1998

We are pleased to announce the public availability of the following
technical report:

                       Between MDPs and semi-MDPs:
Learning, planning, and representing knowledge at multiple temporal scales.
          by Richard S. Sutton, Doina Precup, and Satinder Singh

      Learning, planning, and representing knowledge at multiple levels
      of temporal abstraction are key challenges for Artificial Intelligence.
      In this paper we develop an approach to these problems based on the
      mathematical framework of reinforcement learning and Markov
      decision processes (MDPs). We extend the usual notion of action to
      include "options": whole courses of behavior that may be temporally
      extended, stochastic, and contingent on events. Examples of options
      include picking up an object, going to lunch, and traveling to a distant
      city, as well as primitive actions such as muscle twitches and joint
      torques. Options may be given a priori, learned by experience, or
      both. They may be used interchangeably with actions in a variety of
      planning and learning methods. The theory of semi-Markov decision
      processes (SMDPs) can be applied to model the consequences of
      options and as a basis for planning and learning methods using them.
      In this paper we develop these connections, building on prior work by
      Bradtke and Duff (1995), Parr (in prep.), and others. Our main novel
      results concern the interface between the MDP and SMDP levels of
      analysis. We show how a set of options can be altered by changing
      only their termination conditions to improve over SMDP methods
      with no additional cost. We also introduce "intra-option"
      temporal-difference methods that are able to learn from fragments of
      an option's execution. Finally, we propose a notion of subgoal which
      can be used to improve the options themselves. Overall, we argue that
      options and their models provide hitherto missing aspects of a
      powerful, clear, and expressive framework for representing and
      organizing knowledge.

ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/SPS-98.ps.gz
39 pages, 1.8 MBytes.