JAIR: MAXQ Hierarchical Reinforcement Learning

Thomas G. Dietterich tgd at cs.orst.edu
Sun Dec 3 23:01:06 EST 2000


JAIR is pleased to announce the publication of the following article:

Dietterich, T.G. (2000)
  "Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition", 
   Volume 13, pages 227-303.

   Available in PDF, PostScript and compressed PostScript.
   For quick access via your WWW browser, use this URL:
     http://www.jair.org/abstracts/dietterich00a.html
   More detailed instructions are below.

   Abstract: This paper presents a new approach to hierarchical
   reinforcement learning based on decomposing the target Markov decision
   process (MDP) into a hierarchy of smaller MDPs and decomposing the
   value function of the target MDP into an additive combination of the
   value functions of the smaller MDPs.  The decomposition, known as the
   MAXQ decomposition, has both a procedural semantics---as a subroutine
   hierarchy---and a declarative semantics---as a representation of the
   value function of a hierarchical policy.  MAXQ unifies and extends
   previous work on hierarchical reinforcement learning by Singh,
   Kaelbling, and Dayan and Hinton.  It is based on the assumption that
   the programmer can identify useful subgoals and define subtasks that
   achieve these subgoals.  By defining such subgoals, the programmer
   constrains the set of policies that need to be considered during
   reinforcement learning.  The MAXQ value function decomposition can
   represent the value function of any policy that is consistent with the
   given hierarchy.  The decomposition also creates opportunities to
   exploit state abstractions, so that individual MDPs within the
   hierarchy can ignore large parts of the state space.  This is
   important for the practical application of the method.  This paper
   defines the MAXQ hierarchy, proves formal results on its
   representational power, and establishes five conditions for the safe
   use of state abstractions.  The paper presents an online model-free
   learning algorithm, MAXQ-Q, and proves that it converges with
   probability 1 to a kind of locally-optimal policy known as a
   recursively optimal policy, even in the presence of the five kinds of
   state abstraction.  The paper evaluates the MAXQ representation and
   MAXQ-Q through a series of experiments in three domains and shows
   experimentally that MAXQ-Q (with state abstractions) converges to a
   recursively optimal policy much faster than flat Q learning.  The fact
   that MAXQ learns a representation of the value function has an
   important benefit: it makes it possible to compute and execute an
   improved, non-hierarchical policy via a procedure similar to the
   policy improvement step of policy iteration.  The paper demonstrates
   the effectiveness of this non-hierarchical execution experimentally.
   Finally, the paper concludes with a comparison to related work and a
   discussion of the design tradeoffs in hierarchical reinforcement learning.
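
As a brief orientation for readers new to the method, the additive
decomposition described in the abstract is commonly written in the
following form (a sketch only, using the usual MAXQ notation of an
action-value function Q, a projected value function V, and a
completion function C; the article itself gives the precise
definitions and the conditions under which they hold):

     % Sketch of the MAXQ additive decomposition (amsmath assumed).
     % Q^pi(i,s,a): value of invoking subtask a in state s while
     %              executing parent task i under hierarchical policy pi.
     \begin{align}
       Q^{\pi}(i, s, a) &= V^{\pi}(a, s) + C^{\pi}(i, s, a) \\
       V^{\pi}(i, s) &=
         \begin{cases}
           Q^{\pi}\bigl(i, s, \pi_i(s)\bigr)
             & \text{if subtask $i$ is composite,} \\
           \sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i)
             & \text{if $i$ is a primitive action.}
         \end{cases}
     \end{align}

Here C^pi(i, s, a) is the completion function: the expected discounted
reward for finishing parent task i after the chosen subtask a
terminates.  Unrolling the first equation down the task hierarchy
expresses the value function of the root task as a sum of the value
functions of the smaller MDPs, which is the additive combination the
abstract refers to.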

The article is available via:
   
 -- comp.ai.jair.papers (also see comp.ai.jair.announce)

 -- World Wide Web: The URL for our World Wide Web server is
       http://www.jair.org/
    For direct access to this article and related files try:
       http://www.jair.org/abstracts/dietterich00a.html

 -- Anonymous FTP from either of the two sites below.

    Carnegie Mellon University (USA):
	ftp://ftp.cs.cmu.edu/project/jair/volume13/dietterich00a.ps
    The University of Genoa (Italy):
	ftp://ftp.mrg.dist.unige.it/pub/jair/pub/volume13/dietterich00a.ps

    The compressed PostScript file is named dietterich00a.ps.Z (464K)

For more information about JAIR, visit our WWW or FTP sites, or
contact jair-ed at isi.edu
