Hierarchical Reinforcement Learning with MAXQ

Tom Dietterich tgd at CS.ORST.EDU
Mon Jul 6 16:57:16 EDT 1998


Two papers on hierarchical reinforcement learning are available.  One
is a conference paper that will appear this summer at the
International Conference on Machine Learning.  The other is a
journal-length version with full technical details and discussion. 

======================================================================
Dietterich, T. G. (to appear). The MAXQ method for hierarchical
reinforcement learning.  1998 International Conference on Machine Learning.
URL: ftp://ftp.cs.orst.edu/pub/tgd/papers/ml98-maxq.ps.gz

Abstract: This paper presents a new approach to hierarchical
reinforcement learning based on the MAXQ decomposition of the value
function.  The MAXQ decomposition has both a procedural semantics---as
a subroutine hierarchy---and a declarative semantics---as a
representation of the value function of a hierarchical policy.  MAXQ
unifies and extends previous work on hierarchical reinforcement
learning by Singh, by Kaelbling, and by Dayan and Hinton.  Conditions under
which the MAXQ decomposition can represent the optimal value function
are derived.  The paper defines a hierarchical Q learning algorithm,
proves its convergence, and shows experimentally that it can learn
much faster than ordinary "flat" Q learning.  Finally, the paper
discusses some interesting issues that arise in hierarchical
reinforcement learning, including the hierarchical credit assignment
problem and non-hierarchical execution of the MAXQ hierarchy.
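
For readers unfamiliar with the decomposition the abstract refers to,
here is a minimal Python sketch of the recursive MAXQ evaluation
Q(i, s, a) = V(a, s) + C(i, s, a), where V(a, s) is the value of
executing child subtask a in state s and C(i, s, a) is the expected
reward for completing parent task i after a terminates.  The toy task
hierarchy, state names, and completion-function table below are
illustrative assumptions, not taken from the paper.

    def maxq_value(task, state, children, completion, primitive_reward):
        """Return V(task, state) under the MAXQ decomposition (sketch)."""
        if task not in children:
            # Primitive action: its value is just the expected one-step reward.
            return primitive_reward(task, state)
        # Composite task: pick the best child subtask, adding the completion
        # cost of finishing this task after that child runs.
        return max(
            maxq_value(a, state, children, completion, primitive_reward)
            + completion[(task, state, a)]
            for a in children[task]
        )

    # Toy two-level hierarchy (hypothetical): Root chooses a primitive action.
    children = {"Root": ["left", "right"]}
    completion = {("Root", "s0", "left"): 0.0, ("Root", "s0", "right"): -1.0}
    reward = lambda a, s: 1.0 if a == "right" else 0.5
    print(maxq_value("Root", "s0", children, completion, reward))  # prints 0.5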

Note: This version corrects some errors present in the version that
appears in the proceedings.  In particular, Figure 1 has been fixed.
======================================================================

Dietterich, T. G. (Submitted). Hierarchical reinforcement learning
with the MAXQ value function decomposition.

URL: ftp://ftp.cs.orst.edu/pub/tgd/papers/mlj-maxq.ps.gz

Abstract: This paper presents a new approach to hierarchical
reinforcement learning based on the MAXQ decomposition of the value
function.  The MAXQ decomposition has both a procedural semantics---as
a subroutine hierarchy---and a declarative semantics---as a
representation of the value function of a hierarchical policy.  MAXQ
unifies and extends previous work on hierarchical reinforcement
learning by Singh, by Kaelbling, and by Dayan and Hinton.  Conditions under
which the MAXQ decomposition can represent the optimal value function
are derived.  The paper defines a hierarchical Q learning algorithm,
proves its convergence, and shows experimentally that it can learn
much faster than ordinary "flat" Q learning.  These results and
experiments are extended to support state abstraction and
non-hierarchical execution.  The paper concludes with a discussion of
design tradeoffs in hierarchical reinforcement learning.

======================================================================


