2 papers on Hierarchical Reinforcement Learning

Thomas G. Dietterich tgd at cs.orst.edu
Wed May 26 10:38:57 EDT 1999


The following two papers are available from the Computing Research
Repository (CoRR) (http://xxx.lanl.gov/archive/cs/intro.html or its
mirror sites).  They can also be retrieved from the Reinforcement
Learning Repository http://web.cps.msu.edu/rlr/ or from my home page:
http://www.cs.orst.edu/~tgd/cv/pubs.html.

Number: cs.LG/9905014
Title: Hierarchical Reinforcement Learning with the MAXQ Value
       Function Decomposition 
Authors: Thomas G. Dietterich
Comments: 63 pages, 15 figures
Subj-class: Learning
ACM-class: I.2.6 

This paper presents the MAXQ approach to hierarchical reinforcement
learning based on decomposing the target Markov decision process (MDP)
into a hierarchy of smaller MDPs and decomposing the value function of
the target MDP into an additive combination of the value functions of
the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal
results on its representational power, and establishes five conditions
for the safe use of state abstractions. The paper presents an online
model-free learning algorithm, MAXQ-Q, and proves that it converges
with probability 1 to a kind of locally optimal policy known as a
recursively optimal policy, even in the presence of the five kinds of
state abstraction. The paper evaluates the MAXQ representation and
MAXQ-Q through a series of experiments in three domains and shows
experimentally that MAXQ-Q (with state abstractions) converges to a
recursively optimal policy much faster than flat Q learning. The fact
that MAXQ learns a representation of the value function has an
important benefit: it makes it possible to compute and execute an
improved, non-hierarchical policy via a procedure similar to the
policy improvement step of policy iteration. The paper demonstrates
the effectiveness of this non-hierarchical execution
experimentally. Finally, the paper concludes with a comparison to
related work and a discussion of the design tradeoffs in hierarchical
reinforcement learning.  (168kb)
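
As a rough illustration of the decomposition (this is not code from
the paper), the two quantities at the heart of MAXQ and the MAXQ-Q
backup can be sketched in Python as follows.  The value of a
composite subtask is split into the value of the child it invokes
plus a learned "completion" term for finishing the subtask
afterwards.  The hierarchy, dictionaries, and function names below
are purely illustrative, and states are assumed to be hashable:

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95

# V[(primitive_action, state)]: expected immediate reward of a primitive.
# C[(subtask, state, child)]: expected (discounted) reward for completing
# `subtask` after `child` terminates.
V = defaultdict(float)
C = defaultdict(float)

def value(task, state, children, primitive):
    # Projected value V(task, s): immediate reward for a primitive,
    # best decomposed Q-value over children for a composite subtask.
    if primitive(task):
        return V[(task, state)]
    return max(q_value(task, state, a, children, primitive)
               for a in children[task])

def q_value(subtask, state, child, children, primitive):
    # Additive decomposition: Q(subtask, s, child) = V(child, s) + C(subtask, s, child).
    return value(child, state, children, primitive) + C[(subtask, state, child)]

def maxq_q_update(subtask, state, child, next_state, n_steps, reward,
                  children, primitive):
    # One MAXQ-Q backup after `child` has run for n_steps from `state`
    # inside `subtask`, ending in `next_state`.  `reward` is used only
    # when the child is a primitive action.
    if primitive(child):
        V[(child, state)] = (1 - ALPHA) * V[(child, state)] + ALPHA * reward
    best = max(q_value(subtask, next_state, a, children, primitive)
               for a in children[subtask])
    C[(subtask, state, child)] = ((1 - ALPHA) * C[(subtask, state, child)]
                                  + ALPHA * (GAMMA ** n_steps) * best)

Because the value function itself is represented (rather than only a
policy), the same value(...) and q_value(...) machinery can be reused
to greedily choose actions non-hierarchically, which is the basis of
the policy-improvement-style execution mentioned above.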

Number: cs.LG/9905015
Title: State Abstraction in MAXQ Hierarchical Reinforcement Learning
Authors: Thomas G. Dietterich
Comments: 7 pages, 2 figures
Subj-class: Learning
ACM-class: I.2.6 

Many researchers have explored methods for hierarchical reinforcement
learning (RL) with temporal abstractions, in which abstract actions
are defined that can perform many primitive actions before
terminating. However, little is known about learning with state
abstractions, in which aspects of the state space are ignored. In
previous work, we developed the MAXQ method for hierarchical RL. In
this paper, we define five conditions under which state abstraction
can be combined with the MAXQ value function decomposition. We prove
that the MAXQ-Q learning algorithm converges under these conditions
and show experimentally that state abstraction is important for the
successful application of MAXQ-Q learning.  (37kb)
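
As a rough illustration of what is meant by state abstraction here
(again, not code from the paper), each subtask can be given a list of
the state variables it actually depends on, and its value and
completion tables are then indexed by the projected state.  The
variable names below are illustrative, loosely in the spirit of a
taxi-like navigation task:

# Illustrative relevance lists: which state variables each subtask sees.
RELEVANT = {
    "navigate": ("taxi_row", "taxi_col", "target"),
    "root":     ("passenger_loc", "destination"),
}

def abstract_state(subtask, state):
    # Project the full state onto the variables relevant to `subtask`;
    # tables such as C[(subtask, abstract_state(subtask, s), child)]
    # are then shared by all states that agree on these variables.
    return tuple(state[v] for v in RELEVANT[subtask])

# Two full states differing only in an irrelevant variable map to the
# same abstract state for the "navigate" subtask.
s1 = {"taxi_row": 2, "taxi_col": 3, "target": "R",
      "passenger_loc": "in_taxi", "destination": "G"}
s2 = dict(s1, destination="B")
assert abstract_state("navigate", s1) == abstract_state("navigate", s2)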
