temporal abstraction

rsun@cecs.missouri.edu
Wed May 1 10:49:29 EDT 2002



For autonomously creating temporal abstractions
(open-loop or closed-loop policies), see also:


R. Sun and C. Sessions, " Self-segmentation of sequences: automatic
formation of hierarchies of sequential behaviors. " IEEE Transactions on
Systems, Man, and Cybernetics: Part B Cybernetics, Vol.30, No.3,
pp.403-418. 2000. 
http://www.cecs.missouri.edu/~rsun/sun.smc00.ps
http://www.cecs.missouri.edu/~rsun/sun.smc00.pdf


The paper presents an approach to hierarchical reinforcement learning that does
not rely on a priori domain-specific knowledge of hierarchical structures.
It learns to segment action sequences so as to create hierarchical
structures (for example, for dealing with partially observable
Markov decision processes using multiple limited-memory or memoryless modules).
Segmentation is based on the reinforcement received during task execution,
with the different levels of control communicating by sharing the
reinforcement estimates each obtains.
The algorithm segments action sequences to reduce non-Markovian temporal
dependencies, and seeks out proper configurations of long- and short-range
dependencies that facilitate the learning of the overall task.
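
As a rough illustration of the idea (a minimal sketch under my own
assumptions, not the paper's actual algorithm), consider two-level
tabular Q-learning in Python: a high-level controller learns, from the
reward accumulated over each segment, which memoryless low-level module
should control the next stretch of steps. The toy corridor environment,
the segment length K, and all hyperparameters are illustrative.

    import random
    from collections import defaultdict

    class QLearner:
        """Tabular epsilon-greedy Q-learning; ties broken at random."""
        def __init__(self, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
            self.q = defaultdict(float)
            self.n_actions = n_actions
            self.alpha, self.gamma, self.eps = alpha, gamma, eps

        def act(self, s):
            if random.random() < self.eps:
                return random.randrange(self.n_actions)
            qs = [self.q[(s, a)] for a in range(self.n_actions)]
            return random.choice([a for a, v in enumerate(qs) if v == max(qs)])

        def update(self, s, a, r, s2):
            best = max(self.q[(s2, b)] for b in range(self.n_actions))
            self.q[(s, a)] += self.alpha * (r + self.gamma * best - self.q[(s, a)])

    # Low level: memoryless modules mapping observation -> primitive action.
    # High level: a controller mapping observation -> module index, updated
    # with the reward accumulated while that module was in control, so the
    # segmentation points themselves are learned from reinforcement.
    N_OBS, K = 10, 5                  # toy corridor size; K = max segment length
    modules = [QLearner(n_actions=2) for _ in range(2)]
    controller = QLearner(n_actions=len(modules))

    def env_step(obs, action):
        """Hypothetical corridor: action 1 moves right; reward at the far end."""
        obs2 = min(obs + 1, N_OBS - 1) if action == 1 else max(obs - 1, 0)
        return obs2, (1.0 if obs2 == N_OBS - 1 else 0.0)

    for episode in range(500):
        obs, steps = 0, 0
        while obs != N_OBS - 1 and steps < 500:   # cap episode length
            m = controller.act(obs)       # pick a module at a segment boundary
            seg_start, seg_reward = obs, 0.0
            for _ in range(K):            # let that module run for up to K steps
                a = modules[m].act(obs)
                obs2, r = env_step(obs, a)
                modules[m].update(obs, a, r, obs2)
                seg_reward += r
                obs, steps = obs2, steps + 1
                if obs == N_OBS - 1:
                    break
            controller.update(seg_start, m, seg_reward, obs)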




R. Sun and C. Sessions, "Learning plans without a priori knowledge."
Adaptive Behavior, Vol.8, No.3/4, pp.225-253. 2000. 
(The paper has just appeared; the journal's publication was significantly
delayed.)
http://www.cecs.missouri.edu/~rsun/sun.ab00.ps


This paper is concerned with the autonomous learning of plans in probabilistic
domains without a priori domain-specific knowledge. In contrast to existing
reinforcement learning algorithms, which generate only reactive plans, and
existing probabilistic planning algorithms, which require substantial a priori
knowledge in order to plan, a two-stage bottom-up process is devised: first,
reinforcement learning/dynamic programming is applied, without a priori
domain-specific knowledge, to acquire a reactive plan; then explicit
plans are extracted from the reactive plan. Several options for plan extraction
are examined, each based on a beam search that performs temporal
projection in a restricted fashion, guided by the value functions
resulting from reinforcement learning/dynamic programming. Some completeness
and soundness results are given. Examples in several domains are discussed
that together demonstrate the working of the proposed model.
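
To make the extraction step concrete, here is a minimal Python sketch
(my own simplified assumptions, not the paper's exact procedure, which
handles probabilistic domains): a beam search projects forward through a
transition model, keeping at each depth only the few partial plans whose
projected states a learned value function ranks highest.

    def extract_plan(start, goal, actions, transition, value,
                     beam_width=3, max_depth=20):
        """Beam search guided by a learned value function.
        `transition(state, action) -> next state` is taken to be
        deterministic here for simplicity; returns a list of actions
        reaching `goal`, or None if none is found within `max_depth`."""
        beam = [(start, [])]                  # (state, partial plan)
        for _ in range(max_depth):
            candidates = []
            for state, plan in beam:
                if state == goal:
                    return plan
                for a in actions:
                    candidates.append((transition(state, a), plan + [a]))
            # Restricted temporal projection: keep only the partial plans
            # whose projected states score highest under the value function.
            candidates.sort(key=lambda c: value(c[0]), reverse=True)
            beam = candidates[:beam_width]
        return None

    # Hypothetical usage on a 1-D corridor whose value function (e.g., as
    # obtained by Q-learning/dynamic programming) increases toward the goal:
    plan = extract_plan(start=0, goal=4, actions=[-1, +1],
                        transition=lambda s, a: max(0, min(4, s + a)),
                        value=lambda s: s)
    print(plan)                               # -> [1, 1, 1, 1]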






Or go through my Web page at

http://www.cecs.missouri.edu/~rsun



Cheers,
---Ron



