TR announcement

Szepesvari Csaba szepes at sol.cc.u-szeged.hu
Fri Apr 23 17:05:34 EDT 1999


Dear Colleagues,

The following technical report is available at

http://victoria.mindmaker.hu/~szepes/papers/macro-tr99-01.ps.gz

All comments are welcome.

 Best wishes,

  Csaba Szepesvari

----------------------------------------------------------------
An Evaluation Criterion for Macro Learning and Some Results

Zs. Kalmar and Cs. Szepesvari
TR99-01, Mindmaker Ltd., Budapest 1121, Konkoly Th. M. u. 29-33

It is known that a well-chosen set of macros can considerably speed up the
solution of planning problems.  Recently, macros have been considered in
the planning framework built on Markov decision problems (MDPs). However,
so far no systematic approach has been put forth to investigate the
utility of macros within this framework. In this article we begin to
study this problem systematically by introducing the concept of
multi-task MDPs, defined with a distribution over the tasks. We propose
an evaluation criterion for macro-sets that is based on the expected
planning speed-up due to the use of a macro-set, where the expectation
is taken over the set of tasks. The consistency of the empirical speed-up
maximization algorithm is shown in the finite case. For acyclic systems,
the expected planning speed-up is shown to be proportional to the amount
of ``time-compression'' due to the macros. Based on these observations, a
heuristic algorithm for learning macros is proposed. The algorithm is
shown to return macros identical to those one would design by hand in the
case of a particular navigation-like multi-task MDP. Some related
questions, in particular the problems of breaking up MDPs into multiple
tasks, factorizing MDPs, and learning generalizations over actions to
enhance the amount of transfer, are considered briefly at the end of the
paper.
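To give a rough feel for the empirical speed-up maximization idea, here is a
minimal toy sketch in Python. Everything in it is an invented stand-in: the
integer "tasks", the cost model, and the candidate macro-sets are hypothetical
illustrations, not the paper's actual planner, MDP machinery, or criterion.

```python
# Toy illustration of empirical speed-up maximization over a task sample.
# The cost model below is a made-up stand-in, not the paper's construction.

def planning_cost(task, macro_set):
    # Toy cost model: a task is just a number of primitive planning steps.
    # A macro (a tuple of primitive actions) "compresses" a task whose
    # length it divides evenly, reducing the effective planning cost.
    compression = 1 + sum(len(m) - 1 for m in macro_set
                          if task % len(m) == 0)
    return task / compression

def empirical_speedup(macro_set, tasks):
    # Empirical speed-up: cost without macros divided by cost with the
    # macro-set, averaged over the sampled tasks (an empirical analogue
    # of the expectation over the task distribution).
    return sum(planning_cost(t, ()) / planning_cost(t, macro_set)
               for t in tasks) / len(tasks)

def best_macro_set(candidates, tasks):
    # Empirical speed-up maximization: pick the candidate macro-set with
    # the highest average empirical speed-up on the task sample.
    return max(candidates, key=lambda ms: empirical_speedup(ms, tasks))

tasks = [6, 12, 9]
candidates = [((1, 2),), ((1, 2, 3),), ()]
print(best_macro_set(candidates, tasks))
```

On this toy sample the length-3 macro compresses every task, so it wins; the
point is only the shape of the procedure (sample tasks, score each candidate
macro-set by average speed-up, take the maximizer), not the cost model itself.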

Keywords:
Reinforcement learning, MDPs, planning, macros, empirical speed-up
optimization
