A tech report on transfer of solutions across multiple RL tasks

Fri Apr 9 17:33:18 EDT 1999

Announcing a technical report related to solving multiple RL tasks:

http://www-anw.cs.umass.edu/~bern/publications/reuse_tech.ps
--------------------------------------------------------------------------
                       Daniel S. Bernstein
                      Adaptive Networks Lab
                  Department of Computer Science
               University of Massachusetts, Amherst
                          TR-1999-26
                          April, 1999

We consider the reuse of policies for previous MDPs in learning on a
new MDP, under the assumption that the vector of parameters of each
MDP is drawn from a fixed probability distribution.  We use the
options framework, in which an option consists of a set of initiation
states, a policy, and a termination condition.  We use an option
called a \emph{reuse option}, for which the set of initiation states
is the set of all states, the policy is a combination of policies from
the old MDPs, and the termination condition is based on the number of
time steps since the option was initiated.  Given policies for $m$ of
the MDPs from the distribution, we construct reuse options from the
policies and compare performance on an $(m+1)$st MDP both with and
without various reuse options.  We find that reuse options can speed
initial learning of the $(m+1)$st task.  We also present a distribution
of MDPs for which reuse options can slow initial learning.  We discuss
reasons for this and suggest other ways to design reuse options.

Keywords: reinforcement learning, Markov decision processes,
options, learning to learn
----------------------------------------------------------------------------
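
For readers unfamiliar with the options framework, the three components
of a reuse option named in the abstract can be sketched roughly as
follows.  This is only an illustration, not the report's implementation.
The class and method names are hypothetical, and the way the old
policies are combined (pick one uniformly at random when the option is
initiated, then follow it until a fixed step count elapses) is an
assumption, since the abstract does not specify the combination rule.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

State = int
Action = int
Policy = Callable[[State], Action]


@dataclass
class ReuseOption:
    """Hypothetical sketch of a reuse option built from old-MDP policies."""

    old_policies: List[Policy]   # policies learned on the m previous MDPs
    duration: int                # terminate after this many time steps
    rng: random.Random = field(default_factory=random.Random)
    _active: Optional[Policy] = field(default=None, init=False)
    _steps: int = field(default=0, init=False)

    def can_initiate(self, state: State) -> bool:
        # The initiation set is the set of all states.
        return True

    def initiate(self) -> None:
        # ASSUMPTION: "combination of policies" is modeled here as a
        # uniform random choice among the old policies at initiation.
        self._active = self.rng.choice(self.old_policies)
        self._steps = 0

    def act(self, state: State) -> Action:
        if self._active is None:
            raise RuntimeError("initiate() must be called before act()")
        self._steps += 1
        return self._active(state)

    def terminated(self) -> bool:
        # Termination depends only on the time steps since initiation.
        return self._steps >= self.duration
```

A learner on the new MDP would treat such an option as one extra choice
alongside its primitive actions: call initiate(), then call act() on
each step until terminated() returns True.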

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniel S. Bernstein                 URL: http://www-anw.cs.umass.edu/~bern
Department of Computer Science      EMAIL: bern at cs.umass.edu
University of Massachusetts         PHONE: (413)545-1596 [office]
Amherst, MA 01003
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~