one more
Juergen Schmidhuber
juergen at idsia.ch
Wed Jun 21 04:09:40 EDT 1995
http://www.idsia.ch/reports.html
FTP-host: fava.idsia.ch (192.132.252.1)
FTP-filename: /pub/papers/idsia59-95.ps.gz (12 pages, 69k)
ENVIRONMENT-INDEPENDENT REINFORCEMENT ACCELERATION
Technical Note IDSIA-59-95
Write-up of an invited talk at the Hong Kong University of
Science and Technology (May 29, 1995)
Juergen Schmidhuber, IDSIA
A reinforcement learning system with limited computational
resources interacts with an unrestricted, unknown environment.
Its goal is to maximize the cumulative reward obtained
throughout its limited, unknown lifetime. The system's policy
is an arbitrary modifiable algorithm mapping environmental
inputs and internal states to outputs and new internal states. The
problem is: in realistic, unknown environments, each policy
modification process (PMP) occurring during system life may
have unpredictable influence on environmental states, rewards
and PMPs at any later time. Existing reinforcement learning
algorithms cannot properly deal with this. Neither can naive
exhaustive search among all policy candidates -- not even in
the case of very small search spaces. In fact, a reasonable way
of measuring performance improvements in such general (but
typical) situations is missing. I define such a measure based
on the novel ``reinforcement acceleration criterion'' (RAC).
RAC is satisfied if the beginning of each completed PMP that
computed a currently valid policy modification has been followed
by faster average reinforcement intake than system start-up and
the beginnings of all previous such PMPs (the computation time
for PMPs is taken into account).
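For concreteness, here is a minimal sketch in Python of how RAC
could be tested at any given moment of system life. The function
name and the checkpoint representation are illustrative choices of
mine, not the report's; the report gives the formal definition.

def rac_holds(t_now, r_now, checkpoints):
    """Hypothetical RAC test (names are illustrative).

    checkpoints: (t_k, r_k) pairs giving the time and cumulative
    reward at system start-up (k = 0) and at the beginning of each
    completed PMP whose modification is still valid, in temporal
    order.  Measuring from each checkpoint to the present moment
    automatically charges every PMP for its own computation time.
    Assumes t_now exceeds every checkpoint time.
    """
    rates = [(r_now - r_k) / (t_now - t_k) for t_k, r_k in checkpoints]
    # RAC: average reinforcement intake since each checkpoint must be
    # strictly faster than since every earlier checkpoint; for a
    # shared endpoint, checking adjacent pairs suffices.
    return all(a < b for a, b in zip(rates, rates[1:]))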
Then I present a method called ``environment-independent
reinforcement acceleration'' (EIRA), which is guaranteed to
achieve RAC. EIRA cares neither about whether the system's
policy can modify itself, nor about whether there are
multiple, interacting learning systems.
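The abstract does not spell out how EIRA achieves this guarantee,
so the following sketch rests on an assumption: that EIRA keeps one
checkpoint per still-valid modification and retracts the most recent
modifications whenever the criterion fails. The stack layout, the
undo callback, and all names are hypothetical.

def eira_evaluate(t_now, r_now, stack, policy):
    """Assumed EIRA-style evaluation point (a sketch only).

    stack: (t_k, r_k, undo) triples, one per still-valid policy
    modification in the order the PMPs began, plus a start-up
    entry at the bottom that is never popped.  undo(policy)
    restores whatever the modification overwrote.
    """
    def increasing(rs):
        return all(a < b for a, b in zip(rs, rs[1:]))
    # Retract the most recent modifications until the average reward
    # rates measured from the surviving checkpoints strictly
    # increase; popped PMPs no longer count as having computed a
    # valid modification, so RAC holds again by construction.
    while len(stack) > 1 and not increasing(
            [(r_now - r_k) / (t_now - t_k) for t_k, r_k, _ in stack]):
        _, _, undo = stack.pop()
        undo(policy)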
The consequences are: (1) a sound theoretical framework for
``meta-learning'' (because the success of a PMP recursively
depends on the success of all later PMPs, for which it sets
the stage), and (2) a sound theoretical framework for
multi-agent learning. The
principles have been implemented (1) in a single system using an
assembler-like programming language to modify its own policy,
and (2) in a system consisting of multiple agents, where each agent
is in fact just a connection in a fully recurrent reinforcement
learning neural net. A by-product of this research is a general
reinforcement learning algorithm for such nets. Preliminary
experiments illustrate the theory.
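The multi-agent instantiation can be pictured in the same terms,
again only under stated assumptions: each connection is an agent
whose entire policy is its weight, with its own checkpoint stack.
The class below is an illustrative sketch, not the report's
implementation; the report describes the actual net and the
general reinforcement learning algorithm for it.

class ConnectionAgent:
    """Hypothetical agent-per-connection sketch.

    The agent's whole policy is one weight of the recurrent net;
    any weight change is a PMP.  Each agent keeps its own
    checkpoint stack and retracts its own changes, making the
    multi-agent case a direct instance of the scheme above.
    """
    def __init__(self, weight=0.0):
        self.weight = weight
        self.stack = [(0.0, 0.0, weight)]  # start-up checkpoint

    def modify(self, t, r, new_weight):
        self.stack.append((t, r, self.weight))  # save old weight
        self.weight = new_weight

    def evaluate(self, t_now, r_now):
        # Per-agent RAC: pop weight changes until reward rates
        # measured from the surviving checkpoints strictly increase.
        while len(self.stack) > 1:
            rates = [(r_now - r_k) / (t_now - t_k)
                     for t_k, r_k, _ in self.stack]
            if all(a < b for a, b in zip(rates, rates[1:])):
                break
            _, _, old_weight = self.stack.pop()
            self.weight = old_weight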
Juergen Schmidhuber
IDSIA, Corso Elvezia 36
6900-Lugano, Switzerland
juergen at idsia.ch
http://www.idsia.ch