one more

Juergen Schmidhuber juergen at idsia.ch
Wed Jun 21 04:09:40 EDT 1995


http://www.idsia.ch/reports.html 
FTP-host: fava.idsia.ch (192.132.252.1)
FTP-filename: /pub/papers/idsia59-95.ps.gz  (12 pages, 69k)


     ENVIRONMENT-INDEPENDENT REINFORCEMENT ACCELERATION 
                 Technical Note IDSIA-59-95 
 Write-up of an invited talk at the Hong Kong University of 
           Science and Technology (May 29, 1995)
                 Juergen Schmidhuber, IDSIA 

A reinforcement learning system with limited computational 
resources interacts with an unrestricted, unknown environment. 
Its goal is to maximize cumulative reward, to be obtained 
throughout its limited, unknown lifetime. The system's policy 
is an arbitrary, modifiable algorithm mapping environmental 
inputs and internal states to outputs and new internal states. The 
problem is: in realistic, unknown environments, each policy 
modification process (PMP) occurring during system life may 
have unpredictable influence on environmental states, rewards 
and PMPs at any later time. Existing reinforcement learning 
algorithms cannot properly deal with this. Neither can naive 
exhaustive search among all policy candidates -- not even in 
the case of very small search spaces. In fact, a reasonable way 
of measuring performance improvements in such general (but 
typical) situations is missing. I define such a measure based 
on the novel ``reinforcement acceleration criterion'' (RAC). 
RAC is satisfied if the beginning of each completed PMP that 
computed a currently valid policy modification has been followed 
by faster average reinforcement intake than that measured since 
system start-up and since the beginnings of all previous such 
PMPs (the computation time for PMPs is taken into account). 
Then I present a method called 
``environment-independent reinforcement acceleration'' (EIRA) 
which is guaranteed to achieve RAC. EIRA cares neither whether 
the system's policy can modify itself nor whether there are 
multiple, interacting learning systems. 
The consequences are: (1) a sound theoretical framework for 
``meta-learning'' (because the success of a PMP recursively 
depends on the success of all later PMPs, for which it sets 
the stage), and (2) a sound theoretical framework for 
multi-agent learning. The principles have been implemented 
(1) in a single system using an assembler-like programming 
language to modify its own policy, and (2) in a system 
consisting of multiple agents, where each agent 
is in fact just a connection in a fully recurrent reinforcement 
learning neural net. A by-product of this research is a general 
reinforcement learning algorithm for such nets. Preliminary 
experiments illustrate the theory.
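
To make the criterion concrete, here is a minimal sketch of an 
RAC check in Python (my illustration, not code from the report; 
the function names and the toy reward curve are hypothetical). 
It assumes cumulative reward is measured on a single time axis 
that already includes the computation time consumed by PMPs, as 
required above.

  def rac_holds(now, cumulative_reward, checkpoints):
      # checkpoints: start times of all completed PMPs whose
      # policy modifications are currently valid, in increasing
      # order; time 0.0 is system start-up. cumulative_reward(t)
      # returns the total reward collected in [0, t].
      rates = []
      for t in [0.0] + list(checkpoints):
          elapsed = now - t
          if elapsed <= 0:
              return False  # checkpoint too recent to measure
          # average reinforcement intake since time t
          rates.append((cumulative_reward(now)
                        - cumulative_reward(t)) / elapsed)
      # RAC: each later checkpoint must show strictly faster
      # intake than start-up and all earlier checkpoints
      return all(a < b for a, b in zip(rates, rates[1:]))

  # Toy example: reward arrives faster after a PMP at t = 10
  reward = lambda t: 0.3 * t if t <= 10 else 3.0 + 1.4 * (t - 10)
  print(rac_holds(15.0, reward, [10.0]))  # True: 1.4 > 10/15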

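The abstract does not spell out how EIRA achieves its guarantee. 
One reading consistent with the criterion -- an assumption on my 
part, not a statement of the report's method -- is to keep a 
stack of still-valid policy modifications and to undo the most 
recent ones until the RAC check above succeeds:

  def eira_maintain(now, cumulative_reward, stack):
      # stack: (checkpoint_time, undo_fn) pairs for currently
      # valid policy modifications, oldest first; undo_fn
      # restores the policy state preceding that modification.
      while stack and not rac_holds(now, cumulative_reward,
                                    [t for t, _ in stack]):
          t, undo_fn = stack.pop()  # newest modification first
          undo_fn()  # its speed-up claim no longer holds
      return stack

With an empty stack the check holds trivially, so the loop 
always terminates, and every surviving checkpoint again 
satisfies RAC.
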

Juergen Schmidhuber
IDSIA, Corso Elvezia 36
6900-Lugano, Switzerland
juergen at idsia.ch
http://www.idsia.ch


