3 IDSIA papers

Juergen Schmidhuber juergen at idsia.ch
Thu Jun 27 14:55:29 EDT 1996


Three related papers are available, all based on a recent, novel, 
general reinforcement learning paradigm that allows for metalearning 
and incremental self-improvement (IS).

____________________________________________________________________


                  SIMPLE PRINCIPLES OF METALEARNING

        Juergen Schmidhuber  &  Jieyu Zhao  &  Marco Wiering        

        Technical Report IDSIA-69-96,          June 27, 1996 
        23 pages,   195 K compressed,     662 K uncompressed

The goal of metalearning  is to generate useful  shifts of inductive 
bias by  adapting the  current learning  strategy in  a "useful" way. 
Our learner leads a single life during which actions are continually 
executed according to the system's internal state and current policy 
(a modifiable, probabilistic algorithm  mapping environmental inputs 
and internal states  to outputs and new internal states).  An action 
is considered  a learning  algorithm  if it  can modify  the policy. 
Effects  of learning  processes  on  later  learning  processes  are 
measured using reward/time ratios.  Occasional backtracking enforces 
success histories of still valid policy  modifications corresponding 
to histories of lifelong reward accelerations.  The principle allows  
for plugging in a wide variety of learning algorithms. In particular,  
it allows  for embedding the learner's policy modification  strategy  
within  the  policy  itself  (self-reference).  To  demonstrate  the 
principle's  feasibility  in cases where  traditional  reinforcement 
learning  fails,  we test  it in  complex,  non-Markovian,  changing 
environments ("POMDPs"). One of the tasks  involves more than  10^13  
states, two learners that both cooperate  and compete,  and strongly 
delayed  reinforcement  signals  (initially  separated  by more than 
300,000 time steps).
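The backtracking criterion above can be sketched in a few lines. The
following Python fragment is an illustrative reconstruction from the
abstract alone, not the authors' code: class and method names are our
own, and `backup` stands for whatever mechanism restores a policy to
its state before a given modification. At each checkpoint, a
modification on top of the stack survives only if the reward per time
step accumulated since its creation exceeds that accumulated since the
previous surviving modification (or since the start of the learner's
life), so the surviving stack always encodes a history of lifelong
reward accelerations.

```python
# Hedged sketch of the success-history backtracking idea: keep only
# policy modifications whose reward/time ratio since their creation
# beats the ratio since the previous still-valid modification.
# Names (SuccessStoryLearner, modify_policy, backtrack) are ours.

class SuccessStoryLearner:
    def __init__(self):
        self.t = 0            # lifetime step counter
        self.reward = 0.0     # cumulative lifelong reward
        self.stack = []       # (t, reward, backup) per valid modification

    def step(self, r):
        """Advance one lifetime step, collecting reward r."""
        self.t += 1
        self.reward += r

    def modify_policy(self, backup):
        """Record a policy modification; backup() undoes it."""
        self.stack.append((self.t, self.reward, backup))

    def ratio_since(self, t0, r0):
        """Reward per time step accumulated since checkpoint (t0, r0)."""
        dt = self.t - t0
        return (self.reward - r0) / dt if dt > 0 else float("-inf")

    def backtrack(self):
        """Undo modifications that did not accelerate reward intake."""
        while self.stack:
            t_top, r_top, backup = self.stack[-1]
            # Compare against the previous valid modification,
            # or against the start of life (t=0, reward=0).
            if len(self.stack) > 1:
                t_prev, r_prev, _ = self.stack[-2]
            else:
                t_prev, r_prev = 0, 0.0
            if self.ratio_since(t_top, r_top) > self.ratio_since(t_prev, r_prev):
                break          # success-story criterion holds; stop popping
            backup()           # undo the unsuccessful modification
            self.stack.pop()
```

For example, a modification followed by higher per-step reward survives
a checkpoint, while a later one followed by lower per-step reward is
undone and popped at the next checkpoint.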

____________________________________________________________________


        A  GENERAL  METHOD  FOR  INCREMENTAL SELF-IMPROVEMENT 
        AND MULTI-AGENT LEARNING IN UNRESTRICTED ENVIRONMENTS

                         Juergen Schmidhuber                  

To appear in X. Yao, editor,   Evolutionary Computation:  Theory and 
Applications. Scientific Publ. Co., Singapore, 1996  (based  on  "On 
learning how to learn learning strategies", TR  FKI-198-94, TUM 1994).  
30 pages, 146 K compressed, 386 K uncompressed.

____________________________________________________________________

              INCREMENTAL  SELF-IMPROVEMENT FOR LIFE-
              TIME MULTI-AGENT REINFORCEMENT LEARNING

              Jieyu Zhao          Juergen Schmidhuber

To appear in Proc. SAB'96, MIT Press, Cambridge MA, 1996.  10 pages, 
107 K compressed, 429 K uncompressed.  A  spin-off  paper  of the TR 
above.  It includes another experiment: a multi-agent system
consisting of 3 co-evolving, IS-based animats chasing each other learns
interesting, stochastic predator and prey strategies.

(Another spin-off paper is:   M. Wiering and J. Schmidhuber. Solving 
POMDPs using Levin search and EIRA. To be presented by MW at ML'96.)

____________________________________________________________________

           To obtain copies,  use ftp,  or try the web:        
           http://www.idsia.ch/~juergen/onlinepub.html
           FTP-host:                      ftp.idsia.ch
           FTP-filenames:      /pub/juergen/meta.ps.gz
                               /pub/juergen/ec96.ps.gz
                               /pub/jieyu/sab96.ps.gz
____________________________________________________________________


Juergen Schmidhuber  &  Jieyu Zhao  &  Marco Wiering          
http://www.idsia.ch                                            IDSIA
