3 IDSIA papers
Juergen Schmidhuber
juergen at idsia.ch
Thu Jun 27 14:55:29 EDT 1996
Three related papers are available, all based on a recent, novel,
general reinforcement learning paradigm that allows for metalearning
and incremental self-improvement (IS).
____________________________________________________________________
SIMPLE PRINCIPLES OF METALEARNING
Juergen Schmidhuber & Jieyu Zhao & Marco Wiering
Technical Report IDSIA-69-96, June 27, 1996
23 pages, 195 K compressed, 662 K uncompressed
The goal of metalearning is to generate useful shifts of inductive
bias by adapting the current learning strategy in a "useful" way.
Our learner leads a single life during which actions are continually
executed according to the system's internal state and current policy
(a modifiable, probabilistic algorithm mapping environmental inputs
and internal states to outputs and new internal states). An action
is considered a learning algorithm if it can modify the policy.
Effects of learning processes on later learning processes are
measured using reward/time ratios. Occasional backtracking ensures that
the history of still-valid policy modifications corresponds to a
history of lifelong reward accelerations. The principle allows
for plugging in a wide variety of learning algorithms. In particular,
it allows for embedding the learner's policy modification strategy
within the policy itself (self-reference). To demonstrate the
principle's feasibility in cases where traditional reinforcement
learning fails, we test it in complex, non-Markovian, changing
environments ("POMDPs"). One of the tasks involves more than 10^13
states, two learners that both cooperate and compete, and strongly
delayed reinforcement signals (initially separated by more than
300,000 time steps).
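As a concrete illustration of the backtracking step, here is a minimal
Python sketch (not from the papers; the stack layout, names, and exact
acceptance test are assumptions). It keeps each still-valid policy
modification on a stack with the time and cumulative reward observed
when it was made, and undoes recent modifications until the reward/time
ratios measured since the surviving ones form an increasing sequence:

   # Illustrative sketch only; checkpoints are assumed to occur
   # strictly after the most recent policy modification.
   def backtrack(stack, now, total_reward, restore):
       """stack: list of (t_mod, reward_at_mod, undo_info), oldest
       first, one entry per still-valid policy modification."""
       def ratio(entry):                  # reward per time step earned
           t_mod, r_mod, _ = entry        # since this modification
           return (total_reward - r_mod) / (now - t_mod)
       # Undo modifications until the surviving ones reflect a history
       # of reward accelerations: each later modification must have
       # earned reward at a higher rate than the one before it, and
       # the oldest one at a higher rate than the whole life so far.
       while stack:
           if len(stack) >= 2 and ratio(stack[-1]) <= ratio(stack[-2]):
               restore(stack.pop()[2])
           elif len(stack) == 1 and ratio(stack[-1]) <= total_reward / now:
               restore(stack.pop()[2])
           else:
               break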
____________________________________________________________________
A GENERAL METHOD FOR INCREMENTAL SELF-IMPROVEMENT
AND MULTI-AGENT LEARNING IN UNRESTRICTED ENVIRONMENTS
Juergen Schmidhuber
To appear in X. Yao, editor, Evolutionary Computation: Theory and
Applications. Scientific Publ. Co., Singapore, 1996 (based on "On
learning how to learn learning strategies", TR FKI-198-94, TUM 1994).
30 pages, 146 K compressed, 386 K uncompressed.
____________________________________________________________________
INCREMENTAL SELF-IMPROVEMENT FOR LIFE-TIME
MULTI-AGENT REINFORCEMENT LEARNING
Jieyu Zhao & Juergen Schmidhuber
To appear in Proc. SAB'96, MIT Press, Cambridge MA, 1996. 10 pages,
107 K compressed, 429 K uncompressed. A spin-off paper of the TR
above. It includes another experiment: a multi-agent system
above. It includes another experiment: a multi-agent system consis-
ting of 3 co-evolving, IS-based animats chasing each other learns
interesting, stochastic predator and prey strategies.
(Another spin-off paper is: M. Wiering and J. Schmidhuber. Solving
POMDPs using Levin search and EIRA. To be presented by MW at ML'96.)
____________________________________________________________________
To obtain copies, use FTP or try the web:
http://www.idsia.ch/~juergen/onlinepub.html
FTP-host: ftp.idsia.ch
FTP-filenames: /pub/juergen/meta.ps.gz
/pub/juergen/ec96.ps.gz
/pub/jieyu/sab96.ps.gz
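To script the retrieval, a minimal Python sketch using the standard
ftplib module could look like this (host and path are those listed
above; anonymous login and the local filename are ordinary FTP
conventions, not specifics of this archive):

   from ftplib import FTP

   ftp = FTP("ftp.idsia.ch")             # FTP-host listed above
   ftp.login()                           # anonymous login
   with open("meta.ps.gz", "wb") as f:   # keep the original filename
       ftp.retrbinary("RETR /pub/juergen/meta.ps.gz", f.write)
   ftp.quit()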
____________________________________________________________________
Juergen Schmidhuber & Jieyu Zhao & Marco Wiering
IDSIA, http://www.idsia.ch