reinforcement learning economy

Juergen Schmidhuber juergen at idsia.ch
Fri Jul 17 09:49:58 EDT 1998


This message is triggered by Eric Baum's recent announcement of his
interesting papers on evolutionary economies for reinforcement learning,
"Hayek machine", and metalearning.  I would like to mention that several
related ideas are expressed in an old paper from 1987 [1].

Pages 23-51 of [1] are devoted to "Prototypical Self-referential
Associating Learning Mechanisms" (PSALM1 - PSALM3).  Hayek2 (the
most recent Hayek variant) is somewhat reminiscent of PSALM3,
where competing/cooperating reinforcement learning agents bid for
executing actions. Winners may receive external reward for achieving
goals. Agents are supposed to learn the credit assignment process
itself (metalearning). For this purpose they can execute actions for
collectively constructing, connecting, and modifying agents and for
transferring credit (reward) to agents.  A crucial difference between
PSALM3 and Hayek2 may be that PSALM3 does not strictly enforce individual
property rights.  For instance, agents may steal money from other agents
and temporarily use it in a way that does not contribute to the system's
overall progress. On the other hand, to the best of my knowledge, PSALMs
are the first machine learning systems that enforce the important
constraint of total credit conservation (except for consumption and
external reward) - this constraint is not enforced in Holland's landmark
bucket brigade classifier economy (1985), where its absence may cause
inflation and other problems. Reference [1] also inspired a slightly more recent
but less general approach enforcing money conservation, where money is
"weight substance" of a reinforcement learning neural net [2].

Pages 7-13 of [1] are devoted to an alternative "Genetic Programming"
(GP) approach that recursively applies metalevel GP to the task of finding
better program-modifying programs on lower levels - the goal is to use
GP for improving GP. It may be worth mentioning that this was suggested
long before GP itself (invented by Cramer in 1985) was popularized in
the 1990s.
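
The meta-level idea can be caricatured in a few lines of Python (again
only a toy sketch of my own, not the scheme of [1]; a simple
sequence-matching task and fixed-step operators stand in for real GP,
and the recursion to higher meta-levels is omitted): a meta level scores
candidate program-modifying operators by the average improvement they
produce on base-level programs.

# Toy two-level sketch of the "GP improving GP" idea (illustration only).
import random

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]   # toy base-level task: match this sequence

def fitness(program):
    # Base-level fitness: negative distance to the target sequence.
    return -sum(abs(p - t) for p, t in zip(program, TARGET))

def random_program():
    return [random.randint(0, 9) for _ in TARGET]

def make_modifier(step_size):
    # A "program-modifying program": perturbs one position by +/- step_size.
    def modify(program):
        child = list(program)
        i = random.randrange(len(child))
        child[i] = max(0, min(9, child[i] + random.choice([-step_size, step_size])))
        return child
    return modify

def base_level_gain(modifier, trials=500):
    # Meta-level fitness of a modifier: average fitness improvement it
    # produces when applied to random base-level programs.
    gain = 0.0
    for _ in range(trials):
        prog = random_program()
        gain += fitness(modifier(prog)) - fitness(prog)
    return gain / trials

if __name__ == "__main__":
    # Meta level: rank candidate modifiers by the gains they yield below.
    candidates = {"step=1": make_modifier(1),
                  "step=3": make_modifier(3),
                  "step=9": make_modifier(9)}
    gains = {name: base_level_gain(mod) for name, mod in candidates.items()}
    for name in sorted(gains, key=gains.get, reverse=True):
        print("%s: mean improvement %+.3f" % (name, gains[name]))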

It should be stated that reference [1] does not meet the scientific
standards of a journal publication - it is the first paper I ever
wrote in a foreign language (as an undergraduate). But despite its age
(it was first distributed more than a decade ago) it may still be of
at least historical interest due to renewed attention to market models
and metalearning (and also GP). Unfortunately there is no digital
version, but if you are interested I will send you a hardcopy (this 
may take some time depending on demand).

[1] J. Schmidhuber. Evolutionary Principles in Self-Referential 
Learning. On Learning how to Learn: The Meta-Meta-Meta...-Hook. 
Diploma thesis, Tech. Univ. Munich, 1987.
[2] J. Schmidhuber. The Neural Bucket Brigade: A local learning 
algorithm for dynamic feedforward and recurrent networks. Connection 
Science, 1(4):403-412, 1989, http://www.idsia.ch/~juergen/onlinepub.html

_________________________________________________
Juergen Schmidhuber             research director
IDSIA, Corso Elvezia 36, 6900-Lugano, Switzerland
juergen at idsia.ch            www.idsia.ch/~juergen


