Levin Search and EIRA
Marco Wiering
marco at idsia.ch
Mon Apr 15 03:39:15 EDT 1996
FTP-host: ftp.idsia.ch
FTP-file: /pub/marco/ml_levin_eira.ps.gz
or /pub/juergen/ml_levin_eira.ps.gz
Solving POMDPs with Levin Search and EIRA
Marco Wiering Juergen Schmidhuber
Machine Learning: 13th Intern. Conf., 1996
9 pages, 86K compressed, 252K uncompressed
Partially observable Markov decision problems (POMDPs) recently received
a lot of attention in the reinforcement learning community. No attention,
however, has been paid to Levin's universal search through program space
(LS), which is theoretically optimal for a wide variety of search prob-
lems including many POMDPs. Experiments in this paper show that LS can
solve partially observable mazes (`POMs') involving many more states and
obstacles than those solved by various previous authors. We then note,
however, that LS is not necessarily optimal for learning problems where
experience with previous problems can be used to speed up the search.
For this reason, we introduce an adaptive extension of LS (ALS) which
uses experience to increase probabilities of instructions occurring in
successful programs found by LS. To deal with cases where ALS does not
lead to long-term performance improvement, we use the recent technique
``environment-independent reinforcement acceleration'' (EIRA) as a safe-
ty belt (EIRA currently is the only known method that guarantees a life-
long history of reward accelerations). Additional experiments demon-
strate: (a) ALS can dramatically reduce search time consumed by calls of
LS. (b) Further significant speed-ups can be obtained by combining ALS
and EIRA.
To obtain a copy, do one of these:
netscape http://www.idsia.ch/~marco/publications.html
netscape http://www.idsia.ch/~juergen/onlinepub.html
Marco Wiering
Juergen Schmidhuber IDSIA
More information about the Connectionists
mailing list