new papers
Marco Wiering
marco at idsia.ch
Wed Oct 9 11:26:48 EDT 1996
HQ-LEARNING: DISCOVERING MARKOVIAN SUBGOALS
FOR NON-MARKOVIAN REINFORCEMENT LEARNING
Marco Wiering & Juergen Schmidhuber
Technical Report IDSIA-95-96, 13 pages, 108K
To solve partially observable Markov decision problems, we introduce
HQ-learning, a hierarchical extension of Q-learning. HQ-learning is
based on an ordered sequence of subagents, each learning to identify
and solve a Markovian subtask of the total task. Each agent learns
(1) an appropriate subgoal (even though there is no intermediate,
external reinforcement for good subgoals), and (2) a Markovian policy
for a given subgoal. Our experiments demonstrate: (a) The system
can easily solve tasks standard Q-learning cannot solve at all. (b)
It can solve partially observable mazes with more states than those
used in most previous POMDP work. (c) It can quickly solve complex
tasks that require manipulation of the environment to free a blocked
path to the goal.
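
To give a concrete picture of the architecture, here is a minimal
tabular sketch in Python. It is illustrative only: the reset/step
environment interface, the epsilon-greedy exploration, and the
simplified HQ bootstrap below are assumptions, not the report's
exact algorithm.

import random
from collections import defaultdict

class HQAgent:
    # One subagent: an HQ-table scores candidate subgoals
    # (observations), and a Q-table holds the reactive policy used
    # while this agent is active. Tabular representation and
    # epsilon-greedy exploration are illustrative assumptions.
    def __init__(self, n_obs, n_actions, eps=0.1):
        self.n_obs, self.n_actions, self.eps = n_obs, n_actions, eps
        self.hq = defaultdict(float)   # HQ[subgoal]
        self.q = defaultdict(float)    # Q[(obs, action)]

    def pick_subgoal(self):
        if random.random() < self.eps:
            return random.randrange(self.n_obs)
        return max(range(self.n_obs), key=lambda g: self.hq[g])

    def act(self, obs):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(obs, a)])

def run_episode(env, agents, alpha=0.1, gamma=0.9):
    # env.reset() -> obs and env.step(a) -> (obs, reward, done)
    # is an assumed interface, not part of the report.
    obs, done = env.reset(), False
    for i, agent in enumerate(agents):
        goal = agent.pick_subgoal()
        ret, disc = 0.0, 1.0
        while not done and obs != goal:
            a = agent.act(obs)
            next_obs, r, done = env.step(a)
            # standard tabular Q-learning update for the active agent
            best_next = max(agent.q[(next_obs, b)]
                            for b in range(agent.n_actions))
            agent.q[(obs, a)] += alpha * (r + gamma * best_next
                                          - agent.q[(obs, a)])
            ret += disc * r
            disc *= gamma
            obs = next_obs
        # credit the subgoal choice with the return collected while
        # active, plus the next agent's best HQ-value (a simplified
        # bootstrap for illustration)
        nxt = 0.0
        if not done and i + 1 < len(agents):
            nxt = max(agents[i + 1].hq[g] for g in range(agent.n_obs))
        agent.hq[goal] += alpha * (ret + disc * nxt - agent.hq[goal])
        if done:
            break

The point of the decomposition is that each agent's Q-table faces an
(approximately) Markovian subtask once control reaches it, while the
HQ-table learns which subgoal observation should trigger the handoff
to the next agent.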
-------------------------------------------
Also available: THE NEURAL HEAT EXCHANGER ("invited talk" ICONIP'96)
An alternative learning method for multi-layer neural nets inspired
by the physical heat exchanger. Unlike backprop, it is truly local.
It has been presented in occasional talks since 1990, and is
closely related to Hinton et al.'s recent Helmholtz Machine (1995).
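
The talk's exact equations are not reproduced here, but the
countercurrent idea can be sketched as follows: inputs flow through
the layers in one direction, targets in the opposite direction, and
each layer is trained by a local delta rule against the target
passing by. Everything specific in this Python sketch (tanh units,
the particular update rules, the symmetric training of the target
stream) is an illustrative guess, not the method as presented.

import numpy as np

def heat_exchanger_step(x, y, Wf, Wb, lr=0.05):
    # Countercurrent sketch: inputs flow forward through Wf while
    # targets flow in the opposite direction through Wb, so every
    # layer receives a local target and can be trained locally,
    # with no global backpropagation of errors. The tanh units and
    # these update rules are assumptions, not the talk's equations.
    n = len(Wf)               # Wf[i]: layer i -> i+1; Wb[i]: i+1 -> i
    a = [x]                   # forward stream of activations
    for i in range(n):
        a.append(np.tanh(a[i] @ Wf[i]))
    t = [None] * (n + 1)      # countercurrent stream of targets
    t[n] = y
    for i in reversed(range(n)):
        t[i] = np.tanh(t[i + 1] @ Wb[i])
    for i in range(n):        # local delta rule for the forward stream
        d = (t[i + 1] - a[i + 1]) * (1.0 - a[i + 1] ** 2)
        Wf[i] += lr * np.outer(a[i], d)
    for i in range(n):        # and, symmetrically, for the target stream
        d = (a[i] - t[i]) * (1.0 - t[i] ** 2)
        Wb[i] += lr * np.outer(t[i + 1], d)
    return a[-1]              # current network output

# toy usage: a net with layer sizes 4 -> 8 -> 2
rng = np.random.default_rng(0)
Wf = [rng.normal(0, 0.1, (4, 8)), rng.normal(0, 0.1, (8, 2))]
Wb = [rng.normal(0, 0.1, (8, 4)), rng.normal(0, 0.1, (2, 8))]
out = heat_exchanger_step(rng.normal(size=4), np.array([1.0, -1.0]), Wf, Wb)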
FTP-host: ftp.idsia.ch
FTP-files: /pub/marco/hq96.ps.gz
/pub/juergen/hq96.ps.gz
/pub/juergen/heat.ps.gz
WWW: http://www.idsia.ch/~marco/publications.html
http://www.idsia.ch/~juergen/onlinepub.html
Comments welcome!
Marco Wiering & Juergen Schmidhuber IDSIA