new papers
Marco Wiering
marco at idsia.ch
Wed Oct 9 11:26:48 EDT 1996
HQ-LEARNING: DISCOVERING MARKOVIAN SUBGOALS
FOR NON-MARKOVIAN REINFORCEMENT LEARNING
Marco Wiering & Juergen Schmidhuber
Technical Report IDSIA-95-96, 13 pages, 108K
To solve partially observable Markov decision problems, we introduce
HQ-learning, a hierarchical extension of Q-learning. HQ-learning is
based on an ordered sequence of subagents, each learning to identify
and solve a Markovian subtask of the total task. Each agent learns
(1) an appropriate subgoal (even though there is no intermediate,
external reinforcement for good subgoals), and (2) a Markovian policy
for a given subgoal. Our experiments demonstrate: (a) The system
can easily solve tasks standard Q-learning cannot solve at all. (b)
It can solve partially observable mazes with more states than those
used in most previous POMDP work. (c) It can quickly solve complex
tasks that require manipulation of the environment to free a blocked
path to the goal.
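
To give a concrete picture of the architecture, here is a minimal
tabular sketch in Python. It is illustrative only: the reset/step
environment interface, the epsilon-greedy exploration, and the
simplified HQ bootstrap below are assumptions, not the report's
exact algorithm.

import random
from collections import defaultdict

class HQAgent:
    # One subagent: an HQ-table scores candidate subgoals
    # (observations), and a Q-table holds the reactive policy used
    # while this agent is active. Tabular representation and
    # epsilon-greedy exploration are illustrative assumptions.
    def __init__(self, n_obs, n_actions, eps=0.1):
        self.n_obs, self.n_actions, self.eps = n_obs, n_actions, eps
        self.hq = defaultdict(float)   # HQ[subgoal]
        self.q = defaultdict(float)    # Q[(obs, action)]

    def pick_subgoal(self):
        if random.random() < self.eps:
            return random.randrange(self.n_obs)
        return max(range(self.n_obs), key=lambda g: self.hq[g])

    def act(self, obs):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(obs, a)])

def run_episode(env, agents, alpha=0.1, gamma=0.9):
    # env.reset() -> obs and env.step(a) -> (obs, reward, done)
    # is an assumed interface, not part of the report.
    obs, done = env.reset(), False
    for i, agent in enumerate(agents):
        goal = agent.pick_subgoal()
        ret, disc = 0.0, 1.0
        while not done and obs != goal:
            a = agent.act(obs)
            next_obs, r, done = env.step(a)
            # standard tabular Q-learning update for the active agent
            best_next = max(agent.q[(next_obs, b)]
                            for b in range(agent.n_actions))
            agent.q[(obs, a)] += alpha * (r + gamma * best_next
                                          - agent.q[(obs, a)])
            ret += disc * r
            disc *= gamma
            obs = next_obs
        # credit the subgoal choice with the return collected while
        # active, plus the next agent's best HQ-value (a simplified
        # bootstrap for illustration)
        nxt = 0.0
        if not done and i + 1 < len(agents):
            nxt = max(agents[i + 1].hq[g] for g in range(agent.n_obs))
        agent.hq[goal] += alpha * (ret + disc * nxt - agent.hq[goal])
        if done:
            break

The point of the decomposition is that each agent's Q-table faces an
(approximately) Markovian subtask once control reaches it, while the
HQ-table learns which subgoal observation should trigger the handoff
to the next agent.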
-------------------------------------------
Also available: THE NEURAL HEAT EXCHANGER ("invited talk" ICONIP'96)
An alternative learning method for multi-layer neural nets inspired
by the physical heat exchanger. Unlike backprop, it is truly local.
It has been presented in occasional talks since 1990, and is
closely related to Hinton et al.'s recent Helmholtz Machine (1995).
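
The talk's exact equations are not reproduced here, but the
countercurrent idea can be sketched as follows: inputs flow through
the layers in one direction, targets in the opposite direction, and
each layer is trained by a local delta rule against the target
passing by. Everything specific in this Python sketch (tanh units,
the particular update rules, the symmetric training of the target
stream) is an illustrative guess, not the method as presented.

import numpy as np

def heat_exchanger_step(x, y, Wf, Wb, lr=0.05):
    # Countercurrent sketch: inputs flow forward through Wf while
    # targets flow in the opposite direction through Wb, so every
    # layer receives a local target and can be trained locally,
    # with no global backpropagation of errors. The tanh units and
    # these update rules are assumptions, not the talk's equations.
    n = len(Wf)               # Wf[i]: layer i -> i+1; Wb[i]: i+1 -> i
    a = [x]                   # forward stream of activations
    for i in range(n):
        a.append(np.tanh(a[i] @ Wf[i]))
    t = [None] * (n + 1)      # countercurrent stream of targets
    t[n] = y
    for i in reversed(range(n)):
        t[i] = np.tanh(t[i + 1] @ Wb[i])
    for i in range(n):        # local delta rule for the forward stream
        d = (t[i + 1] - a[i + 1]) * (1.0 - a[i + 1] ** 2)
        Wf[i] += lr * np.outer(a[i], d)
    for i in range(n):        # and, symmetrically, for the target stream
        d = (a[i] - t[i]) * (1.0 - t[i] ** 2)
        Wb[i] += lr * np.outer(t[i + 1], d)
    return a[-1]              # current network output

# toy usage: a net with layer sizes 4 -> 8 -> 2
rng = np.random.default_rng(0)
Wf = [rng.normal(0, 0.1, (4, 8)), rng.normal(0, 0.1, (8, 2))]
Wb = [rng.normal(0, 0.1, (8, 4)), rng.normal(0, 0.1, (2, 8))]
out = heat_exchanger_step(rng.normal(size=4), np.array([1.0, -1.0]), Wf, Wb)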
FTP-host: ftp.idsia.ch
FTP-files: /pub/marco/hq96.ps.gz
/pub/juergen/hq96.ps.gz
/pub/juergen/heat.ps.gz
WWW: http://www.idsia.ch/~marco/publications.html
http://www.idsia.ch/~juergen/onlinepub.html
Comments welcome!
Marco Wiering & Juergen Schmidhuber IDSIA