Report available

Paolo Frasconi paolo at mcculloch.ing.unifi.it
Fri Jun 24 13:17:18 EDT 1994



FTP-host: ftp-dsi.ing.unifi.it
FTP-file: pub/tech-reports/em.tr-11-94.ps.Z

URL: ftp://ftp-dsi.ing.unifi.it/pub/tech-reports/em.tr-11-94.ps.Z



  The following technical report (41 pages) is available by
anonymous FTP.


------------------------------------------------------------------

	    An EM Approach to Learning Sequential Behavior


			    Yoshua Bengio
	    Dept. Informatique et Recherche Operationnelle
	     Universite de Montreal, Montreal, Qc H3C 3J7
		       bengioy at iro.umontreal.ca

			    Paolo Frasconi
		Dipartimento di Sistemi e Informatica
		    Universita di Firenze (Italy)
		     paolo at mcculloch.ing.unifi.it
				   
	    Tech. Report DSI 11/94, Universita di Firenze



			       Abstract

   We consider problems of sequence processing and propose a solution
based on a discrete-state model. We introduce a recurrent architecture
with a modular structure that allocates subnetworks to discrete
states. Different subnetworks model the dynamics (state transitions)
and the output of the model, conditioned on the previous state and an
external input. The model has a statistical interpretation and can be
trained by the EM or GEM algorithm, treating state trajectories as
missing data. This makes it possible to decouple temporal credit
assignment from parameter estimation.
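
   In symbols (notation ours, added for this announcement; consult the
report for the exact formulation): writing u_t for the external input,
x_t for the discrete state and y_t for the output, a model of this
kind computes

    P(y_1^T | u_1^T) = \sum_{x_1,...,x_T} \prod_{t=1}^{T}
                       P(x_t | x_{t-1}, u_t) P(y_t | x_t, u_t)

where each conditional distribution is computed by one of the
subnetworks.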

   The model is similar to hidden Markov models, but it maps input
sequences to output sequences, using the same processing style as
recurrent networks. For this reason we call it an Input/Output HMM
(IOHMM). Another notable difference is that IOHMMs are trained with a
supervised learning paradigm (while still taking advantage of the EM
algorithm), whereas standard HMMs are trained by an unsupervised EM
algorithm (or by gradient ascent on a supervised criterion).
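
   To make the missing-data view concrete, here is a minimal Python
sketch of the E-step for such a model. Assumptions are ours, not the
report's: discrete input and output symbols, and lookup tables in
place of the neural subnetworks.

    import numpy as np

    # Minimal tabular sketch of the E-step for an input/output HMM.
    # A[u] is the n x n state-transition matrix used when the input
    # symbol is u; B[u] is the n x m output-distribution matrix.
    # No scaling is applied, so this only suits short sequences.

    def e_step(A, B, pi, u_seq, y_seq):
        T, n = len(u_seq), pi.size
        alpha = np.zeros((T, n))
        beta = np.ones((T, n))
        alpha[0] = pi * B[u_seq[0]][:, y_seq[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A[u_seq[t]]) * B[u_seq[t]][:, y_seq[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A[u_seq[t+1]] @ (B[u_seq[t+1]][:, y_seq[t+1]] * beta[t+1])
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        return gamma   # gamma[t, i] = P(x_t = i | all inputs and outputs)

    # Example: 2 input symbols, 3 states, 2 output symbols.
    rng = np.random.default_rng(0)
    A = rng.random((2, 3, 3)); A /= A.sum(axis=2, keepdims=True)
    B = rng.random((2, 3, 2)); B /= B.sum(axis=2, keepdims=True)
    pi = np.ones(3) / 3
    gamma = e_step(A, B, pi, [0, 1, 1, 0], [1, 0, 0, 1])

For this tabular stand-in, the M-step would simply renormalize
expected transition and output counts computed from these posteriors;
with neural subnetworks, as in the report, each M-step becomes a
weighted supervised training problem, which is where the GEM variant
is needed.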

   We also study the problem of learning long-term dependencies with
Markovian systems, comparing them with recurrent networks trained by
gradient descent. The analysis reported in this paper shows that
Markovian models generally suffer from diffusion of temporal credit
when dependencies span long intervals and the transition graph is
fully connected. However, while recurrent networks exhibit a conflict
between long-term information storage and trainability, in Markovian
models these two requirements are either both satisfied or both
unsatisfied. Finally, we demonstrate that supervised EM learning is
well suited to solving grammatical inference problems. Experimental
results are presented for the seven Tomita grammars, showing that
these adaptive models can attain excellent generalization.
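
   The diffusion effect described above can be illustrated numerically
(this snippet is ours, not from the report): for a fully connected,
ergodic transition graph, powers of the stochastic transition matrix
converge to a rank-one matrix, so after enough steps all starting
states induce nearly the same distribution and credit propagated back
through many transitions carries almost no information.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.random((n, n))
    A /= A.sum(axis=1, keepdims=True)  # fully connected stochastic matrix

    P = np.eye(n)
    for step in range(1, 21):
        P = P @ A
        if step in (1, 5, 10, 20):
            # Maximum spread between rows of A^step: how distinguishable
            # different starting states remain after `step` transitions.
            print(step, float(np.abs(P - P.mean(axis=0)).max()))

The printed spread shrinks geometrically with the number of steps,
which is the diffusion the analysis refers to.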


      ---------------------------------------------------------
	    Paolo Frasconi <paolo at mcculloch.ing.unifi.it>
		Dipartimento di Sistemi e Informatica.
	      Via di Santa Marta 3 50139 Firenze (Italy)
	      +39 (55) 479-6361 / fax +39 (55) 479-6363
      ---------------------------------------------------------

