Report available
Paolo Frasconi
paolo at mcculloch.ing.unifi.it
Fri Jun 24 13:17:18 EDT 1994
FTP-host: ftp-dsi.ing.unifi.it
FTP-file: pub/tech-reports/em.tr-11-94.ps.Z
URL: ftp://ftp-dsi.ing.unifi.it/pub/tech-reports/em.tr-11-94.ps.Z
The following technical report is available by anonymous ftp. Length
is 41 pages.
------------------------------------------------------------------
An EM Approach to Learning Sequential Behavior
Yoshua Bengio
Dept. Informatique et Recherche Operationnelle
Universite de Montreal, Montreal, Qc H3C-3J7
bengioy at iro.umontreal.ca
Paolo Frasconi
Dipartimento di Sistemi e Informatica
Universita di Firenze (Italy)
paolo at mcculloch.ing.unifi.it
Tech. Report. DSI 11/94 Universita di Firenze
Abstract
We consider problems of sequence processing and propose a
solution based on a discrete-state model. We introduce a recurrent
architecture with a modular structure that allocates subnetworks to
discrete states. Different subnetworks model the dynamics (state
transitions) and the output of the model, conditional on the previous
state and an external input. The model has a statistical
interpretation and can be trained by the EM or GEM algorithms,
treating state trajectories as missing data. This allows temporal
credit assignment to be decoupled from the actual estimation of the
parameters.
The model presents similarities to hidden Markov models, but allows
input sequences to be mapped to output sequences, in the same
processing style as recurrent networks. For this reason we call it an
Input/Output HMM (IOHMM). Another remarkable difference is that IOHMMs
are trained with a supervised learning paradigm (while potentially
taking advantage of the EM algorithm), whereas standard HMMs are
trained by an unsupervised EM algorithm (or a supervised criterion
with gradient ascent).
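As an illustration only (not code from the report), the forward pass of
such a model can be sketched as follows. Here the per-state subnetworks
are replaced by simple linear maps followed by a softmax; the parameter
names (W_trans, W_out) and the network form are assumptions for the
sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_in, n_out = 3, 4, 2

# Hypothetical stand-ins for the per-state subnetworks:
# W_trans[j] maps the input to transition scores out of state j,
# W_out[j] maps the input to output scores emitted from state j.
W_trans = rng.normal(size=(n_states, n_in, n_states))
W_out = rng.normal(size=(n_states, n_in, n_out))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def iohmm_forward(U):
    """Propagate the state belief along an input sequence U and
    return the final belief plus the output distribution at each step."""
    alpha = np.full(n_states, 1.0 / n_states)  # uniform initial belief
    outputs = []
    for u in U:
        # Input-conditional transition matrix: A[j, i] = P(state i | state j, u)
        A = softmax(np.einsum('jki,k->ji', W_trans, u))
        alpha = alpha @ A                       # propagate the state belief
        # Per-state output distributions: B[j] = P(y | state j, u)
        B = softmax(np.einsum('jki,k->ji', W_out, u))
        outputs.append(alpha @ B)               # mixture over current states
    return alpha, np.array(outputs)

U = rng.normal(size=(5, n_in))                  # a length-5 input sequence
alpha, Y = iohmm_forward(U)
```

During EM training the same belief propagation (combined with a
backward pass) would yield the posterior over the hidden state
trajectory, which plays the role of the missing data.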
We also study the problem of learning long-term dependencies with
Markovian systems, making comparisons to recurrent networks trained by
gradient descent. The analysis reported in this paper shows that
Markovian models with fully connected transition graphs generally
suffer from a diffusion of temporal credit over long-term
dependencies. However, whereas recurrent networks exhibit a conflict
between storing information over long time spans and trainability, in
Markovian models these two requirements are either both satisfied or
both unsatisfied. Finally, we demonstrate that supervised EM learning
is well suited to grammatical inference problems. Experimental results
are presented for the seven Tomita grammars, showing that these
adaptive models can attain excellent generalization.
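The diffusion-of-credit phenomenon mentioned above can be seen in a
small numerical sketch (illustrative only; the matrix and seed are
arbitrary): for an ergodic, fully connected transition matrix, beliefs
started from different initial states converge to the same stationary
distribution, so the influence of the initial state decays with the
sequence length.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A random fully connected (all entries positive) stochastic matrix.
A = rng.random((n, n))
A /= A.sum(axis=1, keepdims=True)

# Two beliefs concentrated on different initial states.
p = np.zeros(n); p[0] = 1.0
q = np.zeros(n); q[1] = 1.0

# After t steps the beliefs are p @ A^t and q @ A^t.  For an ergodic
# chain they approach the same stationary distribution, so the gap
# between them (and hence the credit assignable to the initial state)
# shrinks as t grows.
gaps = {}
for t in [1, 5, 20, 50]:
    At = np.linalg.matrix_power(A, t)
    gaps[t] = np.abs(p @ At - q @ At).sum()
    print(t, gaps[t])
```

With a fully connected graph the decay is geometric in the magnitude
of the second-largest eigenvalue of A, which is what makes long-term
dependencies hard to capture.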
---------------------------------------------------------
Paolo Frasconi <paolo at mcculloch.ing.unifi.it>
Dipartimento di Sistemi e Informatica.
Via di Santa Marta 3 50139 Firenze (Italy)
+39 (55) 479-6361 / fax +39 (55) 479-6363
---------------------------------------------------------
More information about the Connectionists mailing list