training time in HMM and CN
Steven J. Nowlan
nowlan at ai.toronto.edu
Tue Mar 28 09:41:36 EST 1989
Two comments on Thansis' post on the relative training speed of HMM vs CN
for sequential problems such as speech recognition:
1. The BF algorithm is quite highly optimized, while vanilla BP doesn't
implement anything that a numerical analyst would consider a real
descent procedure (not even steepest descent). If you were to use a
reasonably powerful numerical optimization technique, such as one of
the Broyden methods you may find CN convergence extremely fast. Ray
Watrous has in fact shown this sort of speedup for speech problems [1].
2. A more subtle, but probably more important difference, is the issue of
how targets are specified over an input sequence. The BF algorithm
specifies targets for intermediate steps in an input sequence based on
expectations of final outcome of that sequence collected from many
similar sequences. It is not clear how to specify output targets for
intermediate points of an input sequence in a CN, although Watrous
has shown that intelligent choice of such targets can markedly improve
CN convergence and performance. Of interest in this regard is the work
by Sutton on Temporal Difference methods [2]. One can view this work as
specifying a target function over a sequence in a dynamical way, so that
the target function reflects the experience of the system to date in a
clever way. Sutton [2] has shown an equivalence between one form of linear
TD method and the maximum likelihood estimates of the parameters for an
absorbing Markov chain model of the same process. This seems much closer
in flavour to what the BF algorithm is doing, and when applied to a
non-linear system may in fact be an interesting generalization of BF.
Comments and requests for clarifications should be directed to me, not to
Connectionists please.
- Steve Nowlan
nowlan at ai.toronto.edu
References:
[1] Watrous, Raymond L. "Speech Recognition Using Connectionist Networks",
TR MS-CIS-88-96, Department of Computer and Information Science,
University of Pennsylvania, Philadelphia, 1988.
[2] Sutton, Richard S. "Learning to Predict by the Methods of Temporal
Difference", GTE Technical Report TR87-509.1, GTE Laboratories Inc.
Waltham, Mass. 1987.
More information about the Connectionists
mailing list