Summary and tech report and thesis availability (long)

Tony Robinson ajr%engineering.cambridge.ac.uk at NSFnet-Relay.AC.UK
Thu Mar 15 14:41:28 EST 1990


There are three topics in this (long) posting:

  Summary of replies to my message "problems with large training sets".

  Tech report availability announcement "Phoneme Recognition from the TIMIT
    database using Recurrent Error Propagation Networks"

  Thesis availability announcement "Dynamic Error Propagation Networks"

Mail me (ajr at eng.cam.ac.uk) if you would like a copy of the tech report or
the thesis.  I will be at ICASSP if anyone there would like to discuss this
work (or save me some postage).

Tony Robinson

/*****************************************************************************/

Subject: Summary of replies to my message "problems with large training sets"

Thanks to Ron Cole, Geoff Hinton, Yann Le Cun, Alexander Singer, Fu-Sheng
Tsung, Guy Smith and Rich Sutton for their replies.  Here is a brief summary:

Adaptive learning rates: The paper that was most recommended was:

  Jacobs, R. A. (1988).
  Increased rates of convergence through learning rate adaptation.
  Neural Networks, 1, pp. 295-307.

  The scheme described in this paper is nice in that it allows the step size
  scaling factor (eta) for each weight to vary independently; variations of
  two orders of magnitude have been observed.
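
  To make the scheme concrete, here is a minimal sketch (in Python, not from
  the paper) of a Jacobs-style per-weight step size update; the constants
  kappa, phi and theta are illustrative assumptions, not values from the
  paper.

    import numpy as np

    def delta_bar_delta_step(w, grad, eta, delta_bar,
                             kappa=0.01, phi=0.5, theta=0.7):
        # Each weight keeps its own step size eta.  eta grows additively
        # when the current gradient agrees in sign with an exponential
        # average of past gradients (delta_bar), and shrinks
        # multiplicatively when the sign flips.  All arguments are arrays
        # with the shape of w.
        agree = grad * delta_bar > 0
        flip = grad * delta_bar < 0
        eta = np.where(agree, eta + kappa, eta)
        eta = np.where(flip, eta * (1.0 - phi), eta)
        w = w - eta * grad                                 # descent step
        delta_bar = (1.0 - theta) * grad + theta * delta_bar
        return w, eta, delta_bar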

Use a faster machine: Something like a 2.7 GFlop Connection Machine could 
  shake some of these problems away!  There are two issues here: one is
  understanding the problem, from which more efficient algorithms naturally
  develop; the other is the need to get results.  I don't know how the two
  will balance in future, but my guess is that we will need more compute.

Combined subset training: Several people have used small subsets for initial
  training, with later training combining these subsets.  The reference I was
  sent was:

  Fu-Sheng Tsung and Garrison Cottrell (1989).
  A Sequential Adder with Recurrent Networks.
  IJCNN 89, June, Washington D.C.

  For reasons of software homogeneity, I prefer to use an increasing momentum
  term: initially it smooths over one "subset", but the momentum increases
  until the smoothing is over the whole training set.  I've never done a
  comparison of these techniques.
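
  A minimal sketch of the increasing momentum idea (the schedule and names
  are my own, for illustration, not the exact procedure used):

    import numpy as np

    def momentum_for_epoch(epoch, n_epochs, subset_size, train_size):
        # With momentum alpha the update averages gradients over roughly
        # 1/(1 - alpha) recent patterns, so ramping the window from one
        # subset up to the whole training set gives alpha = 1 - 1/window.
        frac = epoch / max(1, n_epochs - 1)            # goes from 0 to 1
        window = subset_size + frac * (train_size - subset_size)
        return 1.0 - 1.0 / window

    def sgd_momentum_step(w, grad, velocity, alpha, lr=0.01):
        # One stochastic gradient step smoothed by momentum alpha.
        velocity = alpha * velocity - lr * grad
        return w + velocity, velocity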

Use of higher order derivatives: A good step size can be estimated from the
  second order derivatives.  To me this looks very promising, but I haven't
  had time to play with it yet.  The reference is:

  Le Cun, Y. (1989).
  Generalization and Network Design Strategies.
  Tech Report CRG-TR-89-4, Dept. of Computer Science, University of Toronto.
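
  A minimal sketch of the idea, assuming a diagonal estimate of the second
  derivatives is available for each weight (the damping term mu is an
  assumption added to keep the step bounded where the estimate is small):

    import numpy as np

    def second_order_step(w, grad, diag_hessian, eta=1.0, mu=0.1):
        # For a quadratic error surface the best step along each weight is
        # 1/h, where h is the second derivative, so scale each weight's
        # step by the inverse of its (damped) estimated second derivative.
        step = eta / (np.abs(diag_hessian) + mu)
        return w - step * grad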

/*****************************************************************************/

Subject: Tech report availability announcement:

	      Phoneme Recognition from the TIMIT database using
		    Recurrent Error Propagation Networks

			     CUED/F-INFENG/TR.42
		      Tony Robinson and Frank Fallside
		Cambridge University Engineering Department,
		   Trumpington Street, Cambridge, England.
		       Enquiries to: ajr at eng.cam.ac.uk

This report describes a speaker-independent phoneme recognition system based
on the recurrent error propagation network recogniser described in
(RobinsonFallside89, FallsideLuckeMarslandOSheaOwenPragerRobinsonRussell90).

This recogniser employs a preprocessor which generates several types of
output, including a Bark-scaled spectrum, energy and estimates of formant
positions.  The preprocessor feeds a fully recurrent error propagation
network whose outputs are estimates of the probability that the given frame
is part of a particular phonetic segment.  The network is trained with a new
variation on the stochastic gradient descent procedure which updates the
weights by an adaptive step size in the direction given by the sign of the
gradient.  Once trained, a dynamic programming match is made to find the most
probable symbol string of phonetic segments.  The recognition rate is
improved considerably when duration and bigram probabilities are used to
constrain the symbol string.
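
As an illustration only, the following Python sketch shows one plausible form
of such an update: each weight moves by its own step size in the direction
given by the sign of its gradient.  The adaptation rule shown (grow the step
on sign agreement, shrink it on a sign change) is an assumption standing in
for the exact procedure in the report.

  import numpy as np

  def sign_gradient_step(w, grad, step, prev_grad, up=1.2, down=0.5):
      # Move each weight by its own step size opposite to the sign of its
      # gradient.  The multiplicative step adaptation here is illustrative,
      # not the report's exact rule.
      same_sign = grad * prev_grad > 0
      step = np.where(same_sign, step * up, step * down)
      w = w - step * np.sign(grad)
      return w, step, grad.copy()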

A set of recognition results is presented for the trade-off between insertion
and deletion errors.  When these two errors balance, the recognition rate for
all 61 TIMIT symbols is 68.6% correct (62.5% including insertion errors), and
on a reduced 39 symbol set the recognition rate is 75.1% correct (68.9%).
This compares favourably with the results of other methods on the same
database (ZueGlassPhillipsSeneff89, DigalakisOstendorfRohlicek89,
HataokaWaibel89, LeeHon89, LevinsonLibermanLjoljeMiller89).

/*****************************************************************************/

Subject: Thesis availability announcement "Dynamic Error Propagation Networks"

Please forgive me for the title; a better one would have been "Recurrent
Error Propagation Networks".  This is my PhD thesis, submitted in February
1989; it is a concatenation of the work I had done up to that date.

Summary:

This thesis extends the error propagation network to deal with time varying
or dynamic patterns.  Examples are given of supervised, reinforcement driven
and unsupervised learning.

Chapter 1 presents an overview of connectionist models.

Chapter 2 introduces the error propagation algorithm for general node types.

Chapter 3 discusses the issue of data representation in connectionist models.

Chapter 4 describes the use of several types of networks applied to the
problem of the recognition of steady state vowels from multiple speakers.

Chapter 5 extends the error propagation algorithm to deal with time varying
input.  Three possible architectures are explored which deal with learning
sequences of known length and sequences of unknown and possibly indefinite
length.  Several simple examples are given.

Chapter 6 describes the use of two dynamic nets to form a speech coder.  The
popular method of Differential Pulse Code Modulation for speech coding
employs two linear filters to encode and decode speech.  By generalising
these to non-linear filters, implemented as dynamic nets, the noise imposed
by a limited-bandwidth channel is reduced.
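
As background for this generalisation, here is a minimal sketch of a DPCM
loop in which predict() could be a linear filter or, as in the chapter, a
non-linear dynamic net; the uniform quantiser and history length are
assumptions made for illustration.

  import numpy as np

  def dpcm_code(signal, predict, n_taps=8, q_step=0.05):
      # Encode then decode a signal with a DPCM loop built around predict(),
      # a function mapping the last n_taps reconstructed samples to a
      # prediction of the next sample.  Only the quantised prediction
      # errors would cross the channel.
      history = np.zeros(n_taps)
      reconstructed = []
      for x in signal:
          pred = predict(history)
          q_err = q_step * np.round((x - pred) / q_step)  # quantised residual
          y = pred + q_err                                # decoder output
          reconstructed.append(y)
          history = np.concatenate(([y], history[:-1]))
      return np.array(reconstructed)

  # Example with a linear predictor; a dynamic net would replace the lambda.
  taps = np.array([0.9] + [0.0] * 7)
  decoded = dpcm_code(np.sin(np.linspace(0.0, 6.28, 100)), lambda h: h @ taps)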

Chapter 7 describes the application of a dynamic net to the recognition of a
large subset of the phonemes of English from continuous speech.  The dynamic
net is found to give a higher recognition rate than both a fixed window net
and the established k nearest neighbour technique.

Chapter 8 describes a further development of dynamic nets which allows them
to be trained by a reinforcement signal which expresses the correctness of
the output of the net.  Two possible architectures are given and an example
of learning to play the game of noughts and crosses is presented.

