PhD thesis available

Michael Schuster gustl at itl.atr.co.jp
Sun Mar 28 23:32:14 EST 1999


PhD Thesis available
====================

I sent this BCC mail to a number of people who asked me about my thesis, 
or who I thought might be interested in it. Because I sent it to mailing 
lists, too, it could happen that you get this message twice -- my apologies.

Mike Schuster

------------------------------------------------------------------------------ 

available from:
  http://isw3.aist-nara.ac.jp/IS/Shikano-lab/staff/1996/mike-s/mike-s.html
in the publication section  
  http://isw3.aist-nara.ac.jp/IS/Shikano-lab/staff/1996/mike-s/publication.html


ENGLISH TITLE:
  On supervised learning from sequential data with applications for speech 
  recognition

ENGLISH ABSTRACT:
  Many problems of engineering interest, for example speech recognition,
  can be formulated in an abstract sense as supervised learning from 
  sequential data, where an input sequence x_1^T = { x_1, x_2, x_3, ..., 
  x_{T-1}, x_T } has to be mapped to an output sequence y_1^T = { y_1, y_2, 
  y_3,  ..., y_{T-1}, y_T }. This thesis gives a unified view of the abstract 
  problem and presents some models and algorithms for improved sequence 
  recognition and modeling performance, measured on synthetic data and on 
  real speech data.
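
  As a toy illustration (my own, not an example from the thesis), in a 
  framewise speech task each x_t could be a vector of acoustic features and 
  each y_t a phone label, so one training pair is simply two sequences of 
  equal length:

    # Toy (x_1^T, y_1^T) pair for framewise sequence labeling; all shapes
    # and numbers are illustrative, not taken from the thesis.
    import numpy as np

    T, d_in, n_classes = 100, 13, 40        # 100 frames, 13-dim features, 40 phones
    x = np.random.randn(T, d_in)            # input sequence x_1^T
    y = np.random.randint(0, n_classes, T)  # output sequence y_1^T
    # supervised learning: estimate a mapping f with f(x)[t] close to y[t]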

  A powerful neural network structure to deal with sequential data is the 
  recurrent neural network (RNN), which allows one to estimate P(y_t|x_1, x_2, 
  ..., x_t), the output at time t given all previous input. The first part 
  of this thesis presents various extensions to the basic RNN structure, which 
  are
    a) a bidirectional recurrent neural network (BRNN), which allows one to 
       estimate expressions of the form P(y_t|x_1^T), the output at time t given 
       the entire input sequence, for uni-modal regression and classification 
       problems (a small illustrative sketch follows this list),
    b) an extended BRNN to directly estimate the posterior probability of a symbol 
       sequence, P(y_1^T|x_1^T), by modeling P(y_t|y_{t-1}, y_{t-2}, ..., y_1, x_1^T)
       without explicit assumptions about the shape of the distribution P(y_1^T|x_1^T),
    c) a BRNN to model multi-modal input data that can be described by Gaussian mixture 
       distributions conditioned on an output vector sequence, P(x_t|y_1^T), assuming 
       that neighboring x_t, x_{t+1} are conditionally independent, and 
    d) an extension to c) which removes the independence assumption by modeling 
       P(x_t|x_{t-1}, x_{t-2}, ..., x_1, y_1^T) to estimate the likelihood P(x_1^T|y_1^T) 
       of a given output sequence without any explicit assumptions about how 
       context is used.
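
  As a rough sketch of the basic BRNN idea (my own toy code, not from the 
  thesis; the weight names Wf, Wb, Uf, Ub, V, c are made up), a forward and a 
  backward recurrence over the input are combined at every frame, so the 
  output at time t can depend on the whole input sequence x_1^T:

    # Minimal BRNN forward pass in plain numpy (illustrative only).
    import numpy as np

    def brnn_forward(x, Wf, Uf, Wb, Ub, V, c):
        # x: (T, d_in); Wf, Wb: (d_h, d_in); Uf, Ub: (d_h, d_h); V: (d_out, 2*d_h)
        T, d_h = x.shape[0], Wf.shape[0]
        hf = np.zeros((T, d_h))              # forward hidden states
        hb = np.zeros((T, d_h))              # backward hidden states
        for t in range(T):                   # left-to-right recurrence
            prev = hf[t - 1] if t > 0 else np.zeros(d_h)
            hf[t] = np.tanh(Wf @ x[t] + Uf @ prev)
        for t in reversed(range(T)):         # right-to-left recurrence
            nxt = hb[t + 1] if t + 1 < T else np.zeros(d_h)
            hb[t] = np.tanh(Wb @ x[t] + Ub @ nxt)
        logits = np.concatenate([hf, hb], axis=1) @ V.T + c
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)   # row t estimates P(y_t | x_1^T)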

  The second part of this thesis describes the details of a fast and memory-efficient 
  one-pass stack decoder for speech recognition, which performs the search for the 
  most probable word sequence. The use of this decoder, which can handle arbitrary-order 
  N-gram language models and arbitrary-order context-dependent acoustic models with 
  full cross-word expansion, led to the best reported recognition results on the 
  standard test set of a widely used Japanese newspaper dictation task.
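
  As a very rough sketch of the core stack-decoder idea (my own simplification; 
  scoring, pruning, LM lookahead and cross-word handling in the real decoder are 
  far more involved), partial word-sequence hypotheses sit on a priority queue, 
  and the best one is repeatedly popped and extended by one word:

    # Toy stack decoder loop; extend() is a hypothetical stand-in for the
    # combined acoustic and N-gram language model scoring of a real decoder.
    import heapq

    def stack_decode(extend, vocab, T, max_stack=1000):
        # extend(words, w, t_end) -> (delta_log_prob, new_t_end) scores
        # appending word w to the hypothesis 'words' ending at frame t_end.
        stack = [(0.0, 0, ())]           # (-total log prob, end frame, word tuple)
        while stack:
            neg, t_end, words = heapq.heappop(stack)
            if t_end >= T:               # best hypothesis covers the whole utterance
                return words, -neg
            for w in vocab:              # expand the best hypothesis by one word
                delta, new_end = extend(words, w, t_end)
                heapq.heappush(stack, (neg - delta, new_end, words + (w,)))
            if len(stack) > max_stack:   # crude pruning to keep memory bounded
                stack = heapq.nsmallest(max_stack, stack)
        return None, float("-inf")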

----------------------------------------------------------------------------

Table of Contents:

1  Introduction
  1.1  MOTIVATION AND BACKGROUND
    1.1.1  Learning from examples
    1.1.2  Does the order of the samples matter?  
    1.1.3  Example applications    
    1.1.4  Related scientific areas
  1.2  THESIS STRUCTURE
    
2  Supervised learning from sequential data
  2.1  DEFINITION OF THE PROBLEM
  2.2  DECOMPOSITION INTO A GENERATIVE AND A PRIOR MODEL PART
    2.2.1  Context-independent model
    2.2.2  Context-dependent model
  2.3  DIRECT DECOMPOSITION
  2.4  HIDDEN MARKOV MODELS
    2.4.1  Basic HMM formulation
    2.4.2  Calculation of state occupation probabilities
    2.4.3  Parameter estimation for output probability distributions
    2.4.4  Parameter estimation for transition probabilities
  2.5  SUMMARY

3  Neural networks for supervised learning from sequences
  3.1  BASICS OF NEURAL NETWORKS
    3.1.1  Parameter estimation by maximum likelihood
    3.1.2  Problem classification
    3.1.3  Neural network training
    3.1.4  Neural network architectures
  3.2  BIDIRECTIONAL RECURRENT NEURAL NETWORKS
    3.2.1  Prediction assuming independent outputs
    3.2.2  Experiments and results
    3.2.3  Prediction assuming dependent outputs
    3.2.4  Experiments and results
  3.3  MIXTURE DENSITY RECURRENT NEURAL NETWORKS
    3.3.1  Basics of mixture density networks
    3.3.2  Mixture density extensions for BRNNs
    3.3.3  Experiments and results
    3.3.4  Discussion
  3.4  SUMMARY

4  Memory-efficient LVCSR search using a one-pass stack decoder
  4.1  INTRODUCTION
    4.1.1  Organization of this chapter
    4.1.2  General
    4.1.3  Technical
    4.1.4  Decoder types
  4.2  A MEMORY-EFFICIENT ONE-PASS STACK DECODER
    4.2.1  Basic algorithm
    4.2.2  Pruning techniques
    4.2.3  Stack module
    4.2.4  Hypotheses module
    4.2.5  N-gram module
    4.2.6  LM lookahead
    4.2.7  Cross-word models
    4.2.8  Fast-match with delay
    4.2.9  Using word-graphs as language model constraints
    4.2.10 Lattice rescoring
    4.2.11 Generating phone/state alignments
  4.3  EXPERIMENTS
    4.3.1  Recognition of Japanese
    4.3.2  Recognition results for high accuracy
    4.3.3  Recognition results for high speed and low memory
    4.3.4  Time and memory requirements for modules
    4.3.5  Usage of cross-word models
    4.3.6  Usage of fast-match models
    4.3.7  Effect of on-demand N-gram smearing 
    4.3.8  Lattice/N-best list generation and lattice rescoring
  4.4  CONCLUSIONS
  4.5  ACKNOWLEDGMENTS
  
5  Conclusions
  5.1  SUMMARY
  5.2  CONTRIBUTIONS FROM THIS THESIS
  5.3  SUGGESTIONS FOR FUTURE WORK
  
----------------------------------------------------------------------------
Mike Schuster, ATR Interpreting Telecommunications Research Laboratories,
2-2 Hikari-dai, Seika-cho, Soraku-gun, Kyoto 619-02, JAPAN, 
Tel. ++81-7749-5-1394, Fax. ++81-7749-5-1308, email: gustl at itl.atr.co.jp,
http://isw3.aist-nara.ac.jp/IS/Shikano-lab/staff/1996/mike-s/mike-s.html
----------------------------------------------------------------------------


