PhD thesis available

Sun Mar 19 13:09:07 EST 2000

Dear Connectionists,

My PhD thesis on hierarchical connectionist acoustic
modeling for large vocabulary speech recognition is
now available on the WWW at

http://isl.ira.uka.de/~fritsch

For those interested, I have appended the abstract.

Best regards,
--Juergen Fritsch.

==========================================================
Juergen Fritsch                         Research Scientist
----------------------------------------------------------
               Interactive Systems Labs
  University of Karlsruhe & Carnegie Mellon University
phone:++49-721-6086285      http://isl.ira.uka.de/~fritsch
fax:++49-721-607721              email: fritsch at ira.uka.de
==========================================================

Abstract:

         Hierarchical Connectionist Acoustic Modeling for
       Domain-Adaptive Large Vocabulary Speech Recognition

                         Juergen Fritsch
                      PhD Thesis, 238 pages
                     Interactive Systems Labs
                   Faculty of Computer Science
                     University of Karlsruhe
                             Germany

                             ABSTRACT

This thesis presents a new, hierarchical framework for connectionist
acoustic modeling in large vocabulary statistical speech recognition
systems. Based on the divide-and-conquer paradigm, the task of estimating
HMM state posteriors is decomposed and distributed in the form of
a tree-structured architecture consisting of thousands of small neural
networks. In contrast to monolithic connectionist models, our approach
scales to arbitrarily large state spaces. Phonetic context is represented
simultaneously at multiple resolutions, which allows for scalable acoustic
modeling. We demonstrate that the hierarchical structure allows for
(1) accelerated score computations through dynamic tree pruning,
(2) effective speaker adaptation with limited amounts of adaptation data
and (3) downsizing of the trained model for small memory footprints.
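
As an illustration only (a minimal sketch, not the implementation used in
the thesis), the decomposition can be pictured as follows: each internal
tree node holds a small classifier that outputs conditional posteriors over
its children, and the posterior of a leaf (an HMM state) is the product of
the conditional posteriors along the path from the root. Subtrees whose
accumulated posterior falls below a threshold are skipped, which is one
simple form of dynamic tree pruning. The Node class, the linear-softmax
stand-in for a neural network, the leaf names and the prune_threshold
parameter are all assumptions made for this example.

```python
import math


class Node:
    """Tree node. Internal nodes hold a tiny classifier over their children."""

    def __init__(self, name, children=None, weights=None):
        self.name = name
        self.children = children or []
        # One weight vector per child: a hypothetical linear stand-in
        # for the small neural network at this node.
        self.weights = weights or []

    def child_posteriors(self, x):
        # Softmax over linear scores gives p(child | node, x).
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def leaf_posteriors(root, x, prune_threshold=0.0):
    """Return {leaf name: posterior} for feature vector x.

    The posterior of a leaf is the product of conditional posteriors on the
    path from the root. Children whose accumulated posterior is below
    prune_threshold are not expanded (their leaves are omitted).
    """
    result = {}
    stack = [(root, 1.0)]
    while stack:
        node, path_prob = stack.pop()
        if not node.children:
            result[node.name] = path_prob
            continue
        for child, cond in zip(node.children, node.child_posteriors(x)):
            if path_prob * cond >= prune_threshold:
                stack.append((child, path_prob * cond))
    return result


def build_example_tree():
    # Two-level toy tree with four leaves; all weights are made up.
    vowels = Node("vowels", [Node("a"), Node("e")], [[1.0, 0.0], [0.0, 1.0]])
    consonants = Node("consonants", [Node("t"), Node("s")],
                      [[0.5, -0.5], [-0.5, 0.5]])
    return Node("root", [vowels, consonants], [[1.0, 1.0], [-1.0, -1.0]])


if __name__ == "__main__":
    tree = build_example_tree()
    print(leaf_posteriors(tree, [2.0, 0.0]))
    print(leaf_posteriors(tree, [2.0, 0.0], prune_threshold=0.05))
```

Without pruning, the leaf posteriors sum to one. With a threshold, only the
nodes on sufficiently probable paths are evaluated, so the cost depends on
the number of surviving paths rather than on the total number of leaves.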

The viability of the proposed hierarchical model is demonstrated in
recognition experiments on the Switchboard large vocabulary
conversational telephone speech corpus, currently considered the most
difficult standardized speech recognition benchmark, where it achieves
state-of-the-art performance with fewer parameters and faster recognition
times than conventional mixture models.

The second contribution of this thesis is an algorithm that allows
for domain-adaptive speech recognition using the proposed hierarchical
acoustic model. In contrast to humans, automatic speech recognition
systems still suffer from a strong dependence on the application
domain they have been trained on. Typically, a speech recognition system
has to be tailored to a specific application domain to reduce semantic,
syntactic and acoustic variability and thus increase recognition
accuracy. Unfortunately, this approach results in a lack of portability
as performance typically deteriorates unacceptably when moving to a
new application domain.

We present Structural Domain Adaptation (SDA), an algorithm for
hierarchically organized acoustic models that exploits the scalable
specificity of phonetic context modeling by modifying the tree structure
for optimal performance on previously unseen application domains. We
demonstrate the effectiveness of the SDA approach by adapting a large
vocabulary conversational telephone speech recognition system to (1) a
telephone dictation task and (2) spontaneous scheduling of meetings. SDA,
together with domain-specific dictionaries and language models, makes it
possible to match the performance of domain-specific models with only
45-60 minutes of acoustic adaptation data.