TR announcement

John.Hampshire@SPEECH2.CS.CMU.EDU
Tue Oct 3 10:08:17 EDT 1989




****************  PLEASE DO NOT FORWARD TO OTHER MAILING LISTS  ****************


			Technical Report:  CMU-CS-89-166

              THE META-PI NETWORK:  BUILDING DISTRIBUTED KNOWLEDGE
                 REPRESENTATIONS FOR ROBUST PATTERN RECOGNITION


		J. B. Hampshire II   and    A. H. Waibel


	       QUICK ABSTRACT (30 seconds, plain English, no frills)

The "Meta-Pi" architecture is a multi-network connectionist backprop structure.
It learns to focus attention on the output of a particular sub-network
or group of sub-networks via multiplicative connections.
When used for multi-speaker speech recognition, this network
yields recognition rates as high as those for speaker-DEpendent
tasks (~98.5% correct), with roughly one third the error rate of
more traditional networks trained on the same multi-speaker task.  Meta-Pi
networks are trained for best output performance and
*automatically* learn the best mix or selection of neural subcomponents.
Here, for example, they learned the relevant speaker differences
(and similarities) without ever being told to identify the individual
speakers.  If this sounds interesting, please read on.



			      SUMMARY

We present a multi-network connectionist architecture that forms distributed
low-level knowledge representations critical to robust pattern recognition
in non-stationary stochastic processes.  This new network comprises a number
of stimulus-specific sub-networks (i.e., networks trained to classify
a particular type of stimulus) that are linked by a combinational superstructure.
Our application employs Time-Delay Neural Network (TDNN) architectures
for the sub-networks and the combinational superstructure of the Meta-Pi
network, although one can use any form of backpropagation network as the basis
for a Meta-Pi architecture.  The combinational superstructure of the Meta-Pi
network adapts to the stimulus being processed, optimally integrating
stimulus-specific classifications based on its internally developed model of
the stimulus (or combination of stimuli) most likely to have produced the
input signal.  To train this combinational network we have developed a new
form of multiplicative connection that we call the "Meta-Pi" connection.
We illustrate how the Meta-Pi paradigm implements a dynamically adaptive
Bayesian connectionist classifier.
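To fix the intuition, here is a minimal sketch of the multiplicative
combination described above.  This is NOT the report's implementation
(which uses TDNN sub-networks and a trained TDNN superstructure); the
function names, toy numbers, and the softmax normalization of the
gating scores are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    """Normalize gating scores into positive weights that sum to 1."""
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def meta_pi_combine(subnet_outputs, gate_scores):
    """Multiply each sub-network's class scores by its gating weight
    and sum -- the multiplicative 'Meta-Pi' combination, sketched."""
    weights = softmax(gate_scores)     # one weight per sub-network
    return weights @ np.asarray(subnet_outputs)

# Toy example: 3 speaker-specific sub-networks, 3 phoneme classes (/b,d,g/).
subnet_outputs = [[0.9, 0.05, 0.05],   # sub-net 0's class scores
                  [0.2, 0.6,  0.2 ],   # sub-net 1's class scores
                  [0.1, 0.1,  0.8 ]]   # sub-net 2's class scores
gate_scores = np.array([2.0, 0.1, -1.0])  # superstructure favors sub-net 0
combined = meta_pi_combine(subnet_outputs, gate_scores)
```

In the actual network the gating scores are produced by the
combinational superstructure, which is trained by backpropagation on
overall classification error; the speaker-appropriate mixing therefore
emerges without any explicit speaker labels.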

We demonstrate the Meta-Pi architecture's performance in the context of
multi-speaker phoneme recognition.  In this task the Meta-Pi superstructure
integrates TDNN sub-networks to perform multi-speaker phoneme recognition
at speaker-DEpendent rates.  It achieves a 6-speaker (4 males, 2 females)
recognition rate of 98.4% on a database of voiced-stops (/b,d,g/).  This
recognition performance represents a significant improvement over the 95.9%
multi-speaker recognition rate obtained by a single TDNN trained in
multi-speaker fashion.  It also approaches the 98.7% average of the
speaker-DEpendent recognition rates for the six speakers processed.  We show
that the Meta-Pi network can learn --- without direct supervision --- to
recognize the speech of one particular speaker using a dynamic combination
of internal models of *other* speakers exclusively (99.8% correct).

The Meta-Pi model constitutes a viable basis for connectionist pattern
recognition systems that can rapidly adapt to new stimuli by using dynamic,
conditional combinations of existing stimulus-specific models.  Additionally,
it demonstrates a number of performance characteristics that would be
desirable in autonomous connectionist pattern recognition systems that could
develop and maintain their own database of stimuli models, adapting to new
stimuli when possible, spawning new stimulus-specific learning processes
when necessary, and eliminating redundant or obsolete stimulus-specific
models when appropriate.



   This research has been funded by Bell Communications Research, 
     ATR Interpreting Telephony Research Laboratories, and the
       National Science Foundation (NSF grant EET-8716324).



REQUESTS:  Please send requests for tech. report CMU-CS-89-166 to
hamps@speech2.cs.cmu.edu  (ARPAnet)
"Allow 4 weeks for delivery..."


****************  PLEASE DO NOT FORWARD TO OTHER MAILING LISTS  ****************