Scaling in Neural Nets

Alex.Waibel@SPEECH2.CS.CMU.EDU
Thu Sep 22 13:06:41 EDT 1988


Below is the abstract of a paper describing our recent research
addressing the problem of scaling in neural networks for
speech recognition.  We show that by exploiting the hidden
structure (previously learned abstractions) of speech in a
modular way and applying "connectionist glue", larger, more
complex networks can be constructed at only small additional
cost in learning time and complexity.  The resulting recognition
performance is as good as or better than that of comparable
monolithically trained nets, and as good as that of the smaller
network modules.  This work was performed at ATR Interpreting
Telephony Research Laboratories, in Japan.

I am now working at Carnegie Mellon University, so you may
request copies from me here or directly from Japan.
From CMU:

Dr. Alex Waibel
Computer Science Department
Carnegie-Mellon University
Pittsburgh, PA 15213
phone: (412) 268-7676
email: ahw@speech2.cs.cmu.edu

From Japan, please write for technical report TR-I-0034
(with CC to me), to:

Ms. Kazumi Kanazawa
ATR Interpreting Telephony Research Laboratories
Twin 21 MID Tower,
2-1-61 Shiromi, Higashi-ku,
Osaka, 540, Japan
email: kddlab!atr-la.atr.junet!kanazawa@uunet.UU.NET
Please CC to: ahw@speech2.cs.cmu.edu

-------------------------------------------------------------------------

	Modularity and Scaling in Large Phonemic Neural Networks
	    Alex Waibel, Hidefumi Sawai, Kiyohiro Shikano
	  ATR Interpreting Telephony Research Laboratories
	
			    ABSTRACT
Scaling connectionist models to larger connectionist systems is
difficult, because larger networks require increasing amounts of
training time and data, and the complexity of the optimization
task quickly reaches computationally unmanageable proportions.
In this paper, we train several small Time-Delay Neural Networks
aimed at all phonemic subcategories (nasals, fricatives, etc.)
and report excellent fine phonemic discrimination performance for
all cases.  Exploiting the hidden structure of these smaller
phonemic subcategory networks, we then propose several techniques
that allow us to "grow" larger nets in an incremental and modular
fashion without loss in recognition performance and without the
need for excessive training time or additional data.  These
techniques include {\em class discriminatory learning, connectionist
glue, selective/partial learning and all-net fine tuning}.  A set
of experiments shows that stop consonant networks (BDGPTK) constructed
from subcomponent BDG- and PTK-nets achieved up to 98.6% correct
recognition compared to 98.3% and 98.7% correct for the component
BDG- and PTK-nets.  Similarly, an incrementally trained network
aimed at {\em all} consonants achieved recognition scores of
95.9% correct.  These results were found to be comparable to the
performance of the subcomponent networks and significantly better
than several alternative speech recognition strategies.
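For readers who would like a concrete picture of the modular
construction before the report arrives, the sketch below shows one
plausible way to combine two pretrained subcategory nets with a few
freshly initialised "glue" units and a new shared output layer,
freezing the previously learned hidden layers (selective/partial
learning) before a final all-net fine-tuning pass.  It is only an
illustration in modern PyTorch; the layer sizes, the plain
feed-forward layers, and the names SubNet/GluedNet are assumptions
made here for brevity, whereas the report's actual networks are
Time-Delay Neural Networks whose exact configuration is given in
TR-I-0034.

import torch
import torch.nn as nn

# Illustrative dimensions only (not the report's configuration):
N_FRAMES, N_COEFF = 15, 16   # input frames x spectral coefficients
HIDDEN = 8                   # hidden units per subcategory net
GLUE = 4                     # extra "connectionist glue" hidden units

class SubNet(nn.Module):
    """Stand-in for a pretrained phonemic subcategory net (e.g. BDG or PTK).
    A real TDNN uses time-delayed (1-D convolutional) connections; a plain
    feed-forward hidden layer is used here only to keep the sketch short."""
    def __init__(self, n_out):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(N_FRAMES * N_COEFF, HIDDEN), nn.Sigmoid())
        self.out = nn.Linear(HIDDEN, n_out)
    def forward(self, x):
        return self.out(self.hidden(x))

class GluedNet(nn.Module):
    """Combine two pretrained sub-nets: freeze their hidden layers, add a
    small layer of new "glue" units, and train only the glue and the new
    6-way output layer.  All parameters can be unfrozen afterwards for
    all-net fine tuning."""
    def __init__(self, bdg: SubNet, ptk: SubNet):
        super().__init__()
        self.bdg_hidden = bdg.hidden
        self.ptk_hidden = ptk.hidden
        for p in list(self.bdg_hidden.parameters()) + \
                 list(self.ptk_hidden.parameters()):
            p.requires_grad = False   # keep previously learned abstractions
        self.glue = nn.Sequential(
            nn.Linear(N_FRAMES * N_COEFF, GLUE), nn.Sigmoid())
        self.out = nn.Linear(2 * HIDDEN + GLUE, 6)   # B D G P T K
    def forward(self, x):
        h = torch.cat([self.bdg_hidden(x), self.ptk_hidden(x), self.glue(x)],
                      dim=-1)
        return self.out(h)

# Usage: pretrain each sub-net on its own 3-way task, train GluedNet on the
# combined 6-way task, then unfreeze everything for all-net fine tuning.
bdg, ptk = SubNet(3), SubNet(3)
net = GluedNet(bdg, ptk)
logits = net(torch.randn(2, N_FRAMES * N_COEFF))   # (batch, 6) class scores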


