Speech Recognition & NNs preprints/reprints available

Wed Feb 13 18:30:36 EST 1991

Reprints and preprints are now available for the following publications
of the OGI Speech Group.  Please respond directly to me by
e-mail or surface mail.  Don't forget to include your address
with your request.  Unless you indicate otherwise, I will
send all 6 reports.

Vince Weatherill
Dept. of Computer Science and Engineering
Oregon Graduate Institute
19600 NW von Neumann Drive
Beaverton, OR  97006-1999

Barnard, E., Cole, R.A., Vea, M.P., and Alleva, F. "Pitch 
     detection with a neural-net classifier," IEEE Transactions
     on Acoustics, Speech & Signal Processing, (February, 1991).

Cole, R.A., M. Fanty, M. Gopalakrishnan, and R.D.T. Janssen,
     "Speaker-independent name retrieval from spellings using a
     database of 50,000 names," Proceedings of the IEEE Interna-
     tional Conference on Acoustics, Speech and Signal Process-
     ing, Toronto, Canada, May 14-17, (1991).

Muthusamy, Y. K., R.A. Cole, and M. Gopalakrishnan, "A segment-
     based approach to automatic language identification,"
     Proceedings of the 1991 IEEE International Conference on
     Acoustics, Speech and Signal Processing, Toronto, Canada,
     May 14-17, (1991).

Fanty, M., and R. A. Cole, "Spoken Letter Recognition,"
     Proceedings of the Neural Information Processing Systems
     Conference, Denver, CO, (Nov. 1990).

Janssen, R.D.T., M. Fanty, and R.A. Cole, "Speaker-independent
     phonetic classification in continuous English letters,"
     Proceedings of the International Joint Conference on Neural
     Networks, Seattle, WA, Jul 8-12, (1991), submitted
     for publication.

Fanty, M., and R. A. Cole, "Speaker-independent English alphabet
     recognition: Experiments with the E-Set," Proceedings of
     the 1990 International Conference on Spoken Language Pro-
     cessing, Kobe, Japan, (Nov. 1990).

****************************************************************

        PITCH DETECTION WITH A NEURAL-NET CLASSIFIER

   Etienne Barnard, Ronald Cole, M. P. Vea and Fil Alleva

                        ABSTRACT
Pitch detection based on neural-net classifiers is  investi-
gated.  To this end, the extent of generalization attainable
with neural nets is first examined, and it is shown  that  a
suitable choice of features is required to utilize this pro-
perty.  Specifically,  invariant  features  should  be  used
whenever  possible.   For pitch detection, two feature sets,
one based on waveform samples and the other based on proper-
ties  of  waveform  peaks, are introduced.  Experiments with
neural classifiers demonstrate that the latter  feature  set
--which has better invariance properties--performs more suc-
cessfully. It is found that the best neural-net pitch tracker
approaches  the  level of agreement of human labelers on the
same data set, and performs competitively in comparison to a
sophisticated  feature-based  tracker.  An  analysis  of the
errors  committed  by  the  neural  net
(relative to the hand labels used for training) reveals that
they are mostly due to inconsistent hand labeling of ambigu-
ous waveform peaks.

*************************************************************

 SPEAKER-INDEPENDENT NAME RETRIEVAL FROM SPELLINGS USING A
                  DATABASE OF 50,000 NAMES

Ronald Cole, Mark Fanty, Murali Gopalakrishnan, Rik Janssen

                        ABSTRACT
We describe a system  that  recognizes  names  spelled  with
pauses  between letters using high quality speech.  The sys-
tem uses neural network classifiers to locate  and  classify
letters,  then searches a database of names to find the best
match to the letter scores.  The  directory  name  retrieval
system  was  evaluated on 1020 names provided by 34 speakers
who were not used to train the system.  Using a database  of
50,000  names,  972,  or 95.3%, were correctly identified as
the first choice.  Of the remaining 48  names,  all  but  10
were  in  the  top 3 choices.  Ninety-nine percent of letters
were correctly located, although speakers  failed  to  pause
completely about 10% of the time.  Classification of indivi-
dual spoken letters that were correctly located was 93%.
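
The database search step described in the abstract above can be
illustrated with a minimal sketch.  This is not the OGI system: the
function names, the per-letter score dictionaries, the floor value for
unscored letters, and the rule that a name must have exactly as many
letters as were located are all assumptions made for illustration.
Each candidate name is scored by summing the log of the classifier
score for each of its letters, and the best-scoring names are returned.

```python
import math

# Hypothetical floor for letters the classifier gave no score to.
FLOOR = 1e-6

def name_score(name, letter_scores):
    """Sum of log classifier scores for each letter of `name`.

    `letter_scores` is a list with one dict per located letter,
    mapping a letter to its classifier score in (0, 1].
    Names whose length differs from the number of located letters
    are rejected outright (a simplifying assumption).
    """
    if len(name) != len(letter_scores):
        return -math.inf
    return sum(math.log(scores.get(ch, FLOOR))
               for ch, scores in zip(name, letter_scores))

def retrieve(letter_scores, names, k=3):
    """Return the k names that best match the letter scores."""
    ranked = sorted(names,
                    key=lambda n: name_score(n, letter_scores),
                    reverse=True)
    return ranked[:k]

# Made-up scores for a four-letter spelling.
letter_scores = [
    {"c": 0.9, "z": 0.1},
    {"o": 0.8, "a": 0.2},
    {"l": 0.7, "r": 0.3},
    {"e": 0.6, "b": 0.4},
]
names = ["cole", "core", "cola", "fanty"]
top3 = retrieve(letter_scores, names)
print(top3)
```

A sketch like this ignores missed or inserted letters; a real system
would need some way to tolerate the location errors that the abstract
reports.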

*************************************************************

                A SEGMENT-BASED APPROACH TO
             AUTOMATIC LANGUAGE IDENTIFICATION

Yeshwant K. Muthusamy, Ronald A. Cole and Murali Gopalakrishnan

                          ABSTRACT
A segment-based approach to automatic  language  identifica-
tion  is  based  on  the idea that the acoustic structure of
languages can be estimated by segmenting speech  into  broad
phonetic  categories.  Automatic language identification can
then be achieved by computing  features  that  describe  the
phonetic  and  prosodic characteristics of the language, and
using these feature measurements to train  a  classifier  to
distinguish  between  languages.   As  a  first step in this
approach, we have built a  multi-language,  neural  network-
based  segmentation and broad classification algorithm using
seven broad phonetic categories.  The algorithm was  trained
and tested on separate sets of speakers of American English,
Japanese, Mandarin Chinese and Tamil.  It currently performs
with an accuracy of 82.3% on the utterances of the test set.

*************************************************************

               SPOKEN LETTER RECOGNITION

                  Mark Fanty and Ron Cole

                        ABSTRACT
Through the use of neural network  classifiers  and  careful
feature  selection,  we have achieved high-accuracy speaker-
independent  spoken  letter   recognition.    For   isolated
letters, a broad-category segmentation is performed.  Location
of segment boundaries  allows  us  to  measure  features  at