Speech Recognition & NNs preprints/reprints available
Vince Weatherill
vincew at cse.ogi.edu
Wed Feb 13 18:30:36 EST 1991
Reprints and preprints are now available for the following publications
of the OGI Speech Group. Please respond directly to me by
e-mail or surface mail. Don't forget to include your address
with your request. Unless you indicate otherwise, I will
send all 6 reports.
Vince Weatherill
Dept. of Computer Science and Engineering
Oregon Graduate Institute
19600 NW von Neumann Drive
Beaverton, OR 97006-1999
Barnard, E., Cole, R.A., Vea, M.P., and Alleva, F. "Pitch
detection with a neural-net classifier," IEEE Transactions
on Acoustics, Speech & Signal Processing, (February, 1991).
Cole, R.A., M. Fanty, M. Gopalakrishnan, and R.D.T. Janssen,
"Speaker-independent name retrieval from spellings using a
database of 50,000 names," Proceedings of the IEEE Interna-
tional Conference on Acoustics, Speech and Signal Process-
ing, Toronto, Canada, May 14-17, (1991).
Muthusamy, Y. K., R.A. Cole, and M. Gopalakrishnan, "A segment-
based approach to automatic language identification,"
Proceedings of the 1991 IEEE International Conference on
Acoustics, Speech and Signal Processing, Toronto, Canada,
May 14-17, (1991).
Fanty, M., and R. A. Cole, "Spoken Letter Recognition,"
Proceedings of the Neural Information Processing Systems
Conference, Denver, CO, (Nov. 1990).
Janssen, R.D.T, M. Fanty, and R.A. Cole, "Speaker-independent
phonetic classification in continuous English letters,"
Proceedings of the International Joint Conference on Neural
Networks, Seattle, WA, Jul 8-12, (1991), submitted
for publication.
Fanty, M., and R. A. Cole, "Speaker-independent English alphabet
recognition: Experiments with the E-Set," Proceedings of
the 1990 International Conference on Spoken Language Pro-
cessing, Kobe, Japan, (Nov. 1990).
****************************************************************
PITCH DETECTION WITH A NEURAL-NET CLASSIFIER
Etienne Barnard, Ronald Cole, M. P. Vea and Fil Alleva
ABSTRACT
Pitch detection based on neural-net classifiers is investi-
gated. To this end, the extent of generalization attainable
with neural nets is first examined, and it is shown that a
suitable choice of features is required to utilize this pro-
perty. Specifically, invariant features should be used
whenever possible. For pitch detection, two feature sets,
one based on waveform samples and the other based on proper-
ties of waveform peaks, are introduced. Experiments with
neural classifiers demonstrate that the latter feature set
--which has better invariance properties--performs more suc-
cessfully. It is found that the best neural-net pitch tracker
approaches the level of agreement of human labelers on the
same data set, and performs competitively in comparison to a
sophisticated feature-based tracker.
An analysis of the errors committed by the neural net
(relative to the hand labels used for training) reveals that
they are mostly due to inconsistent hand labeling of ambigu-
ous waveform peaks.
*************************************************************
SPEAKER-INDEPENDENT NAME RETRIEVAL FROM SPELLINGS USING A
DATABASE OF 50,000 NAMES
Ronald Cole, Mark Fanty, Murali Gopalakrishnan, Rik Janssen
ABSTRACT
We describe a system that recognizes names spelled with
pauses between letters using high quality speech. The sys-
tem uses neural network classifiers to locate and classify
letters, then searches a database of names to find the best
match to the letter scores. The directory name retrieval
system was evaluated on 1020 names provided by 34 speakers
who were not used to train the system. Using a database of
50,000 names, 972, or 95.3%, were correctly identified as
the first choice. Of the remaining 48 names, all but 10
were in the top 3 choices. Ninety-nine percent of letters
were correctly located, although speakers failed to pause
completely about 10% of the time. Classification accuracy on
individual spoken letters that were correctly located was 93%.
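The abstract does not say how names are matched to the letter scores.
As a rough illustration only, the sketch below assumes each located
letter yields a dictionary of classifier scores and ranks same-length
names by summed log score. The names score_name and retrieve, the
floor value, and the sample data are all hypothetical, not taken from
the paper.

```python
import math

def score_name(name, letter_scores, floor=1e-6):
    """Sum of log scores for each letter of `name` at its position.

    Assumption: only names with as many letters as were located can
    match; anything else scores negative infinity.
    """
    if len(name) != len(letter_scores):
        return float("-inf")
    # Letters the classifier gave no score are floored, not zeroed,
    # so one bad letter penalizes a name without taking log(0).
    return sum(math.log(max(scores.get(ch, 0.0), floor))
               for ch, scores in zip(name, letter_scores))

def retrieve(letter_scores, names, top_k=3):
    """Return the top_k names ranked by score, best first."""
    ranked = sorted(names,
                    key=lambda n: score_name(n, letter_scores),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical classifier output for a three-letter spelling.
letter_scores = [
    {"b": 0.5, "d": 0.3, "e": 0.2},
    {"o": 0.9, "a": 0.1},
    {"b": 0.4, "d": 0.35, "p": 0.25},
]
names = ["bob", "dod", "bop", "dan", "bobby"]
print(retrieve(letter_scores, names))
```

Returning several ranked candidates rather than one mirrors the way
the abstract reports both first-choice and top-3 results.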
*************************************************************
A SEGMENT-BASED APPROACH TO
AUTOMATIC LANGUAGE IDENTIFICATION
Yeshwant K. Muthusamy, Ronald A. Cole and Murali Gopalakrishnan
ABSTRACT
A segment-based approach to automatic language identifica-
tion is based on the idea that the acoustic structure of
languages can be estimated by segmenting speech into broad
phonetic categories. Automatic language identification can
then be achieved by computing features that describe the
phonetic and prosodic characteristics of the language, and
using these feature measurements to train a classifier to
distinguish between languages. As a first step in this
approach, we have built a multi-language, neural network-
based segmentation and broad classification algorithm using
seven broad phonetic categories. The algorithm was trained
and tested on separate sets of speakers of American English,
Japanese, Mandarin Chinese and Tamil. It currently performs
with an accuracy of 82.3% on the utterances of the test set.
*************************************************************
SPOKEN LETTER RECOGNITION
Mark Fanty and Ron Cole
ABSTRACT
Through the use of neural network classifiers and careful
feature selection, we have achieved high-accuracy speaker-
independent spoken letter recognition. For isolated
letters, a broad-category segmentation is performed. Location
of segment boundaries allows us to measure features at