Thesis available on using SOM and LVQ for HMMs
Mikko Kurimo
mikkok at marconi.hut.fi
Mon Nov 10 08:58:10 EST 1997
The following Dr.Tech. thesis is available at
http://www.cis.hut.fi/~mikkok/thesis/ (WWW home page)
http://www.cis.hut.fi/~mikkok/thesis/book/ (output of latex2html)
http://www.cis.hut.fi/~mikkok/intro.ps.gz (compressed postscript, 188K)
http://www.cis.hut.fi/~mikkok/intro.ps (postscript, 57 pages, 712K)
The articles that belong to the thesis can be accessed through the page
http://www.cis.hut.fi/~mikkok/thesis/publications.html
---------------------------------------------------------------
Using Self-Organizing Maps
and Learning Vector Quantization
for Mixture Density Hidden Markov Models
Mikko Kurimo
Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200, FIN-02015 HUT, Finland
Email: Mikko.Kurimo at hut.fi
Abstract
--------
This work presents experiments on recognizing pattern sequences
using hidden Markov models (HMMs).
The pattern sequences in the experiments are computed from
speech signals and the recognition task is to decode the
corresponding phoneme sequences.
Training the HMMs of the phonemes from the collected
speech samples is a difficult task because of the natural
variation in speech.
Two neural computing paradigms, the Self-Organizing Map (SOM)
and Learning Vector Quantization (LVQ), are used in the
experiments to improve the recognition performance of the models.
An HMM consists of sequential states
which are trained to model the feature changes
in the signal produced during the modeled process.
The output densities applied in this work are mixtures
of Gaussian density functions.
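To make the notion of a mixture output density concrete, here is a minimal
Python sketch (illustrative only, not code from the thesis) that evaluates
a weighted sum of diagonal-covariance Gaussians for one HMM state:

```python
import numpy as np

def mixture_density(x, weights, means, variances):
    """Output density b(x) of one HMM state as a weighted sum of
    diagonal-covariance Gaussian components (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    density = 0.0
    for w, mu, var in zip(weights, means, variances):
        mu = np.asarray(mu, dtype=float)
        var = np.asarray(var, dtype=float)
        # Normalization constant of a diagonal-covariance Gaussian.
        norm = np.prod(1.0 / np.sqrt(2.0 * np.pi * var))
        density += w * norm * np.exp(-0.5 * np.sum((x - mu) ** 2 / var))
    return density
```

The mixture weights are assumed to sum to one; real systems typically
work with log densities for numerical stability.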
SOMs are applied to initialize and train the mixtures
to give a smooth and faithful representation of the feature
vector space defined by the corresponding training samples.
The SOM maps similar feature vectors to nearby units,
which is here exploited in experiments to improve
the recognition speed of the system.
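The topology-preserving property mentioned above can be seen in a toy SOM
training loop. The following Python sketch assumes a rectangular grid with
a Gaussian neighborhood function; the grid size and the learning-rate and
radius schedules are arbitrary choices for illustration, not those used in
the thesis:

```python
import numpy as np

def som_train(data, grid_shape=(6, 6), dim=2, iters=500, seed=0):
    """Minimal SOM training sketch: similar inputs come to be mapped
    to nearby grid units (illustrative toy, not thesis code)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.standard_normal((rows, cols, dim))
    # Grid coordinates of each unit, used for the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1).astype(float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        # Best-matching unit: the unit whose weight vector is closest to x.
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Learning rate and neighborhood radius shrink over time.
        alpha = 0.5 * (1.0 - t / iters)
        sigma = max(1e-3, 2.0 * (1.0 - t / iters))
        h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                   / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbors toward the input.
        weights += alpha * h[..., None] * (x - weights)
    return weights
```

Because whole neighborhoods are updated together, nearby units end up with
similar weight vectors, which is what permits restricting the density
computations to a neighborhood of the best-matching unit.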
LVQ provides simple but efficient stochastic learning
algorithms to improve the classification accuracy in
pattern recognition problems.
Here, LVQ is applied to develop an iterative training
method for mixture density HMMs,
which increases both the modeling accuracy of the states
and the discrimination between the models of different phonemes.
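For orientation, the basic LVQ1 update rule can be sketched as follows
(an illustrative toy in Python; the function name and parameters are
assumptions, not the training method developed in the thesis):

```python
import numpy as np

def lvq1_step(codebook, labels, x, y, alpha=0.05):
    """One LVQ1 update: move the nearest codebook vector toward the
    sample x if their classes match, away from it otherwise."""
    i = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    sign = 1.0 if labels[i] == y else -1.0
    codebook[i] += sign * alpha * (x - codebook[i])
    return i
```

Repeated over the training samples with a decreasing alpha, this sharpens
the class boundaries between codebook vectors, which is the discriminative
effect exploited here for the mixture density HMMs.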
Experiments are also made with LVQ-based corrective tuning
methods for the mixture density HMMs,
which aim at improving the models by learning from
the observed recognition errors in the training samples.
The suggested HMM training methods are tested using
the Finnish speech database collected in the
Neural Networks Research Centre
at the Helsinki University of Technology.
Statistically significant improvements compared to the best
conventional HMM training methods are obtained using
the speaker dependent
but vocabulary independent phoneme models.
The decrease in the average number of phoneme recognition
errors for the tested speakers has been around 10 percent
in the applied test material.