Announcement of Technical Report availability.

K. Reinhard kr10000 at eng.cam.ac.uk
Mon Feb 16 05:46:34 EST 1998


The following technical report is available by anonymous ftp from the
archive of the Speech, Vision and Robotics Group at the Cambridge
University Engineering Department (http://svr-www.eng.cam.ac.uk/reports/index-full.html).

    
    PARAMETRIC SUBSPACE MODELING OF SPEECH TRANSITIONS

            K. Reinhard and M. Niranjan

       Technical Report CUED/F-INFENG/TR.308

	    Cambridge University Engineering Department 
		   Trumpington Street, Cambridge CB2 1PZ 
			          England, U.K.


                             Abstract
This report describes an attempt at capturing segmental transition
information for speech recognition tasks. The slowly varying dynamics
of spectral trajectories carry much discriminant information
that is very crudely modelled by traditional approaches such as
HMMs. In approaches such as recurrent neural networks there is the hope,
but not the convincing demonstration, that such transitional information
could be captured. The method presented here starts from the very
different position of explicitly capturing the trajectory of short time 
spectral parameter vectors on a subspace in which the temporal sequence
information is preserved. We approach this by introducing a temporal 
constraint into the well-known technique of Principal Component Analysis.
On this subspace, we attempt a parametric modelling of the trajectory,
and compute a distance metric to perform classification of diphones.
We use the principal curves method of Hastie and Stuetzle and the
Generative Topographic Mapping (GTM) technique of Bishop, Svensén and
Williams to describe the temporal evolution in terms of latent
variables. On the difficult task of discriminating /bee/, /dee/ and
/gee/, we are able to retain discriminatory information with a small
number of parameters. Experimental illustrations present results on the
ISOLET and TIMIT databases.
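As a rough illustration of the subspace idea described in the abstract, the Python sketch below projects a sequence of short-time spectral vectors onto a principal subspace (keeping the frames in time order) and classifies it by the nearest template trajectory. It is an assumption-laden sketch, not the report's method: it uses ordinary PCA via numpy's SVD with no temporal constraint, a hypothetical time-normalised Euclidean trajectory distance, and does not show principal curves or GTM. All function names (pca_basis, project, resample, trajectory_distance, classify) are illustrative.

```python
import numpy as np

def pca_basis(frames, k):
    """Return the mean and top-k principal directions of (n_frames, dim) spectral vectors.

    Plain PCA only: the temporal constraint described in the report is NOT applied.
    """
    mean = frames.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]

def project(trajectory, mean, basis):
    """Map a (T, dim) trajectory to (T, k) subspace coordinates; row (time) order is kept."""
    return (trajectory - mean) @ basis.T

def resample(traj, n_points):
    """Linearly resample a (T, k) trajectory to n_points over normalised time [0, 1]."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    cols = [np.interp(t_new, t_old, traj[:, j]) for j in range(traj.shape[1])]
    return np.stack(cols, axis=1)

def trajectory_distance(a, b, n_points=20):
    """Hypothetical metric: mean Euclidean distance after resampling both to n_points."""
    ra, rb = resample(a, n_points), resample(b, n_points)
    return float(np.mean(np.linalg.norm(ra - rb, axis=1)))

def classify(traj, templates, mean, basis):
    """Return the label of the template whose subspace trajectory is nearest to traj."""
    z = project(traj, mean, basis)
    return min(
        templates,
        key=lambda label: trajectory_distance(z, project(templates[label], mean, basis)),
    )
```

Because the distance compares trajectories point by point in normalised time, two templates that visit the same subspace points in opposite order (for example a rising and a falling transition) remain distinguishable, which a frame-independent model would not capture.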


