Announcement of Technical Report availability.
K. Reinhard
kr10000 at eng.cam.ac.uk
Mon Feb 16 05:46:34 EST 1998
The following technical report is available by anonymous ftp from the
archive of the Speech, Vision and Robotics Group at the Cambridge
University Engineering Department (http://svr-www.eng.cam.ac.uk/reports/index-full.html).
PARAMETRIC SUBSPACE MODELING OF SPEECH TRANSITIONS
K. Reinhard and M. Niranjan
Technical Report CUED/F-INFENG/TR.308
Cambridge University Engineering Department
Trumpington Street, Cambridge CB2 1PZ
U.K., England
Abstract
This report describes an attempt at capturing segmental transition
information for speech recognition tasks. The slowly varying dynamics
of spectral trajectories carry much discriminant information
that is very crudely modelled by traditional approaches such as
HMMs. In approaches such as recurrent neural networks there is the hope,
but not the convincing demonstration, that such transitional information
could be captured. The method presented here starts from the very
different position of explicitly capturing the trajectory of short-time
spectral parameter vectors on a subspace in which the temporal sequence
information is preserved. We approach this by introducing a temporal
constraint into the well-known technique of Principal Component Analysis.
On this subspace, we attempt a parametric modelling of the trajectory,
and compute a distance metric to perform classification of diphones.
We use the principal curves method of Hastie and Stuetzle and the
Generative Topographic Mapping (GTM) technique of Bishop, Svensén and
Williams to describe the temporal evolution in terms of latent
variables. On the difficult problem of /bee/, /dee/, /gee/ we are able
to retain discriminatory information with a small number of
parameters. Experimental illustrations present results on the ISOLET and
TIMIT databases.
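
As a rough illustration of the pipeline the abstract outlines (project
short-time spectral frames onto a low-dimensional subspace, describe the
trajectory there with a few parameters, and classify by a distance between
parameter vectors), a minimal sketch follows. It is not the method of the
report: it assumes plain PCA without the temporal constraint, a low-order
polynomial in normalised time as the parametric trajectory model, and a
Euclidean distance between coefficient vectors. All function names and
parameters are hypothetical.

```python
import numpy as np

def pca_basis(frames, n_components=2):
    """Mean and top principal directions of (T, D) frames.
    Assumption: plain PCA, no temporal constraint."""
    mean = frames.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal directions
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]

def trajectory_params(frames, mean, basis, degree=2):
    """Project one segment onto the subspace and fit a polynomial in time
    to each subspace coordinate. Assumption: polynomial trajectory model."""
    z = (frames - mean) @ basis.T            # (T, n_components)
    t = np.linspace(0.0, 1.0, len(frames))   # normalised time axis
    coeffs = np.polyfit(t, z, degree)        # (degree + 1, n_components)
    return coeffs.ravel()

def classify(segment, mean, basis, templates, degree=2):
    """Return the label whose template parameter vector is nearest
    (Euclidean distance) to the segment's trajectory parameters."""
    p = trajectory_params(segment, mean, basis, degree)
    return min(templates, key=lambda label: np.linalg.norm(p - templates[label]))
```

Here templates would be a dictionary mapping each class label to a
parameter vector obtained from trajectory_params on training segments,
for example an average over the segments of that class.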