[Olympus developers 237]: Re: Problems with VAD
José David Lopes
zedavid at l2f.inesc-id.pt
Fri Jun 11 06:26:58 EDT 2010
Hi Matt,
Thanks for your answer.
However, my question was about where in the configuration file to
change the end-pointing mode.
Best,
Jose
On 09-06-2010 22:46, Matthew Marge wrote:
> Hi Jose,
>
> This provides a good definition of endpointing as it relates to spoken
> dialogue systems:
>
> Endpointing is the process of determining the beginning and the
> end of speech within an incoming sample stream. This takes into
> account whether or not energy is detected at speech frequencies,
> the duration of the detected sound and pitch extraction. The pitch
> is tracked to help recognise vowels as indications of speech and
> to filter out background noise events, such as a door closing. The
> endpointer can be tuned to avoid false barge-in, or
> false-triggers, i.e. cutting off the prompt when the user did not
> speak. This can be caused by background noise or prompt echo. The
> endpointer can also be tuned to avoid missing leading speech.
> Syllables, leading consonants, or in the worst case, everything a
> caller who speaks quietly says might be missed. Significant
> advances have been made in recent years in echo cancellation and
> endpointing techniques and these have resulted in better barge-in
> performance, and leading speech recognition companies are
> recommending the use of barge-in wherever possible.
>
> http://spotlight.ccir.ed.ac.uk/public_documents/Dialogue_design_guide/barge_in_requirements.htm
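The energy-plus-duration idea in the definition above can be sketched in a few lines. This is a hypothetical, minimal energy-based endpointer (not the actual Olympus implementation): speech is declared after `start_frames` consecutive frames above an energy `threshold`, and the segment is closed after `end_frames` consecutive frames below it. All names and parameter values are illustrative assumptions.

```python
# Minimal energy-based endpointing sketch (hypothetical, not Olympus code).
# A frame is a list of audio samples; speech starts after `start_frames`
# consecutive high-energy frames and ends after `end_frames` low-energy ones.

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def endpoint(frames, threshold=0.01, start_frames=3, end_frames=5):
    """Return (first, last) indices of the detected speech segment,
    or None if no speech is found."""
    start = None
    above = below = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            above += 1
            below = 0  # brief dips do not end the segment
            if start is None and above >= start_frames:
                start = i - start_frames + 1
        elif start is not None:
            below += 1
            above = 0
            if below >= end_frames:
                return (start, i - end_frames)  # last speech frame
        else:
            above = 0  # isolated noise bursts before speech are ignored
    if start is not None:
        return (start, len(frames) - 1)  # speech ran to end of stream
    return None
```

The hysteresis (requiring several consecutive frames on each side) is what lets such an endpointer be "tuned" against false triggers and missed leading speech, as the quoted definition describes: raising `start_frames` or `threshold` reduces false barge-in; lowering them reduces clipped onsets.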
>
> Cheers,
> Matt
>
> 2010/6/9 José David Lopes <zedavid at l2f.inesc-id.pt>
>
> Alex,
>
> Thanks for your quick answer!
>
> What do you mean by end-pointers? Is it the run mode
> configuration? I've been looking at the documentation and it is
> not clear to me what these end-pointers are.
>
> Jose
>
> On 08-06-2010 15:31, Alex Rudnicky wrote:
>
> Jose,
>
> There are actually three different end-pointers in Olympus;
> these are selectable in (I believe) the configuration for the
> Audio server. You might try experimenting with alternate ones.
> Also, it might be worth a try to modify the end-pointing
> sensitivity parameters.
>
> Alex
>
>
> -----Original Message-----
> From: olympus-developers-bounces at mailman.srv.cs.cmu.edu
> <mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu>
> [mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu
> <mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu>] On
> Behalf Of José David Lopes
> Sent: Tuesday, June 08, 2010 6:56 AM
> To: olympus-developers at cs.cmu.edu
> <mailto:olympus-developers at cs.cmu.edu>
> Subject: [Olympus developers 233]: Problems with VAD
>
> I'm working on a dialogue system with a different ASR module. I'm
> experiencing some problems with the VAD. This module seems very
> sensitive, triggering very easily and thus forwarding the data coming
> from the microphone to the engines even when it is not speech. I was
> wondering if I could convert our own GMM models for VAD to the Sphinx 3
> format. Is there any tool available to convert a standard format (for
> instance HTK) to the Sphinx 3 format? I could also train GMMs using our
> own Olympus data. Has anyone done this before?
>
> Best,
> Jose David
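The GMM-based speech/non-speech decision Jose asks about can be illustrated with a toy version: fit one Gaussian per class to a 1-D frame feature (e.g. log energy) and classify each frame by likelihood. This is purely a sketch of the classification idea, not the Sphinx 3 model format or any conversion tool; all function names are invented for illustration.

```python
import math

# Toy GMM-style VAD sketch (hypothetical): one 1-D Gaussian per class
# (speech / non-speech), classification by higher log-likelihood.

def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a 1-D feature."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, 1e-6)  # floor the variance for stability

def log_likelihood(x, mean, var):
    """Log density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, speech_model, noise_model):
    """True if the frame feature x is more likely under the speech model."""
    return log_likelihood(x, *speech_model) > log_likelihood(x, *noise_model)
```

A real VAD would use multi-component mixtures over multi-dimensional features (e.g. MFCCs) plus temporal smoothing, but training such class models on in-domain Olympus data, as Jose proposes, follows exactly this scheme.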
>
More information about the Olympus-developers mailing list