[Olympus developers 237]: Re: Problems with VAD
José David Lopes
zedavid at l2f.inesc-id.pt
Fri Jun 11 06:26:58 EDT 2010
Hi Matt,
Thanks for your answer.
However, my question was about where in the configuration file to
change the end-pointing mode.
Best,
Jose
On 09-06-2010 22:46, Matthew Marge wrote:
> Hi Jose,
>
> This provides a good definition of endpointing as it relates to spoken
> dialogue systems:
>
> Endpointing is the process of determining the beginning and the
> end of speech within an incoming sample stream. This takes into
> account whether or not energy is detected at speech frequencies,
> the duration of the detected sound and pitch extraction. The pitch
> is tracked to help recognise vowels as indications of speech and
> to filter out background noise events, such as a door closing. The
> endpointer can be tuned to avoid false barge-in, or
> false-triggers, i.e. cutting off the prompt when the user did not
> speak. This can be caused by background noise or prompt echo. The
> endpointer can also be tuned to avoid missing leading speech.
> Syllables, leading consonants, or in the worst case, everything a
> caller who speaks quietly says might be missed. Significant
> advances have been made in recent years in echo cancellation and
> endpointing techniques and these have resulted in better barge-in
> performance, and leading speech recognition companies are
> recommending the use of barge-in wherever possible.
>
> http://spotlight.ccir.ed.ac.uk/public_documents/Dialogue_design_guide/barge_in_requirements.htm
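The energy-plus-duration idea in the definition above can be sketched in a few lines. This is a hypothetical, minimal energy-based endpointer (not the actual Olympus implementation): speech is declared after `start_frames` consecutive frames above an energy `threshold`, and the segment is closed after `end_frames` consecutive frames below it. All names and parameter values are illustrative assumptions.

```python
# Minimal energy-based endpointing sketch (hypothetical, not Olympus code).
# A frame is a list of audio samples; speech starts after `start_frames`
# consecutive high-energy frames and ends after `end_frames` low-energy ones.

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def endpoint(frames, threshold=0.01, start_frames=3, end_frames=5):
    """Return (first, last) indices of the detected speech segment,
    or None if no speech is found."""
    start = None
    above = below = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            above += 1
            below = 0  # brief dips do not end the segment
            if start is None and above >= start_frames:
                start = i - start_frames + 1
        elif start is not None:
            below += 1
            above = 0
            if below >= end_frames:
                return (start, i - end_frames)  # last speech frame
        else:
            above = 0  # isolated noise bursts before speech are ignored
    if start is not None:
        return (start, len(frames) - 1)  # speech ran to end of stream
    return None
```

The hysteresis (requiring several consecutive frames on each side) is what lets such an endpointer be "tuned" against false triggers and missed leading speech, as the quoted definition describes: raising `start_frames` or `threshold` reduces false barge-in; lowering them reduces clipped onsets.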
>
> Cheers,
> Matt
>
> 2010/6/9 José David Lopes <zedavid at l2f.inesc-id.pt>
>
> Alex,
>
> Thanks for your quick answer!
>
> What do you mean by end-pointers? Is it the run mode
> configuration? I've been looking at the documentation and it is
> not clear to me what these end-pointers are.
>
> Jose
>
> On 08-06-2010 15:31, Alex Rudnicky wrote:
>
> Jose,
>
> There are actually three different end-pointers in Olympus;
> these are selectable in (I believe) the configuration for the
> Audio server. You might try experimenting with alternate ones.
> Also, it might be worth a try to modify the end-pointing
> sensitivity parameters.
>
> Alex
>
>
> -----Original Message-----
> From: olympus-developers-bounces at mailman.srv.cs.cmu.edu
> <mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu>
> [mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu
> <mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu>] On
> Behalf Of José David Lopes
> Sent: Tuesday, June 08, 2010 6:56 AM
> To: olympus-developers at cs.cmu.edu
> <mailto:olympus-developers at cs.cmu.edu>
> Subject: [Olympus developers 233]: Problems with VAD
>
> I'm working on a dialogue system with a different ASR module. I'm
> experiencing some problems with the VAD. This module seems very
> sensitive, triggering very easily and thus forwarding the data coming
> from the microphone to the engines even when it is not speech. I was
> wondering if I could convert our own GMM models for VAD to the Sphinx 3
> format. Is there any tool available to convert a standard format (for
> instance HTK) to the Sphinx 3 format? I could also train GMMs using our
> own Olympus data. Has anyone done this before?
>
> Best,
> Jose David
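The GMM-based speech/non-speech decision Jose asks about can be illustrated with a toy version: fit one Gaussian per class to a 1-D frame feature (e.g. log energy) and classify each frame by likelihood. This is purely a sketch of the classification idea, not the Sphinx 3 model format or any conversion tool; all function names are invented for illustration.

```python
import math

# Toy GMM-style VAD sketch (hypothetical): one 1-D Gaussian per class
# (speech / non-speech), classification by higher log-likelihood.

def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a 1-D feature."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, 1e-6)  # floor the variance for stability

def log_likelihood(x, mean, var):
    """Log density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, speech_model, noise_model):
    """True if the frame feature x is more likely under the speech model."""
    return log_likelihood(x, *speech_model) > log_likelihood(x, *noise_model)
```

A real VAD would use multi-component mixtures over multi-dimensional features (e.g. MFCCs) plus temporal smoothing, but training such class models on in-domain Olympus data, as Jose proposes, follows exactly this scheme.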
>
More information about the Olympus-developers mailing list