[Olympus developers 236]: Re: Problems with VAD

Wed Jun 9 17:46:47 EDT 2010

Hi Jose,

This provides a good definition of endpointing as it relates to spoken
dialogue systems:

> Endpointing is the process of determining the beginning and the end of
> speech within an incoming sample stream. This takes into account whether or
> not energy is detected at speech frequencies, the duration of the detected
> sound and pitch extraction. The pitch is tracked to help recognise vowels as
> indications of speech and to filter out background noise events, such as a
> door closing. The endpointer can be tuned to avoid false barge-in, or
> false-triggers, i.e. cutting off the prompt when the user did not speak.
> This can be caused by background noise or prompt echo. The endpointer can
> also be tuned to avoid missing leading speech. Syllables, leading
> consonants, or in the worst case, everything a caller who speaks quietly
> says might be missed. Significant advances have been made in recent years in
> echo cancellation and endpointing techniques and these have resulted in
> better barge-in performance, and leading speech recognition companies are
> recommending the use of barge-in wherever possible.

http://spotlight.ccir.ed.ac.uk/public_documents/Dialogue_design_guide/barge_in_requirements.htm
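
To make the tuning trade-off above concrete, here is a minimal energy-based endpointer sketch in Python. It is not the Olympus Audio server endpointer. It only thresholds per-frame energy and omits the pitch tracking described in the quote. The names `endpoint`, `threshold_db`, `min_speech_frames` and `hangover_frames` are hypothetical: a longer minimum run guards against false triggers from short noise bursts, and a longer hangover guards against cutting off speech at brief pauses.

```python
import math
from typing import List, Optional, Tuple


def frame_energy_db(frame: List[float]) -> float:
    """Mean-square energy of one frame in dB, floored to avoid log(0)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def endpoint(frames: List[List[float]],
             threshold_db: float = -30.0,   # hypothetical sensitivity knob
             min_speech_frames: int = 3,    # loud frames needed to start speech
             hangover_frames: int = 5       # quiet frames tolerated inside speech
             ) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    loud_run = 0
    quiet_run = 0
    for i, frame in enumerate(frames):
        loud = frame_energy_db(frame) > threshold_db
        if start is None:
            # Waiting for speech: require a run of loud frames (rejects clicks).
            loud_run = loud_run + 1 if loud else 0
            if loud_run >= min_speech_frames:
                start = i - min_speech_frames + 1  # back-date to first loud frame
                quiet_run = 0
        else:
            # Inside speech: only end after more than hangover_frames of quiet.
            quiet_run = 0 if loud else quiet_run + 1
            if quiet_run > hangover_frames:
                segments.append((start, i - quiet_run + 1))
                start = None
                loud_run = 0
    if start is not None:
        # Stream ended mid-speech: close at the last loud frame.
        segments.append((start, len(frames) - quiet_run))
    return segments


if __name__ == "__main__":
    silence, loud = [0.0] * 160, [0.5] * 160
    frames = [silence] * 5 + [loud] + [silence] * 5 + [loud] * 10 + [silence] * 10
    print(endpoint(frames))  # the one-frame click is ignored -> [(11, 21)]
```

Lowering `threshold_db` or `min_speech_frames` makes the detector trigger more easily; raising them is the kind of sensitivity adjustment that helps with an over-eager VAD.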

Cheers,
Matt

2010/6/9 José David Lopes <zedavid at l2f.inesc-id.pt>

> Alex,
>
> Thanks for your quick answer!
>
> What do you mean by end-pointers? Is it the run mode configuration? I've
> been looking at the documentation and it is not clear to me what these
> end-pointers are.
>
> Jose
>
> On 08-06-2010 15:31, Alex Rudnicky wrote:
>
>> Jose,
>>
>> There are actually three different end-pointers in Olympus; these are
>> selectable in (I believe) the configuration for the Audio server. You might
>> try experimenting with the alternatives. Also, it might be worth trying to
>> modify the end-pointing sensitivity parameters.
>>
>> Alex
>>
>>
>> -----Original Message-----
>> From: olympus-developers-bounces at mailman.srv.cs.cmu.edu
>> [mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu] On Behalf Of
>> José David Lopes
>> Sent: Tuesday, June 08, 2010 6:56 AM
>> To: olympus-developers at cs.cmu.edu
>> Subject: [Olympus developers 233]: Problems with VAD
>>
>> I'm working on a dialogue system with a different ASR module, and I'm
>> experiencing some problems with the VAD. This module seems very
>> sensitive: it triggers very easily and thus forwards the data coming
>> from the microphone to the engines even when it is not speech. I was
>> wondering if I could convert our own GMM models for VAD to the Sphinx 3
>> format. Is there any tool available to convert from a standard format (for
>> instance, HTK) to the Sphinx 3 format? I could also train GMMs using our own
>> Olympus data. Has anyone done this before?
>>
>> Best,
>> Jose David
>>
>>
>
>
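
On the GMM question: a common way to use separate speech and non-speech GMMs for VAD is a per-frame log-likelihood ratio test. The Python sketch below assumes the diagonal-covariance GMM parameters (weights, means, variances) have already been loaded into NumPy arrays; it does not read or convert HTK or Sphinx 3 model files. The function names and the `threshold` parameter are hypothetical and not part of Olympus or Sphinx.

```python
import numpy as np


def diag_gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    x: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    """
    diff = x[:, None, :] - means[None, :, :]                           # (T, M, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)      # (M,)
    log_exp = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)   # (T, M)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp  # (T, M)
    # Log-sum-exp over mixture components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()


def gmm_vad(x, speech_gmm, noise_gmm, threshold=0.0):
    """Label a frame as speech when log p(x|speech) - log p(x|noise) > threshold.

    speech_gmm and noise_gmm are (weights, means, variances) tuples.
    A higher threshold makes the detector less eager to trigger.
    """
    llr = diag_gmm_loglik(x, *speech_gmm) - diag_gmm_loglik(x, *noise_gmm)
    return llr > threshold
```

In practice the per-frame decisions would normally be smoothed with minimum-duration and hangover rules before any audio is forwarded to the recognizer, since isolated frame-level errors are common.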