[Olympus developers 266]: Re: speech level ("volume")

Thu Nov 18 16:57:46 EST 2010

Yeah, you got it right. The VAD assumes that the incoming audio is a 
mixture of background noise and speech, so it looks for two peaks in the 
histogram: the lowest (quietest) one is noise, the highest is speech. So 
fSpeechLevel should be a good estimate of how loud the system hears the 
user (which is related to how loud they speak, how close they are to 
their phone, what phone they are using...).
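
To illustrate the two-peak idea, here is a minimal sketch in Python. It is 
not the AudioServer code: the function name estimate_levels, the bin width, 
the 3-bin smoothing and the per-frame energy values are all assumptions 
made up for the example, and the real GMM VAD may estimate the two levels 
quite differently.

```python
def estimate_levels(frame_energies, bin_width=1.0):
    """Return (noise_level, speech_level) from per-frame energy values.

    Hypothetical helper: builds a histogram of frame energies, finds its
    local peaks, and treats the quieter of the two tallest peaks as noise
    and the louder one as speech.
    """
    lo = min(frame_energies)
    n_bins = int((max(frame_energies) - lo) / bin_width) + 1

    # Histogram of frame energies.
    hist = [0] * n_bins
    for e in frame_energies:
        hist[int((e - lo) / bin_width)] += 1

    # 3-bin moving average (zero-padded at the edges) to damp spurious peaks.
    padded = [0] + hist + [0]
    smoothed = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
                for i in range(n_bins)]

    # Local maxima of the smoothed histogram, ignoring empty regions.
    peaks = [i for i in range(n_bins)
             if smoothed[i] > 0
             and (i == 0 or smoothed[i] >= smoothed[i - 1])
             and (i == n_bins - 1 or smoothed[i] > smoothed[i + 1])]

    # Keep the two tallest peaks; report each as the centre of its bin.
    top = sorted(peaks, key=lambda i: smoothed[i], reverse=True)[:2]

    def level(i):
        return lo + (i + 0.5) * bin_width

    if len(top) < 2:
        # Only one mode found: noise and speech cannot be told apart.
        return level(top[0]), level(top[0])
    quiet, loud = sorted(top)
    return level(quiet), level(loud)


# Synthetic data: a quiet cluster around 5 and a loud cluster around 15.
frames = [4.0] * 30 + [5.0] * 60 + [6.0] * 30 \
       + [14.0] * 20 + [15.0] * 40 + [16.0] * 20
noise, speech = estimate_levels(frames)
print(noise, speech)  # prints: 5.5 15.5
```

With real audio the histogram is much noisier, so the choice of bin width 
and smoothing matters a lot more than in this toy input.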

antoine

On 11/18/2010 1:05 PM, Gabriel Parent wrote:
> Hi,
>
> I'm building a model to have my SDS adapt to the user.  I need the 
> "volume" feature: for example, if a user starts speaking louder, there 
> might be a bus passing by and thus we need to increase the volume of 
> the synthesizer (it's a bad example, but you get the point).
>
> Looking into the AudioServer, I found that the GMM VAD updates two 
> variables: fNoiseLevel and fSpeechLevel.  I don't understand exactly 
> how the GMM VAD works, but it seems it estimates the speech level 
> using a histogram.  Would it be safe to use fSpeechLevel as my 
> "volume" feature?  I tried going from whispering to shouting, and that 
> value went from 9 to 17.  Just want to make sure I'm not making a dumb 
> mistake.
>
> Cheers,
> Gabriel
>