[Olympus developers 267]: Re: speech level ("volume")

Alex Rudnicky Alex.Rudnicky at cs.cmu.edu
Thu Nov 18 16:59:12 EST 2010


The VAD/endpointer considers the delta between the background noise and
the speech level; when it's big enough, speech is "detected". This
actually deals with the passing-bus problem, if the talker raises their
volume to compensate.
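
As a rough illustration of that delta test (not the actual GMM VAD code in the AudioServer), here is a minimal sketch. The function name, the 6-unit threshold, and the assumption that both levels are on the same log (dB-like) scale are hypothetical and not taken from Olympus:

```python
def is_speech(frame_level, noise_level, min_delta=6.0):
    """Return True when the frame level exceeds the estimated background
    noise by at least min_delta.

    Both levels are assumed to share one log (dB-like) scale; the
    min_delta default is a made-up illustrative value.
    """
    return (frame_level - noise_level) >= min_delta


# Passing-bus case: the noise estimate rises, so the same speech level
# no longer clears the threshold until the talker speaks up.
quiet_room = is_speech(frame_level=17.0, noise_level=9.0)
bus_passing = is_speech(frame_level=17.0, noise_level=15.0)
bus_louder_talker = is_speech(frame_level=23.0, noise_level=15.0)
```

Because the test is relative to the noise estimate, a louder talker is still detected when the background gets louder, which is the compensation described above.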

But there are any number of reasons why someone might speak louder.

It might be better to key synthesis volume to fNoiseLevel, and try to
always keep it a certain value above the imputed background noise (say
15-20 dB).
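
A minimal sketch of that idea, assuming fNoiseLevel can be read as a dB value on the same scale as the synthesizer's output level (the thread does not confirm this). The function name, the optional clamping limits, and the sample numbers are hypothetical:

```python
def synthesis_level(noise_level_db, margin_db=15.0, min_db=None, max_db=None):
    """Return a target synthesis level a fixed margin above the noise estimate.

    noise_level_db: imputed background noise (assumed to be in dB).
    margin_db: how far above the noise to stay (15-20 dB suggested above).
    min_db / max_db: optional hypothetical limits so the output never
    becomes inaudible or uncomfortably loud.
    """
    level = noise_level_db + margin_db
    if min_db is not None:
        level = max(level, min_db)
    if max_db is not None:
        level = min(level, max_db)
    return level


quiet = synthesis_level(40.0)                              # 15 dB above noise
noisy = synthesis_level(70.0, margin_db=20.0, max_db=85.0)  # clamped at the ceiling
```

In practice the noise estimate would probably need smoothing before use, so the synthesizer volume does not jump on every update.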

Alex


-----Original Message-----
From: olympus-developers-bounces at mailman.srv.cs.cmu.edu
[mailto:olympus-developers-bounces at mailman.srv.cs.cmu.edu] On Behalf Of
Gabriel Parent
Sent: Thursday, November 18, 2010 4:05 PM
To: olympus-developers at mailman.srv.cs.cmu.edu
Subject: [Olympus developers 265]: speech level ("volume")

Hi,

I'm building a model to have my SDS adapt to the user.  I need the 
"volume" feature: for example, if a user starts speaking louder, there 
might be a bus passing by and thus we need to increase the volume of the
synthesizer (it's a bad example, but you get the point).

Looking into the AudioServer, I found that the GMM VAD updates two 
variables: fNoiseLevel and fSpeechLevel.  I don't understand exactly 
how the GMM VAD works, but it seems it estimates the speech level using 
a histogram.  Would it be safe to use fSpeechLevel as my "volume" 
feature?  I tried going from whispering to shouting, and that value went
from 9 to 17.  Just want to make sure I'm not making a dumb mistake.

Cheers,
Gabriel


