[Olympus developers 219]: Re: N-best lists for PocketSphinx / Olympus

Thomas Harris tkharris at gmail.com
Tue Apr 13 12:37:47 EDT 2010


Hi Antoine,

Yes, that was/is a problem, and I tried something like this. But the more
fundamental problem is that the ps_seg_t* segment iterator you get from
PocketSphinx doesn't correctly implement ps_seg_prob when the segment
iterator comes from the hypothesis iterator, even though it works fine if
you get the segment iterator from the best_hyp function (or whatever that's
called). I've sent David the code segment that illustrates this bug. I don't
know that there's any kind of workaround. For the most part we've gotten
multiple hypotheses by running multiple recognizers, I guess.
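
For what it's worth, the multiple-recognizer approach amounts to something
like the sketch below. This is not Olympus code: the Hypothesis struct and
mergeOneBest are made-up names, and it assumes each recognizer reports an
utterance-level confidence that is comparable across recognizers, which may
not hold in practice.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: the 1-best result from one recognizer.
struct Hypothesis {
    std::string text;        // recognized word string
    float confidence;        // utterance-level confidence (assumed comparable)
    std::string recognizer;  // name of the recognizer that produced it
};

// Merge 1-best results into a pseudo N-best list: identical word strings are
// collapsed (keeping the highest confidence), then sorted best-first.
std::vector<Hypothesis> mergeOneBest(const std::vector<Hypothesis>& results) {
    std::map<std::string, Hypothesis> best;  // keyed by word string
    for (const Hypothesis& h : results) {
        auto it = best.find(h.text);
        if (it == best.end()) {
            best.emplace(h.text, h);
        } else if (h.confidence > it->second.confidence) {
            it->second = h;
        }
    }

    std::vector<Hypothesis> merged;
    for (const auto& entry : best) {
        merged.push_back(entry.second);
    }
    std::stable_sort(merged.begin(), merged.end(),
                     [](const Hypothesis& a, const Hypothesis& b) {
                         return a.confidence > b.confidence;
                     });
    return merged;
}
```

The list is only as long as the number of distinct strings the recognizers
produce, so it is a stopgap rather than a real N-best list.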

Thanks,
-Thomas

On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux <antoine.raux at gmail.com> wrote:

> Hi all,
>
> What exactly is the confidence computation problem? Is it that we cannot
> compute the LM backoff type-based word confidence (see hyp_conf_slm in
> PocketSphinxEngine's main.cpp)?
> If that is the problem, one way to fix this might be to modify hyp_conf_slm
> to accept a ps_seg_t as an argument (instead of always getting seg_iter from
> ps_seg_iter):
>
> float* hyp_conf_slm (bool useFixedScore = false, ps_seg_t *seg_iter = NULL)
> {
>   const int MAX_TYPE_SIZE = 4096;
>   int32 score, type[MAX_TYPE_SIZE];
>   int32 k = 0;
>
>   // (antoine) no seg_iter was given, get the top segment iterator from ps
>   if (seg_iter == NULL)
>       seg_iter = ps_seg_iter(psd, &score);
>
>   type[k++] = 3; // use the trigram dummy for first word
>
>   if (seg_iter != NULL) {
>       while (seg_iter = ps_seg_next(seg_iter)) {
>           if (k == MAX_TYPE_SIZE) return NULL;
>
>           int32 lscr, ascr;
>           ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>       }
>   }
>   type[k++] = 3; // (tk) dummy trigram after utterance
>   type[k++] = 3; // (tk) sometimes there's no end token, in which case
>                  // the last one was for the end token and this one is
>                  // the dummy
>
>   // (antoine) allocate the array of confidence scores
>   float* conf = (float*)malloc(k*sizeof(float));
>
>   for (int32 i = 1; i < k-2; i++) {
>       if(!useFixedScore) {
>           int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>           conf[i-1] = (float)((double)(t-6)/12.0);
>       } else {
>           conf[i-1] = 0.7f;
>       }
>   }
>
>   return conf;
> }
>
> Then further down, you can modify the third version of fillPartialHypStruct
> by just adding the argument when it calls hyp_conf_slm:
>
> // [2008-02-19] (antoine): this function takes a partial hypothesis and a
> //                        reference to a THypStruct and fills in the hyp struct
> void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs, int fromNBest) {
>
>   Log(STD_STREAM, "Filling partial hyp struct\n");
>
>   size_t h_len, ch_len;
>   int n_words = 0, n_validwords, has_oov;
>   char tmp[16384];
>   float *lm_conf = NULL;
>
>   // Fill in confidence values for words in result and build filtered hypothesis
>   // (arguments follow the declaration order: useFixedScore, then seg_iter)
>   if (slm)
>       lm_conf = hyp_conf_slm(false, curr_seg_iter);
>   else
>       lm_conf = hyp_conf_slm(true, curr_seg_iter);
>
> (...)
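
One thing to watch with the hyp_conf_slm signature above: the bool comes
first, so a call that passes only a ps_seg_t* still compiles, with the
pointer silently converted to bool. The sketch below shows that binding
using stand-in types (Seg, BoundArgs and showBinding are hypothetical names
so that it builds without PocketSphinx).

```cpp
struct Seg {};  // stand-in for ps_seg_t

// What the callee actually received.
struct BoundArgs {
    bool useFixedScore;
    const Seg* seg;
};

// Same parameter order as hyp_conf_slm; it only reports its arguments.
BoundArgs showBinding(bool useFixedScore = false, const Seg* seg = nullptr) {
    BoundArgs bound = {useFixedScore, seg};
    return bound;
}

// Passing only the iterator: the pointer converts to bool and fills the
// first parameter, while the iterator parameter keeps its default.
BoundArgs callWithIteratorOnly(const Seg* seg) {
    return showBinding(seg);
}

// Passing both arguments in declaration order binds them as intended.
BoundArgs callExplicit(const Seg* seg, bool useFixedScore) {
    return showBinding(useFixedScore, seg);
}
```

With a non-null iterator, callWithIteratorOnly ends up with useFixedScore
set to true and a null seg, so the callee would fall back to its default
iterator and use fixed scores.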
>
> I don't really have any setup to test this but if someone who has could
> give it a shot and post the result to the mailing list...
> Now it might be that I misunderstood what the problem was altogether (in
> which case I apologize for the spam)...
>
> On a side note, the big commented out block in getHypStructs (as sent by
> Blaise) is from my Cactus code (which I had sent to Blaise as an example),
> so it's irrelevant to Olympus and should be deleted (for clarity's sake).
>
> antoine
>
> Blaise Thomson wrote:
>
>> Hi Thomas / Alan,
>>
>> I've now got some preliminary N-best list code to work with PocketSphinx.
>> With the help of some example code from Antoine, I've modified the
>> PocketSphinx engine to produce a 1-best list for partial recognition results
>> but an N-best list upon completion. I've also modified the AudioServer to be
>> able to receive multiple N-best lists from each of the recognizers (the
>> number for each decoder is specified by an optional ":N" after the decoder
>> definition in the config file). In case this may be something you want to
>> include in future versions of Olympus, I've attached my modified files.
>>
>> Note, however, that the code still doesn't produce any confidence score
>> information for the N-best list. For this reason we will still probably be
>> unable to use Olympus for our version of the LetsGo! system. If the
>> PocketSphinx bugs you mentioned are fixed any time soon, or if anyone finds
>> out how to get confidence scores with the N-best list, would you please let
>> us know?
>>
>> Many thanks,
>> Blaise
>>
>>
>>
>> Thomas Harris wrote:
>>
>>> Hi Blaise,
>>>
>>> Thanks for looking into this. I hope we can include your bugfixes. I've
>>> been looking into this as well, and there's a more fundamental issue. It
>>> seems like you can't get word confidence metrics from the PocketSphinx
>>> segment iterators when you've gotten the segment iterators from the n_best
>>> hypothesis iterator. It smells like a PocketSphinx bug, but I haven't seen
>>> any reference implementation of PocketSphinx that makes use of those
>>> confidence metrics in an n_best setting, so I'm not sure that it isn't a
>>> problem with how the PocketSphinx API is used. Until that issue is resolved,
>>> n_best lists won't work in Olympus: too many downstream processes depend on
>>> those confidence metrics.
>>>
>>> Thanks,
>>> -Thomas
>>>
>>> On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson <brmt2 at cam.ac.uk> wrote:
>>>
>>>    Dear Olympus developers,
>>>
>>>    I am trying to get the Olympus LetsGo! system to provide an N-best
>>>    list of speech recognition hypotheses. I found the -n_best switch
>>>    which can be passed to the PocketSphinxEngine and which is supposed
>>>    to enable this, but when I set the switch to anything other than 0
>>>    the system crashes immediately on any audio input. I remember you
>>>    said that the system had been built to provide N-best lists, so I
>>>    was wondering if you could give any advice on why it is not working.
>>>    Do you have a working N-best list system that you could send me to
>>>    see how things are configured?
>>>
>>>    In trying to solve the problem I took a look at the
>>>    PocketSphinxEngine source code and have noticed some possible
>>>    memory access bugs which may be contributing to this. These were
>>>    related to the way the iHypsGenerated variable was used. I've
>>>    fixed these and can send them if you would like (I tried attaching
>>>    them but the mailing list won't let me). The resulting code still
>>>    crashes but at a later stage. After the fix, the log file
>>>    generates a WARNING: "ngram_search.c", line 1000:. I don't know if
>>>    this might be the cause of the problem. There is also a
>>>    possibility that I simply have to add a configuration variable to
>>>    PocketSphinx itself. At the moment I have only used the n_best
>>>    switch on PocketSphinxEngine.
>>>
>>>    Please do let me know if you have any ideas of how to get this
>>>    working or who else to contact.
>>>
>>>    Thanks for all your help,
>>>
>>>    Blaise
>>>
>>>
>>>
>>>
>>
>