[Olympus developers 221]: Re: N-best lists for PocketSphinx / Olympus

Tue Apr 13 13:12:22 EDT 2010

Actually, rather than a bunch of ifs as I wrote, the (still temporary) 
solution is to get the ngram_model_t object from ps, and then use the 
sphinxbase functions (such as ngram_tg_score) to compute the backoff 
type (which is exactly what ps does at decoding time).
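
A minimal sketch of that idea, assuming (as I understand the sphinxbase API) that the scorer reports through its n_used output how many words the matched n-gram spans, and that this number is the backoff type. To keep the sketch self-contained, the language model is replaced by a callback with the same argument order as ngram_tg_score minus the model pointer; backoff_types, TrigramScorer and toy_scorer are hypothetical names for illustration and are not part of PocketsphinxEngine or sphinxbase.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in for a trigram scorer with the same calling shape as
// ngram_tg_score(model, w3, w2, w1, &n_used), minus the model argument.
// w3 is the current word; w2 and w1 are the one and two words before it.
using TrigramScorer = std::function<int(int w3, int w2, int w1, int* n_used)>;

// Hypothetical helper (not part of PocketsphinxEngine): one backoff type
// per word of a hypothesis, taken from the scorer's n_used output.
std::vector<int> backoff_types(const std::vector<int>& wids,
                               const TrigramScorer& score,
                               int no_word = -1)
{
    std::vector<int> types;
    int w1 = no_word, w2 = no_word;   // empty history before the first word
    for (int w3 : wids) {
        int n_used = 0;
        score(w3, w2, w1, &n_used);
        assert(n_used >= 1 && n_used <= 3);
        types.push_back(n_used);
        w1 = w2;                      // slide the two-word history forward
        w2 = w3;
    }
    return types;
}

// Toy scorer for illustration only: it knows the trigram (1, 2, 3) and any
// bigram (w, w + 1); everything else falls back to a unigram.
int toy_scorer(int w3, int w2, int w1, int* n_used)
{
    if (w1 == 1 && w2 == 2 && w3 == 3)
        *n_used = 3;
    else if (w2 >= 0 && w3 == w2 + 1)
        *n_used = 2;
    else
        *n_used = 1;
    return 0; // the score itself is not used in this sketch
}
```

With the toy scorer, backoff_types({1, 2, 3, 7}, toy_scorer) yields the types 1, 2, 3, 1. In the engine, the callback would presumably be a small lambda that forwards to ngram_tg_score on the model obtained from ps, with word ids looked up in that model.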

antoine

Thomas Harris wrote:
> Hi Antoine,
>
> Yes, that was/is a problem and I tried something like this. But even 
> more fundamental is the problem that the ps_seg_t* segment iterator 
> that you get from pocketsphinx doesn't correctly implement ps_seg_prob 
> when the segment iterator comes from the hypothesis iterator, even 
> though it works fine if you get the segment iterator from the best_hyp 
> function (or whatever that's called). I've sent David the code segment 
> that illustrates this bug. I don't know that there's any kind of 
> workaround. For the most part we've gotten multiple hypotheses by 
> running multiple recognizers, I guess.
>
> Thanks,
> -Thomas
>
> On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
>     Hi all,
>
>     What exactly is the confidence computation problem? Is it that we
>     cannot compute the LM backoff type-based word confidence (see
>     hyp_conf_slm in PocketsphinxEngine's main.cpp)?
>     If that is the problem, one way to fix this might be to modify
>     hyp_conf_slm to accept a ps_seg_t as an argument (instead of
>     always getting seg_iter from ps_seg_iter):
>
>     float* hyp_conf_slm (bool useFixedScore = false, ps_seg_t
>     *seg_iter = NULL)
>     {
>       const int MAX_TYPE_SIZE = 4096;
>       int32 score, type[MAX_TYPE_SIZE];
>       int32 k = 0;
>
>       // (antoine) no seg_iter was given, get the top segment iterator from ps
>       if (seg_iter == NULL)
>           seg_iter = ps_seg_iter(psd, &score);
>
>       type[k++] = 3; // use the trigram dummy for first word
>
>       if (seg_iter != NULL) {
>           while ((seg_iter = ps_seg_next(seg_iter)) != NULL) {
>               // leave room for the two trailing dummy entries below
>               if (k >= MAX_TYPE_SIZE - 2) {
>                   ps_seg_free(seg_iter);
>                   return NULL;
>               }
>
>               int32 lscr, ascr;
>               ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>           }
>       }
>       type[k++] = 3; // (tk) dummy trigram after utterance
>       type[k++] = 3; // (tk) sometimes there's no end token, in which case
>                      // the last one was for the end token and this one is the dummy
>
>       // (antoine) allocate the array of confidence scores
>       float* conf = (float*)malloc(k*sizeof(float));
>
>       for (int32 i = 1; i < k-2; i++) {
>           if(!useFixedScore) {
>               int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>               conf[i-1] = (float)((double)(t-6)/12.0);
>           } else {
>               conf[i-1] = 0.7f;
>           }
>       }
>
>       return conf;
>     }
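
The arithmetic in that last loop can be unpacked with a small standalone sketch. It assumes every backoff type is 1, 2 or 3 (unigram, bigram, trigram), which is my reading of the code and is not stated in the thread. The types of the previous and current word get weight 1 and the types of the next two words get weight 2, so t ranges from 6 (all unigrams) to 18 (all trigrams), and (t - 6) / 12 maps that onto [0, 1]. confidences_from_types is a hypothetical name; only the formula is taken from hyp_conf_slm.

```cpp
#include <cstddef>
#include <vector>

// Same formula as the loop in hyp_conf_slm, applied to a type array laid out
// the same way: one leading dummy, one entry per scored segment, and two
// trailing dummies.
std::vector<float> confidences_from_types(const std::vector<int>& type)
{
    std::vector<float> conf;
    for (std::size_t i = 1; i + 2 < type.size(); ++i) {
        // previous + current (weight 1), next two (weight 2)
        int t = type[i - 1] + type[i] + ((type[i + 1] + type[i + 2]) << 1);
        conf.push_back(static_cast<float>((t - 6) / 12.0));
    }
    return conf;
}
```

For example, the types {3, 1, 1, 3, 3} (dummy, two unigram words, two dummies) give t = 12 and t = 14, that is confidences of 0.5 and about 0.67, while {3, 3, 3, 3, 3} gives 1.0 for both words.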
>
>     Then further down, you can modify the third version of
>     fillPartialHypStruct by just adding the argument when it calls
>     hyp_conf_slm:
>
>     // [2008-02-19] (antoine): this function takes a partial hypothesis and
>     //                        a reference to a THypStruct and fills in the
>     //                        hyp struct
>     void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs,
>                               int fromNBest) {
>
>       Log(STD_STREAM, "Filling partial hyp struct\n");
>
>       size_t h_len, ch_len;
>       int n_words = 0, n_validwords, has_oov;
>       char tmp[16384];
>       float *lm_conf = NULL;
>
>       // Fill in confidence values for words in result and build filtered hypothesis
>       // (useFixedScore is the first parameter, the segment iterator the second)
>       if (slm)
>           lm_conf = hyp_conf_slm(false, curr_seg_iter);
>       else
>           lm_conf = hyp_conf_slm(true, curr_seg_iter);
>
>     (...)
>
>     I don't really have any setup to test this but if someone who has
>     could give it a shot and post the result to the mailing list...
>     Now it might be that I misunderstood what the problem was
>     altogether (in which case I apologize for the spam)...
>
>     On a side note, the big commented out block in getHypStructs (as
>     sent by Blaise) is from my Cactus code (which I had sent to Blaise
>     as an example), so it's irrelevant to Olympus and should be
>     deleted (for clarity's sake).
>
>     antoine
>
>     Blaise Thomson wrote:
>
>         Hi Thomas / Alan,
>
>         I've now got some preliminary N-best list code to work with
>         PocketSphinx. With the help of some example code from Antoine
>         I've modified the pocketsphinx engine to produce a 1-best list
>         for partial recognition results but an N-best list upon
>         completion. I've also modified the AudioServer to be able to
>         receive multiple N-best lists from each of the recognizers (the
>         number for each decoder specified by an optional ":N" after
>         the decoder definition in the config file). In case this may
>         be something you want to include in future versions of Olympus
>         I've attached my modified files.
>
>         Note, however, that the code still doesn't produce any
>         confidence score information for the N-best list. For this
>         reason we will still probably be unable to use Olympus for our
>         version of the LetsGo! system. If the PocketSphinx bugs you
>         mentioned are fixed any time soon or if anyone finds out how
>         to get confidence scores with the N-best list would you please
>         let us know?
>
>         Many thanks,
>         Blaise
>
>
>
>         Thomas Harris wrote:
>
>             Hi Blaise,
>
>             Thanks for looking into this. I hope we can include your
>             bugfixes. I've been looking into this as well, and there's
>             a more fundamental issue. It seems like you can't get word
>             confidence metrics from the PocketSphinx segment iterators
>             when you've gotten the segment iterators from the n_best
>             hypothesis iterator. It smells like a PocketSphinx bug,
>             but I haven't seen any reference implementation of
>             PocketSphinx that makes use of those confidence metrics in
>             an n_best setting, so I'm not sure that it isn't a problem
>             with how the PocketSphinx API is used. Until that issue is
>             resolved n_best lists won't work in Olympus; too many
>             downstream processes depend on those confidence metrics.
>
>             Thanks,
>             -Thomas
>
>             On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>             <brmt2 at cam.ac.uk> wrote:
>
>                Dear Olympus developers,
>
>                I am trying to get the Olympus LetsGo! system to provide
>                an N-best list of speech recognition hypotheses. I found
>                the -n_best switch which can be passed to the
>                PocketSphinxEngine, which is supposed to enable this, but
>                when I set the switch to anything other than 0 the system
>                crashes immediately on any audio input. I remember you
>                said that the system had been built to provide N-best
>                lists so I was wondering if you could give any advice on
>                why it is not working. Do you have a working N-best list
>                system that you could send me to see how things are
>                configured?
>
>                In trying to solve the problem I took a look at the
>                PocketSphinxEngine source code and have noticed some
>                possible memory access bugs which may be contributing to
>                this. These were related to the way the iHypsGenerated
>                variable was used. I've fixed these and can send them if
>                you would like (I tried attaching them but the mailing
>                list won't let me). The resulting code still crashes but
>                at a later stage. After the fix, the log file generates a
>                WARNING: "ngram_search.c", line 1000:. I don't know if
>                this might be the cause of the problem. There is also a
>                possibility that I simply have to add a configuration
>                variable to PocketSphinx itself. At the moment I have
>                only used the n_best switch on PocketSphinxEngine.
>
>                Please do let me know if you have any ideas of how to get
>                this working or who else to contact.
>
>                Thanks for all your help,
>
>                Blaise
>
>
>
>
>
>