[Olympus developers 225]: Re: N-best lists for PocketSphinx / Olympus

Tue Apr 13 16:41:17 EDT 2010

As far as I can tell by looking at the code, PocketsphinxEngine does not 
send any word-level confidence information to AudioServer. That's good 
news. The bad news is that it computes utterance-level AM and LM scores 
by summing those of the words (so it needs word-level scores for 
that...). This seems to be the only way to get separate AM and LM scores 
from Pocketsphinx (someone, correct me if I'm wrong). You can get the 
overall score though (the sum of the AM and LM scores), using ps_nbest_hyp. 
So we could recompute the LM score of the i-th hypothesis using 
Sphinxbase's ngram_tg_score routine (at the same time as getting the LM 
backoff type) and subtract the LM score from the total score to get the 
AM score (after potentially weighting by the LM weight, although the LM 
score might already include that, I'm not sure)...
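
A minimal sketch of that subtraction, assuming the total and LM scores are 
log-domain integers. The helper split_total_score and its 
lm_already_weighted flag are hypothetical names for illustration, not part 
of PocketsphinxEngine or the Pocketsphinx API; the total would come from 
ps_nbest_hyp and the LM score from the recomputed n-gram scores. The flag 
is there because it is not clear whether the LM score already includes the 
LM weight.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: the two parts of a combined hypothesis score.
struct ScoreSplit {
    int32_t am_score;  // acoustic part, recovered by subtraction
    int32_t lm_score;  // language model part, after weighting
};

// Hypothetical helper: recover the AM score from the total score and a
// separately recomputed LM score (all log-domain integers).
ScoreSplit split_total_score(int32_t total_score, int32_t lm_score,
                             bool lm_already_weighted, double lm_weight)
{
    assert(lm_weight > 0.0);

    // Apply the LM weight only if the LM score does not already include it.
    int32_t weighted_lm = lm_score;
    if (!lm_already_weighted)
        weighted_lm = static_cast<int32_t>(std::lround(lm_score * lm_weight));

    ScoreSplit out;
    out.am_score = total_score - weighted_lm;
    out.lm_score = weighted_lm;
    return out;
}
```

Whether to pass true or false for lm_already_weighted would have to be 
checked against what the n-gram scoring routine actually returns.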

All in all, it's not that hard but I'm not even sure the overall AM 
score is used anywhere (I think it can be used when computing confidence 
with Helios but that depends on your model. E.g., the LetsGoPublic model 
does not use that feature).
The only question is: is there someone motivated enough to do this? ;)

antoine

Thomas Harris wrote:
> I have seen in the frame some statistics like #words>p where p is some 
> kind of confidence metric. It would seem that those statistics can 
> only be derived from word-level confidence scores in the recognizer. 
> But that's just my recollection right now. Someone should investigate, 
> I guess.
>
> Thanks,
> -Thomas
>
> On Tue, Apr 13, 2010 at 1:20 PM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
>     Hmmm... I don't know about word-level acoustic scores. I don't
>     remember ever using these. Are these actually stored in the frame
>     that PocketsphinxEngine and/or AudioServer send?
>
>     antoine
>
>     Thomas Harris wrote:
>
>         Good idea. That sounds doable. But we're also not getting
>         acoustic model scores. I don't know for sure, do latter stages
>         of Olympus (like Helios) depend on word-level acoustic scores?
>
>         -Thomas
>
>         On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux
>         <antoine.raux at gmail.com> wrote:
>
>            Actually, rather than a bunch of ifs as I wrote, the (still
>            temporary) solution is to get the ngram_model_t object from ps,
>            and then use the sphinxbase functions (such as ngram_tg_score)
>            to compute the backoff type (which is exactly what ps does at
>            decoding time).
>
>            antoine
>
>            Thomas Harris wrote:
>
>                Hi Antoine,
>
>                Yes, that was/is a problem and I tried something like this.
>                But even more fundamental is the problem that the ps_seg_t*
>                segment iterator that you get from pocketsphinx doesn't
>                correctly implement ps_seg_prob when the segment iterator
>                comes from the hypothesis iterator, even though it works
>                fine if you get the segment iterator from the best_hyp
>                function (or whatever that's called). I've sent David the
>                code segment that illustrates this bug. I don't know that
>                there's any kind of workaround. For the most part we've
>                gotten multiple hypotheses by running multiple recognizers,
>                I guess.
>
>                Thanks,
>                -Thomas
>
>                On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
>                <antoine.raux at gmail.com> wrote:
>
>                   Hi all,
>
>                   What exactly is the confidence computation problem? Is it
>                   that we cannot compute the LM backoff type-based word
>                   confidence (see hyp_conf_slm in PocketsphinxEngine's
>                   main.cpp)?
>                   If that is the problem, one way to fix this might be to
>                   modify hyp_conf_slm to accept a ps_seg_t as an argument
>                   (instead of always getting seg_iter from ps_seg_iter):
>
>                   float* hyp_conf_slm (bool useFixedScore = false,
>                                        ps_seg_t *seg_iter = NULL)
>                   {
>                     const int MAX_TYPE_SIZE = 4096;
>                     int32 score, type[MAX_TYPE_SIZE];
>                     int32 k = 0;
>
>                     // (antoine) no seg_iter was given, get the top
>                     // segment iterator from ps
>                     if (seg_iter == NULL)
>                         seg_iter = ps_seg_iter(psd, &score);
>
>                     type[k++] = 3; // use the trigram dummy for first word
>
>                     if (seg_iter != NULL) {
>                         while ((seg_iter = ps_seg_next(seg_iter)) != NULL) {
>                             // leave room for the two dummy entries below
>                             if (k >= MAX_TYPE_SIZE - 2) return NULL;
>
>                             int32 lscr, ascr;
>                             ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>                         }
>                     }
>                     type[k++] = 3; // (tk) dummy trigram after utterance
>                     type[k++] = 3; // (tk) sometimes there's no end token,
>                                    // in which case the last one was for
>                                    // the end token and this one is the
>                                    // dummy
>
>                     // (antoine) allocate the array of confidence scores
>                     float* conf = (float*)malloc(k*sizeof(float));
>
>                     for (int32 i = 1; i < k-2; i++) {
>                         if(!useFixedScore) {
>                             int32 t = type[i-1] + type[i]
>                                 + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>                             conf[i-1] = (float)((double)(t-6)/12.0);
>                         } else {
>                             conf[i-1] = 0.7f;
>                         }
>                     }
>
>                     return conf;
>                   }
>
>                   Then further down, you can modify the third version of
>                   fillPartialHypStruct by just adding the argument when it
>                   calls hyp_conf_slm:
>
>                   // [2008-02-19] (antoine): this function takes a partial
>                   //                        hypothesis and a reference to a
>                   //                        THypStruct and fills in the hyp
>                   //                        struct
>                   void fillPartialHypStruct(ps_seg_t* curr_seg_iter,
>                                             THypStruct* phs, int fromNBest) {
>
>                     Log(STD_STREAM, "Filling partial hyp struct\n");
>
>                     size_t h_len, ch_len;
>                     int n_words = 0, n_validwords, has_oov;
>                     char tmp[16384];
>                     float *lm_conf = NULL;
>
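>                     // Fill in confidence values for words in result and
>                     // build filtered hypothesis
>                     // (useFixedScore is the first parameter of
>                     // hyp_conf_slm, the segment iterator the second)
>                     if (slm)
>                         lm_conf = hyp_conf_slm(false, curr_seg_iter);
>                     else
>                         lm_conf = hyp_conf_slm(true, curr_seg_iter);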
>
>                   (...)
>
>                   I don't really have any setup to test this but if someone
>                   who has could give it a shot and post the result to the
>                   mailing list...
>                   Now it might be that I misunderstood what the problem was
>                   altogether (in which case I apologize for the spam)...
>
>                   On a side note, the big commented out block in
>                   getHypStructs (as sent by Blaise) is from my Cactus code
>                   (which I had sent to Blaise as an example), so it's
>                   irrelevant to Olympus and should be deleted (for clarity's
>                   sake).
>
>                   antoine
>
>                   Blaise Thomson wrote:
>
>                       Hi Thomas / Alan,
>
>                       I've now got some preliminary N-best list code to work
>                       with PocketSphinx. With the help of some example code
>                       from Antoine I've modified the pocketsphinx engine to
>                       produce a 1-best list for partial recognition results
>                       but an N-best list upon completion. I've also modified
>                       the AudioServer to be able to receive multiple N-best
>                       lists from each of the recognizers (the number for each
>                       decoder specified by an optional ":N" after the decoder
>                       definition in the config file). In case this may be
>                       something you want to include in future versions of
>                       Olympus I've attached my modified files.
>
>                       Note, however, that the code still doesn't produce any
>                       confidence score information for the N-best list. For
>                       this reason we will still probably be unable to use
>                       Olympus for our version of the LetsGo! system. If the
>                       PocketSphinx bugs you mentioned are fixed any time soon
>                       or if anyone finds out how to get confidence scores
>                       with the N-best list would you please let us know?
>
>                       Many thanks,
>                       Blaise
>
>
>
>                       Thomas Harris wrote:
>
>                           Hi Blaise,
>
>                           Thanks for looking into this. I hope we can include
>                           your bugfixes. I've been looking into this as well,
>                           and there's a more fundamental issue. It seems like
>                           you can't get word confidence metrics from the
>                           PocketSphinx segment iterators when you've gotten
>                           the segment iterators from the n_best hypothesis
>                           iterator. It smells like a PocketSphinx bug, but I
>                           haven't seen any reference implementation of
>                           PocketSphinx that makes use of those confidence
>                           metrics in an n_best setting, so I'm not sure that
>                           it isn't a problem with how the PocketSphinx api is
>                           used. Until that issue is resolved n_best lists
>                           won't work in Olympus; too many downstream processes
>                           depend on those confidence metrics.
>
>                           Thanks,
>                           -Thomas
>
>                           On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>                           <brmt2 at cam.ac.uk> wrote:
>
>                              Dear Olympus developers,
>
>                              I am trying to get the Olympus LetsGo! system to
>                              provide an N-best list of speech recognition
>                              hypotheses. I found the -n_best switch which can
>                              be passed to the PocketSphinxEngine which is
>                              supposed to enable this but when I set the switch
>                              to anything other than 0 the system crashes
>                              immediately on any audio input. I remember you
>                              said that the system had been built to provide
>                              N-best lists so I was wondering if you could give
>                              any advice on why it is not working. Do you have
>                              a working N-best list system that you could send
>                              me to see how things are configured?
>
>                              In trying to solve the problem I took a look at
>                              the PocketSphinxEngine source code and have
>                              noticed some possible memory access bugs which
>                              may be contributing to this. These were related
>                              to the way the iHypsGenerated variable was used.
>                              I've fixed these and can send them if you would
>                              like (I tried attaching them but the mailing list
>                              won't let me). The resulting code still crashes
>                              but at a later stage. After the fix, the log file
>                              generates a
>                              WARNING: "ngram_search.c", line 1000:.
>                              I don't know if this might be the cause of the
>                              problem. There is also a possibility that I
>                              simply have to add a configuration variable to
>                              PocketSphinx itself. At the moment I have only
>                              used the n_best switch on PocketSphinxEngine.
>
>                              Please do let me know if you have any ideas of
>                              how to get this working or who else to contact.
>
>                              Thanks for all your help,
>
>                              Blaise