[Olympus developers 227]: Re: N-best lists for PocketSphinx / Olympus

Tue Apr 13 19:42:04 EDT 2010

Point taken. I posted a message about this on the Sphinx Help forum...

antoine

Thomas Harris wrote:
> These fixes, as described, could be applied directly to the 
> PocketSphinx API, and could be a benefit to the larger community of 
> PocketSphinx users, so I think that the issue of motivation could be 
> raised there as well. This is especially the case since it's unclear 
> whether the current PocketSphinx API accurately reflects the actual 
> functionality of the system.
>
> Thanks,
> -Thomas
>
> On Tue, Apr 13, 2010 at 4:41 PM, Antoine Raux <antoine.raux at gmail.com>
> wrote:
>
>     As far as I can tell by looking at the code, PocketsphinxEngine
>     does not send any word-level confidence information to
>     AudioServer. That's good news. The bad news is that it computes
>     utterance-level AM and LM scores by summing those of the words (so
>     it needs word-level scores for that...). This seems to be the only
>     way to get separate AM and LM scores from Pocketsphinx (someone,
>     correct me if I'm wrong). You can get the overall score though
>     (the sum of the AM and LM scores), using ps_nbest_hyp. So we
>     could recompute the LM score of the i-th hypothesis using
>     Sphinxbase's ngram_tg_score routine (at the same time as getting
>     the LM backoff type) and subtract the LM score from the total
>     score to get the AM score (after potentially weighting by the LM
>     weight, although the LM score might already include that, I'm not
>     sure)...
>
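
An inline note on the subtraction described above: a minimal sketch, assuming the total and LM scores are integer log-domain scores on the same scale. The names split_total_score and ScoreSplit are hypothetical. Since it is unclear whether the LM score already includes the language weight, the caller has to say which case applies. No PocketSphinx calls are made here; in practice the inputs would come from ps_nbest_hyp and ngram_tg_score as suggested.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical container for the two per-hypothesis scores.
struct ScoreSplit {
    int32_t am;  // acoustic model score
    int32_t lm;  // language model score as it was counted in the total
};

// Recover the AM score as total - LM.
// Assumption: total_score = AM + (weighted) LM, in integer log-domain units.
// If raw_lm_score does not yet include the language weight, apply it first.
ScoreSplit split_total_score(int32_t total_score, int32_t raw_lm_score,
                             double lm_weight, bool lm_already_weighted) {
    assert(lm_weight > 0.0);
    int32_t lm = lm_already_weighted
        ? raw_lm_score
        : static_cast<int32_t>(std::lround(raw_lm_score * lm_weight));
    return ScoreSplit{total_score - lm, lm};
}
```

For example, split_total_score(-5000, -1200, 1.0, true) would treat -1200 as the LM part and attribute the remaining -3800 to the acoustic model.
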
>     All in all, it's not that hard but I'm not even sure the overall
>     AM score is used anywhere (I think it can be used when computing
>     confidence with Helios but that depends on your model. E.g., the
>     LetsGoPublic model does not use that feature).
>     The only question is: is there someone motivated enough to do this? ;)
>
>     antoine
>
>
>     Thomas Harris wrote:
>
>         I have seen in the frame some statistics like #words>p where p
>         is some kind of confidence metric. It would seem that those
>         statistics can only be derived from word-level confidence
>         scores in the recognizer. But that's just my recollection
>         right now. Someone should investigate, I guess.
>
>         Thanks,
>         -Thomas
>
>
>         On Tue, Apr 13, 2010 at 1:20 PM, Antoine Raux
>         <antoine.raux at gmail.com> wrote:
>
>            Hmmm... I don't know about word-level acoustic scores. I don't
>            remember ever using these. Are these actually stored in the
>            frame that PocketsphinxEngine and/or AudioServer send?
>
>            antoine
>
>            Thomas Harris wrote:
>
>                Good idea. That sounds doable. But we're also not getting
>                acoustic model scores. I don't know for sure: do later
>                stages of Olympus (like Helios) depend on word-level
>                acoustic scores?
>
>                -Thomas
>
>                On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux
>                <antoine.raux at gmail.com> wrote:
>
>                   Actually, rather than a bunch of ifs as I wrote, the
>                   (still temporary) solution is to get the ngram_model_t
>                   object from ps, and then use the sphinxbase functions
>                   (such as ngram_tg_score) to compute the backoff type
>                   (which is exactly what ps does at decoding time).
>
>                   antoine
>
>                   Thomas Harris wrote:
>
>                       Hi Antoine,
>
>                       Yes, that was/is a problem and I tried something
>                       like this. But even more fundamental is the problem
>                       that the ps_seg_t* segment iterator that you get from
>                       pocketsphinx doesn't correctly implement ps_seg_prob
>                       when the segment iterator comes from the hypothesis
>                       iterator, even though it works fine if you get the
>                       segment iterator from the best_hyp function (or
>                       whatever that's called). I've sent David the code
>                       segment that illustrates this bug. I don't know that
>                       there's any kind of workaround. For the most part
>                       we've gotten multiple hypotheses by running multiple
>                       recognizers, I guess.
>
>                       Thanks,
>                       -Thomas
>
>                       On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
>                       <antoine.raux at gmail.com> wrote:
>
>                          Hi all,
>
>                          What exactly is the confidence computation
>                          problem? Is it that we cannot compute the LM
>                          backoff type-based word confidence (see
>                          hyp_conf_slm in PocketsphinxEngine's main.cpp)?
>                          If that is the problem, one way to fix this might
>                          be to modify hyp_conf_slm to accept a ps_seg_t as
>                          an argument (instead of always getting seg_iter
>                          from ps_seg_iter):
>
>                          float* hyp_conf_slm (bool useFixedScore = false,
>                                               ps_seg_t *seg_iter = NULL)
>                          {
>                            const int MAX_TYPE_SIZE = 4096;
>                            int32 score, type[MAX_TYPE_SIZE];
>                            int32 k = 0;
>
>                            // (antoine) no seg_iter was given, get the top
>                            // segment iterator from ps
>                            if (seg_iter == NULL)
>                                seg_iter = ps_seg_iter(psd, &score);
>
>                            type[k++] = 3; // use the trigram dummy for first word
>
>                            if (seg_iter != NULL) {
>                                while ((seg_iter = ps_seg_next(seg_iter)) != NULL) {
>                                    // leave room for the two trailing dummies
>                                    if (k >= MAX_TYPE_SIZE - 2) return NULL;
>
>                                    int32 lscr, ascr;
>                                    ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>                                }
>                            }
>                            type[k++] = 3; // (tk) dummy trigram after utterance
>                            type[k++] = 3; // (tk) sometimes there's no end token,
>                                           // in which case the last one was for the
>                                           // end token and this one is the dummy
>
>                            // (antoine) allocate the array of confidence scores
>                            float* conf = (float*)malloc(k*sizeof(float));
>
>                            for (int32 i = 1; i < k-2; i++) {
>                                if(!useFixedScore) {
>                                    int32 t = type[i-1] + type[i] +
>                                        ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>                                    conf[i-1] = (float)((double)(t-6)/12.0);
>                                } else {
>                                    conf[i-1] = 0.7f;
>                                }
>                            }
>
>                            return conf;
>                          }
>
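
An inline note on the formula marked "(tk) wtf?" above: as far as I can tell, the constants make sense if each backoff type is 1, 2, or 3 (unigram, bigram, trigram), which is an assumption on my part. The weighted sum t then ranges from 6 to 18, so (t-6)/12 lands in [0, 1]. Below is a minimal standalone sketch of just that arithmetic, with no PocketSphinx dependency; backoff_confidence is a hypothetical name, not a function in PocketsphinxEngine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps per-word n-gram backoff types
// (assumed 1 = unigram, 2 = bigram, 3 = trigram) to confidences in [0, 1].
std::vector<float> backoff_confidence(const std::vector<int>& word_types) {
    // Pad like the quoted code: one dummy trigram before, two after.
    std::vector<int> type;
    type.push_back(3);
    for (int t : word_types) {
        assert(t >= 1 && t <= 3);
        type.push_back(t);
    }
    type.push_back(3);
    type.push_back(3);

    std::vector<float> conf;
    for (std::size_t i = 1; i + 2 < type.size(); ++i) {
        // The previous and current word count once, the next two count twice.
        int t = type[i - 1] + type[i] + 2 * (type[i + 1] + type[i + 2]);
        // t is between 6 (all unigrams) and 18 (all trigrams).
        conf.push_back(static_cast<float>((t - 6) / 12.0));
    }
    return conf;
}
```

With this sketch, a hypothesis whose words were all scored with full trigrams gets confidence 1.0 for every word, and any backoff in the neighbourhood of a word lowers that word's value.
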
>                          Then further down, you can modify the third
>                          version of fillPartialHypStruct by just adding the
>                          argument when it calls hyp_conf_slm:
>
>                          // [2008-02-19] (antoine): this function takes a
>                          //   partial hypothesis and a reference to a
>                          //   THypStruct and fills in the hyp struct
>                          void fillPartialHypStruct(ps_seg_t* curr_seg_iter,
>                                  THypStruct* phs, int fromNBest) {
>
>                            Log(STD_STREAM, "Filling partial hyp struct\n");
>
>                            size_t h_len, ch_len;
>                            int n_words = 0, n_validwords, has_oov;
>                            char tmp[16384];
>                            float *lm_conf = NULL;
>
>                            // Fill in confidence values for words in result
>                            // and build filtered hypothesis
>                            // (useFixedScore is the first parameter of
>                            // hyp_conf_slm, the segment iterator the second)
>                            if (slm)
>                                lm_conf = hyp_conf_slm(false, curr_seg_iter);
>                            else
>                                lm_conf = hyp_conf_slm(true, curr_seg_iter);
>
>                          (...)
>
>                          I don't really have any setup to test this but if
>                          someone who has could give it a shot and post the
>                          result to the mailing list...
>                          Now it might be that I misunderstood what the
>                          problem was altogether (in which case I apologize
>                          for the spam)...
>
>                          On a side note, the big commented out block in
>                          getHypStructs (as sent by Blaise) is from my Cactus
>                          code (which I had sent to Blaise as an example), so
>                          it's irrelevant to Olympus and should be deleted
>                          (for clarity's sake).
>
>                          antoine
>
>                          Blaise Thomson wrote:
>
>                              Hi Thomas / Alan,
>
>                              I've now got some preliminary N-best list code
>                              to work with PocketSphinx. With the help of some
>                              example code from Antoine I've modified the
>                              pocketsphinx engine to produce a 1-best list for
>                              partial recognition results but an N-best list
>                              upon completion. I've also modified the
>                              AudioServer to be able to receive multiple
>                              N-best lists from each of the recognizers (the
>                              number for each decoder specified by an optional
>                              ":N" after the decoder definition in the config
>                              file). In case this may be something you want to
>                              include in future versions of Olympus I've
>                              attached my modified files.
>
>                              Note, however, that the code still doesn't
>                              produce any confidence score information for the
>                              N-best list. For this reason we will still
>                              probably be unable to use Olympus for our
>                              version of the LetsGo! system. If the
>                              PocketSphinx bugs you mentioned are fixed any
>                              time soon or if anyone finds out how to get
>                              confidence scores with the N-best list would you
>                              please let us know?
>
>                              Many thanks,
>                              Blaise
>
>
>
>                              Thomas Harris wrote:
>
>                                  Hi Blaise,
>
>                                  Thanks for looking into this. I hope we can
>                                  include your bugfixes. I've been looking
>                                  into this as well, and there's a more
>                                  fundamental issue. It seems like you can't
>                                  get word confidence metrics from the
>                                  PocketSphinx segment iterators when you've
>                                  gotten the segment iterators from the n_best
>                                  hypothesis iterator. It smells like a
>                                  PocketSphinx bug, but I haven't seen any
>                                  reference implementation of PocketSphinx
>                                  that makes use of those confidence metrics
>                                  in an n_best setting, so I'm not sure that
>                                  it isn't a problem with how the PocketSphinx
>                                  API is used. Until that issue is resolved
>                                  n_best lists won't work in Olympus; too many
>                                  downstream processes depend on those
>                                  confidence metrics.
>
>                                  Thanks,
>                                  -Thomas
>
>                                  On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>                                  <brmt2 at cam.ac.uk> wrote:
>
>                                     Dear Olympus developers,
>
>                                     I am trying to get the Olympus LetsGo!
>                                     system to provide an N-best list of speech
>                                     recognition hypotheses. I found the -n_best
>                                     switch which can be passed to the
>                                     PocketSphinxEngine which is supposed to
>                                     enable this but when I set the switch to
>                                     anything other than 0 the system crashes
>                                     immediately on any audio input. I remember
>                                     you said that the system had been built to
>                                     provide N-best lists so I was wondering if
>                                     you could give any advice on why it is not
>                                     working. Do you have a working N-best list
>                                     system that you could send me to see how
>                                     things are configured?
>
>                                     In trying to solve the problem I took a
>                                     look at the PocketSphinxEngine source code
>                                     and have noticed some possible memory
>                                     access bugs which may be contributing to
>                                     this. These were related to the way the
>                                     iHypsGenerated variable was used. I've
>                                     fixed these and can send them if you would
>                                     like (I tried attaching them but the
>                                     mailing list won't let me). The resulting
>                                     code still crashes but at a later stage.
>                                     After the fix, the log file generates a
>                                     WARNING: "ngram_search.c", line 1000:. I
>                                     don't know if this might be the cause of
>                                     the problem. There is also a possibility
>                                     that I simply have to add a configuration
>                                     variable to PocketSphinx itself. At the
>                                     moment I have only used the n_best switch
>                                     on PocketSphinxEngine.
>
>                                     Please do let me know if you have any
>                                     ideas of how to get this working or who
>                                     else to contact.
>
>                                     Thanks for all your help,
>
>                                     Blaise