[Olympus developers 223]: Re: N-best lists for PocketSphinx / Olympus

Tue Apr 13 13:20:12 EDT 2010

Hmmm... I don't know about word-level acoustic scores. I don't remember 
ever using these. Are these actually stored in the frame that 
PocketsphinxEngine and/or AudioServer send?

antoine

Thomas Harris wrote:
> Good idea. That sounds doable. But we're also not getting acoustic
> model scores. I don't know for sure: do later stages of Olympus (like
> Helios) depend on word-level acoustic scores?
>
> -Thomas
>
> On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
>     Actually, rather than a bunch of ifs as I wrote, the (still
>     temporary) solution is to get the ngram_model_t object from ps,
>     and then use the sphinxbase functions (such as ngram_tg_score) to
>     compute the backoff type (which is exactly what ps does at
>     decoding time).
>
>     antoine
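
A minimal sketch of the backoff-type idea described above, assuming a toy
language model that only records which trigrams and bigrams exist. The names
ToyTrigramLM, backoffType and backoffTypes are hypothetical and used only for
illustration; the sketch does not call sphinxbase, where ngram_tg_score would
do the real lookup against the ngram_model_t obtained from ps.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Toy stand-in for a trigram LM: it only records which n-grams exist.
// (Hypothetical type for illustration; this is not sphinxbase's ngram_model_t.)
struct ToyTrigramLM {
    std::set<std::tuple<std::string, std::string, std::string>> trigrams;
    std::set<std::pair<std::string, std::string>> bigrams;

    // Returns the order of the longest n-gram ending in w3 that the model
    // contains: 3 = trigram hit, 2 = backed off to bigram, 1 = unigram.
    int backoffType(const std::string& w1, const std::string& w2,
                    const std::string& w3) const {
        if (trigrams.count(std::make_tuple(w1, w2, w3))) return 3;
        if (bigrams.count(std::make_pair(w2, w3))) return 2;
        return 1;
    }
};

// Computes one backoff type per word, padding the history with "<s>".
std::vector<int> backoffTypes(const ToyTrigramLM& lm,
                              const std::vector<std::string>& words) {
    std::vector<int> types;
    std::string w1 = "<s>", w2 = "<s>";
    for (const std::string& w3 : words) {
        types.push_back(lm.backoffType(w1, w2, w3));
        w1 = w2;
        w2 = w3;
    }
    return types;
}
```

Walking the word sequence of each hypothesis this way gives one type per
word, independent of whether the segment iterator reports it.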
>
>     Thomas Harris wrote:
>
>         Hi Antoine,
>
>         Yes, that was/is a problem and I tried something like this.
>         But an even more fundamental problem is that the ps_seg_t*
>         segment iterator that you get from pocketsphinx doesn't
>         correctly implement ps_seg_prob when the segment iterator
>         comes from the hypothesis iterator, even though it works fine
>         if you get the segment iterator from the best_hyp function (or
>         whatever that's called). I've sent David the code segment that
>         illustrates this bug. I don't know that there's any kind of
>         workaround. For the most part we've gotten multiple hypotheses
>         by running multiple recognizers, I guess.
>
>         Thanks,
>         -Thomas
>
>         On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
>         <antoine.raux at gmail.com> wrote:
>
>            Hi all,
>
>            What exactly is the confidence computation problem? Is it that
>            we cannot compute the LM backoff type-based word confidence (see
>            hyp_conf_slm in PocketsphinxEngine's main.cpp)?
>            If that is the problem, one way to fix this might be to modify
>            hyp_conf_slm to accept a ps_seg_t as an argument (instead of
>            always getting seg_iter from ps_seg_iter):
>
>            float* hyp_conf_slm (bool useFixedScore = false,
>                                 ps_seg_t *seg_iter = NULL)
>            {
>              const int MAX_TYPE_SIZE = 4096;
>              int32 score, type[MAX_TYPE_SIZE];
>              int32 k = 0;
>
>              // (antoine) no seg_iter was given, get the top segment
>              // iterator from ps
>              if (seg_iter == NULL)
>                  seg_iter = ps_seg_iter(psd, &score);
>
>              type[k++] = 3; // use the trigram dummy for first word
>
>              if (seg_iter != NULL) {
>                  while ((seg_iter = ps_seg_next(seg_iter)) != NULL) {
>                      // leave room for the two trailing dummy entries
>                      if (k >= MAX_TYPE_SIZE - 2) return NULL;
>
>                      int32 lscr, ascr;
>                      ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>                  }
>              }
>              type[k++] = 3; // (tk) dummy trigram after utterance
>              type[k++] = 3; // (tk) sometimes there's no end token, in
>                             // which case the last one was for the end
>                             // token and this one is the dummy
>
>              // (antoine) allocate the array of confidence scores
>              float* conf = (float*)malloc(k*sizeof(float));
>
>              for (int32 i = 1; i < k-2; i++) {
>                  if(!useFixedScore) {
>                      int32 t = type[i-1] + type[i]
>                              + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>                      conf[i-1] = (float)((double)(t-6)/12.0);
>                  } else {
>                      conf[i-1] = 0.7f;
>                  }
>              }
>
>              return conf;
>            }
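
One way to read the arithmetic in the final loop above, assuming every
backoff type is 1, 2 or 3: the sum t weights the previous and current word
once and the next two words twice, so it ranges from 6 (all unigrams) to 18
(all trigrams), and (t-6)/12 maps that range onto [0, 1]. The sketch below
reproduces only that arithmetic on a plain vector; confFromBackoff is a
hypothetical name and the code does not touch PocketSphinx.

```cpp
#include <cstddef>
#include <vector>

// Maps backoff types (each assumed to be 1, 2 or 3) to per-word confidences,
// mirroring the arithmetic in hyp_conf_slm. The input is padded with one
// dummy trigram (3) in front and two behind, as the quoted code does.
std::vector<float> confFromBackoff(const std::vector<int>& wordTypes) {
    std::vector<int> type;
    type.push_back(3);
    type.insert(type.end(), wordTypes.begin(), wordTypes.end());
    type.push_back(3);
    type.push_back(3);

    std::vector<float> conf;
    for (std::size_t i = 1; i + 2 < type.size(); ++i) {
        // previous + current, plus twice the two following types
        int t = type[i - 1] + type[i] + 2 * (type[i + 1] + type[i + 2]);
        conf.push_back(static_cast<float>((t - 6) / 12.0));
    }
    return conf;
}
```

For example, confFromBackoff({1, 1}) gives 0.5 for the first word and about
0.67 for the second, because the trailing dummy trigrams raise the score of
words near the end of the utterance.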
>
>            Then further down, you can modify the third version of
>            fillPartialHypStruct by just adding the argument when it calls
>            hyp_conf_slm:
>
>            // [2008-02-19] (antoine): this function takes a partial
>            //                        hypothesis and a reference to a
>            //                        THypStruct and fills in the hyp struct
>            void fillPartialHypStruct(ps_seg_t* curr_seg_iter,
>                                      THypStruct* phs, int fromNBest) {
>
>              Log(STD_STREAM, "Filling partial hyp struct\n");
>
>              size_t h_len, ch_len;
>              int n_words = 0, n_validwords, has_oov;
>              char tmp[16384];
>              float *lm_conf = NULL;
>
>              // Fill in confidence values for words in result and build
>              // filtered hypothesis
>              if (slm)
>                  lm_conf = hyp_conf_slm(false, curr_seg_iter);
>              else
>                  lm_conf = hyp_conf_slm(true, curr_seg_iter);
>
>            (...)
>
>            I don't really have any setup to test this but if someone who
>            has could give it a shot and post the result to the mailing
>            list...
>            Now it might be that I misunderstood what the problem was
>            altogether (in which case I apologize for the spam)...
>
>            On a side note, the big commented out block in getHypStructs
>            (as sent by Blaise) is from my Cactus code (which I had sent to
>            Blaise as an example), so it's irrelevant to Olympus and should
>            be deleted (for clarity's sake).
>
>            antoine
>
>            Blaise Thomson wrote:
>
>                Hi Thomas / Alan,
>
>                I've now got some preliminary N-best list code to work with
>                PocketSphinx. With the help of some example code from
>                Antoine I've modified the pocketsphinx engine to produce a
>                1-best list for partial recognition results but an N-best
>                list upon completion. I've also modified the AudioServer to
>                be able to receive multiple N-best lists from each of the
>                recognizers (the number for each decoder specified by an
>                optional ":N" after the decoder definition in the config
>                file). In case this may be something you want to include in
>                future versions of Olympus I've attached my modified files.
>
>                Note, however, that the code still doesn't produce any
>                confidence score information for the N-best list. For this
>                reason we will still probably be unable to use Olympus for
>                our version of the LetsGo! system. If the PocketSphinx bugs
>                you mentioned are fixed any time soon or if anyone finds out
>                how to get confidence scores with the N-best list would you
>                please let us know?
>
>                Many thanks,
>                Blaise
>
>
>
>                Thomas Harris wrote:
>
>                    Hi Blaise,
>
>                    Thanks for looking into this. I hope we can include your
>                    bugfixes. I've been looking into this as well, and
>                    there's a more fundamental issue. It seems like you
>                    can't get word confidence metrics from the PocketSphinx
>                    segment iterators when you've gotten the segment
>                    iterators from the n_best hypothesis iterator. It smells
>                    like a PocketSphinx bug, but I haven't seen any
>                    reference implementation of PocketSphinx that makes use
>                    of those confidence metrics in an n_best setting, so I'm
>                    not sure that it isn't a problem with how the
>                    PocketSphinx API is used. Until that issue is resolved
>                    n_best lists won't work in Olympus; too many downstream
>                    processes depend on those confidence metrics.
>
>                    Thanks,
>                    -Thomas
>
>                    On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>                    <brmt2 at cam.ac.uk> wrote:
>
>                       Dear Olympus developers,
>
>                       I am trying to get the Olympus LetsGo! system to
>                       provide an N-best list of speech recognition
>                       hypotheses. I found the -n_best switch which can be
>                       passed to the PocketSphinxEngine which is supposed to
>                       enable this but when I set the switch to anything
>                       other than 0 the system crashes immediately on any
>                       audio input. I remember you said that the system had
>                       been built to provide N-best lists so I was wondering
>                       if you could give any advice on why it is not working.
>                       Do you have a working N-best list system that you
>                       could send me to see how things are configured?
>
>                       In trying to solve the problem I took a look at the
>                       PocketSphinxEngine source code and have noticed some
>                       possible memory access bugs which may be contributing
>                       to this. These were related to the way the
>                       iHypsGenerated variable was used. I've fixed these and
>                       can send them if you would like (I tried attaching
>                       them but the mailing list won't let me). The resulting
>                       code still crashes but at a later stage. After the
>                       fix, the log file generates a WARNING:
>                       "ngram_search.c", line 1000:. I don't know if this
>                       might be the cause of the problem. There is also a
>                       possibility that I simply have to add a configuration
>                       variable to PocketSphinx itself. At the moment I have
>                       only used the n_best switch on PocketSphinxEngine.
>
>                       Please do let me know if you have any ideas of how to
>                       get this working or who else to contact.
>
>                       Thanks for all your help,
>
>                       Blaise
>
>
>
>
>
>
>
>