[Olympus developers 220]: Re: N-best lists for PocketSphinx / Olympus

Tue Apr 13 12:53:43 EDT 2010

Okay, I feared there might be a deeper problem like that... It could be that
ps does not keep track of all the necessary information for non-top
hypotheses (for lighter-weight bookkeeping)... A workaround I can
think of (which is kind of heavyweight) would be for PocketsphinxEngine
to recompute the backoff type from the word string instead of getting it
from the recognizer (a bunch of ifs to test whether the trigram exists in
the LM, or the bigram, etc. should do the trick, so it shouldn't be that
hard). If there's no short-term solution in sight from ps, I would
recommend doing that until ps gets fixed.
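
For illustration, the "bunch of ifs" could look roughly like the sketch below. It is not PocketSphinx code: ToyLM and backoff_types are made-up names, the LM is just two sets of space-joined n-grams, and the 1/2/3 type scale is an assumption inferred from the (t-6)/12 normalization in hyp_conf_slm. A real version would query the recognizer's language model instead of the sets.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a language model: known n-grams stored as
// space-joined strings. A real engine would query the actual LM instead.
struct ToyLM {
    std::set<std::string> bigrams;
    std::set<std::string> trigrams;
};

// Returns one backoff type per word, on an assumed 1..3 scale:
// 3 = trigram found, 2 = bigram found, 1 = unigram only.
std::vector<int> backoff_types(const ToyLM& lm,
                               const std::vector<std::string>& words) {
    std::vector<int> types;
    types.reserve(words.size());
    for (std::size_t i = 0; i < words.size(); ++i) {
        int type = 1;  // fall back to unigram
        if (i >= 1 && lm.bigrams.count(words[i - 1] + " " + words[i]))
            type = 2;
        if (i >= 2 && lm.trigrams.count(words[i - 2] + " " + words[i - 1] +
                                        " " + words[i]))
            type = 3;
        types.push_back(type);
    }
    assert(types.size() == words.size());  // exactly one type per word
    return types;
}
```

The resulting vector could then stand in for the values that ps_seg_prob would normally write into type[], padded with the same dummy entries as before.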

antoine

Thomas Harris wrote:
> Hi Antoine,
>
> Yes, that was/is a problem and I tried something like this. But an even
> more fundamental problem is that the ps_seg_t* segment iterator
> that you get from pocketsphinx doesn't correctly implement ps_seg_prob
> when the segment iterator comes from the hypothesis iterator, even
> though it works fine if you get the segment iterator from the best_hyp
> function (or whatever that's called). I've sent David the code segment
> that illustrates this bug. I don't know that there's any kind of
> workaround. For the most part we've gotten multiple hypotheses by running
> multiple recognizers, I guess.
>
> Thanks,
> -Thomas
>
> On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
>     Hi all,
>
>     What exactly is the confidence computation problem? Is it that we
>     cannot compute the LM backoff type-based word confidence (see
>     hyp_conf_slm in PocketsphinxEngine's main.cpp)?
>     If that is the problem, one way to fix this might be to modify
>     hyp_conf_slm to accept a ps_seg_t as an argument (instead of
>     always getting seg_iter from ps_seg_iter):
>
>     float* hyp_conf_slm (ps_seg_t *seg_iter = NULL, bool useFixedScore = false)
>     {
>       const int MAX_TYPE_SIZE = 4096;
>       int32 score, type[MAX_TYPE_SIZE];
>       int32 k = 0;
>
>       // (antoine) no seg_iter was given, get the top segment iterator from ps
>       if (seg_iter == NULL)
>           seg_iter = ps_seg_iter(psd, &score);
>
>       type[k++] = 3; // use the trigram dummy for first word
>
>       if (seg_iter != NULL) {
>           while (seg_iter = ps_seg_next(seg_iter)) {
>               // leave room for the two dummy entries appended after the loop
>               if (k >= MAX_TYPE_SIZE - 2) return NULL;
>
>               int32 lscr, ascr;
>               ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>           }
>       }
>       type[k++] = 3; // (tk) dummy trigram after utterance
>       type[k++] = 3; // (tk) sometimes there's no end token, in which case
>                      // the last one was for the end token and this one
>                      // is the dummy
>
>       // (antoine) allocate the array of confidence scores
>       float* conf = (float*)malloc(k*sizeof(float));
>
>       for (int32 i = 1; i < k-2; i++) {
>           if(!useFixedScore) {
>               int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>               conf[i-1] = (float)((double)(t-6)/12.0);
>           } else {
>               conf[i-1] = 0.7f;
>           }
>       }
>
>       return conf;
>     }
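
To make the arithmetic in that last loop easier to follow, here is a small standalone sketch of the same formula. It assumes backoff types on a 1..3 scale (so the weighted sum t runs from 6 to 18 and the result lands in 0..1); confidences_from_types is a made-up name, and the input is expected to already carry the leading dummy and the two trailing dummies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// types: one backoff type per word (assumed range 1..3), already padded
// with one leading dummy and two trailing dummies as in hyp_conf_slm.
std::vector<float> confidences_from_types(const std::vector<int>& types) {
    assert(types.size() >= 3);  // at least the three dummy entries
    std::vector<float> conf;
    for (std::size_t i = 1; i + 2 < types.size(); ++i) {
        // The previous and current types count once; the two following
        // types count twice (the "<<1" in the original expression).
        int t = types[i - 1] + types[i] + 2 * (types[i + 1] + types[i + 2]);
        // Weights sum to 6, so t is in [6, 18]; map that onto [0, 1].
        conf.push_back(static_cast<float>((t - 6) / 12.0));
    }
    return conf;
}
```

With every type equal to 3 each entry comes out as 1.0, and with every type equal to 1 each entry comes out as 0.0.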
>
>     Then further down, you can modify the third version of
>     fillPartialHypStruct by just adding the argument when it calls
>     hyp_conf_slm:
>
>     // [2008-02-19] (antoine): this function takes a partial hypothesis and a
>     //                        reference to a THypStruct and fills in the hyp struct
>     void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs, int fromNBest) {
>
>       Log(STD_STREAM, "Filling partial hyp struct\n");
>
>       size_t h_len, ch_len;
>       int n_words = 0, n_validwords, has_oov;
>       char tmp[16384];
>       float *lm_conf = NULL;
>
>       // Fill in confidence values for words in result and build filtered hypothesis
>       if (slm)
>           lm_conf = hyp_conf_slm(curr_seg_iter);
>       else
>           lm_conf = hyp_conf_slm(curr_seg_iter, true);
>
>     (...)
>
>     I don't really have any setup to test this but if someone who has
>     could give it a shot and post the result to the mailing list...
>     Now it might be that I misunderstood what the problem was
>     altogether (in which case I apologize for the spam)...
>
>     On a side note, the big commented out block in getHypStructs (as
>     sent by Blaise) is from my Cactus code (which I had sent to Blaise
>     as an example), so it's irrelevant to Olympus and should be
>     deleted (for clarity's sake).
>
>     antoine
>
>     Blaise Thomson wrote:
>
>         Hi Thomas / Alan,
>
>         I've now got some preliminary N-best list code to work with
>         PocketSphinx. With the help of some example code from Antoine
>         I've modified the pocketsphinx engine to produce a 1-best list
>         for partial recognition results but an N-best list upon
>         completion. I've also modified the AudioServer to be able to
>         receive multiple N-best lists from each of the recognizers (the
>         number for each decoder specified by an optional ":N" after
>         the decoder definition in the config file). In case this may
>         be something you want to include in future versions of Olympus
>         I've attached my modified files.
>
>         Note, however, that the code still doesn't produce any
>         confidence score information for the N-best list. For this
>         reason we will still probably be unable to use Olympus for our
>         version of the LetsGo! system. If the PocketSphinx bugs you
>         mentioned are fixed any time soon or if anyone finds out how
>         to get confidence scores with the N-best list would you please
>         let us know?
>
>         Many thanks,
>         Blaise
>
>
>
>         Thomas Harris wrote:
>
>             Hi Blaise,
>
>             Thanks for looking into this. I hope we can include your
>             bugfixes. I've been looking into this as well, and there's
>             a more fundamental issue. It seems like you can't get word
>             confidence metrics from the PocketSphinx segment iterators
>             when you've gotten the segment iterators from the n_best
>             hypothesis iterator. It smells like a PocketSphinx bug,
>             but I haven't seen any reference implementation of
>             PocketSphinx that makes use of those confidence metrics in
>             an n_best setting, so I'm not sure that it isn't a problem
>             with how the PocketSphinx API is used. Until that issue is
>             resolved, n_best lists won't work in Olympus; too many
>             downstream processes depend on those confidence metrics.
>
>             Thanks,
>             -Thomas
>
>             On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>             <brmt2 at cam.ac.uk> wrote:
>
>                Dear Olympus developers,
>
>                I am trying to get the Olympus LetsGo! system to provide an
>                N-best list of speech recognition hypotheses. I found the
>                -n_best switch, which can be passed to the PocketSphinxEngine
>                and is supposed to enable this, but when I set the switch to
>                anything other than 0 the system crashes immediately on any
>                audio input. I remember you said that the system had been
>                built to provide N-best lists, so I was wondering if you
>                could give any advice on why it is not working. Do you have a
>                working N-best list system that you could send me to see how
>                things are configured?
>
>                In trying to solve the problem I took a look at the
>                PocketSphinxEngine source code and have noticed some possible
>                memory access bugs which may be contributing to this. These
>                were related to the way the iHypsGenerated variable was used.
>                I've fixed these and can send them if you would like (I tried
>                attaching them but the mailing list won't let me). The
>                resulting code still crashes but at a later stage. After the
>                fix, the log file generates a
>                WARNING: "ngram_search.c", line 1000:. I don't know if this
>                might be the cause of the problem. There is also a
>                possibility that I simply have to add a configuration
>                variable to PocketSphinx itself. At the moment I have only
>                used the n_best switch on PocketSphinxEngine.
>
>                Please do let me know if you have any ideas of how to get
>                this working or who else to contact.
>
>                Thanks for all your help,
>
>                Blaise
>
>
>
>
>
>