[Olympus developers 224]: Re: N-best lists for PocketSphinx / Olympus

Thomas Harris tkharris at gmail.com
Tue Apr 13 13:31:09 EDT 2010


I have seen in the frame some statistics like #words>p where p is some kind
of confidence metric. It would seem that those statistics can only be
derived from word-level confidence scores in the recognizer. But that's just
my recollection right now. Someone should investigate, I guess.
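A minimal sketch of what such a statistic looks like. It assumes "#words>p" means the number of words in a hypothesis whose word-level confidence exceeds a threshold p. The helper names below are hypothetical and are not Olympus or Helios code. The sketch also shows why the feature needs one confidence value per word: with only an utterance-level score there would be nothing to count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the words in one hypothesis whose word-level
// confidence exceeds the threshold p (a "#words>p" style feature).
std::size_t count_words_above(const std::vector<float>& word_conf, float p) {
    assert(p >= 0.0f && p <= 1.0f);  // confidences assumed to lie in [0, 1]
    std::size_t n = 0;
    for (float c : word_conf) {
        if (c > p) ++n;
    }
    return n;
}

// Hypothetical companion feature: the fraction of words above p.
// Returns 0 for an empty hypothesis instead of dividing by zero.
float fraction_words_above(const std::vector<float>& word_conf, float p) {
    if (word_conf.empty()) return 0.0f;
    return static_cast<float>(count_words_above(word_conf, p)) /
           static_cast<float>(word_conf.size());
}
```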

Thanks,
-Thomas

On Tue, Apr 13, 2010 at 1:20 PM, Antoine Raux <antoine.raux at gmail.com> wrote:

> Hmmm... I don't know about word-level acoustic scores. I don't remember
> ever using these. Are these actually stored in the frame that
> PocketsphinxEngine and/or AudioServer send?
>
> antoine
>
> Thomas Harris wrote:
>
>> Good idea. That sounds doable. But we're also not getting acoustic model
>> scores. I don't know for sure: do later stages of Olympus (like Helios)
>> depend on word-level acoustic scores?
>>
>> -Thomas
>>
>> On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux <antoine.raux at gmail.com> wrote:
>>
>>    Actually, rather than a bunch of ifs as I wrote, the (still
>>    temporary) solution is to get the ngram_model_t object from ps,
>>    and then use the sphinxbase functions (such as ngram_tg_score) to
>>    compute the backoff type (which is exactly what ps does at
>>    decoding time).
>>
>>    antoine
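A rough sketch of the idea. It assumes the backoff type is simply the order of the n-gram that scored each word (3 = trigram, 2 = bigram, 1 = unigram). In the real engine that order would presumably come from the n_used output parameter of sphinxbase's ngram_tg_score() on the ngram_model_t obtained from the decoder. Here a hypothetical ToyTrigramLM class stands in for the language model so that the example runs without PocketSphinx.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sphinxbase ngram_model_t. It only records which
// bigrams and trigrams exist, which is enough to decide the backoff order.
class ToyTrigramLM {
public:
    void add_bigram(const std::string& w1, const std::string& w2) {
        bigrams_.insert({w1, w2});
    }
    void add_trigram(const std::string& w1, const std::string& w2,
                     const std::string& w3) {
        trigrams_.insert(std::make_tuple(w1, w2, w3));
        bigrams_.insert({w2, w3});  // a trigram implies its bigram suffix
    }
    // Order of the n-gram used to score w3 given the history (w1, w2):
    // 3 = trigram hit, 2 = backed off to bigram, 1 = backed off to unigram.
    int backoff_order(const std::string& w1, const std::string& w2,
                      const std::string& w3) const {
        if (trigrams_.count(std::make_tuple(w1, w2, w3))) return 3;
        if (bigrams_.count({w2, w3})) return 2;
        return 1;
    }

private:
    std::set<std::pair<std::string, std::string>> bigrams_;
    std::set<std::tuple<std::string, std::string, std::string>> trigrams_;
};

// Recomputes one backoff type per word, starting from the history "<s>"
// (an assumption about how the sentence start is handled).
std::vector<int> backoff_types(const ToyTrigramLM& lm,
                               const std::vector<std::string>& words) {
    std::vector<int> types;
    std::string prev2;          // empty: no word two positions back yet
    std::string prev1 = "<s>";  // sentence-start token as the first history
    for (const std::string& w : words) {
        assert(!w.empty());     // empty strings are reserved for "no history"
        types.push_back(lm.backoff_order(prev2, prev1, w));
        prev2 = prev1;
        prev1 = w;
    }
    return types;
}
```

For example, if the toy model contains only the bigram "<s> when", the trigram "<s> when is" and the bigram "the next", the hypothesis "when is the next bus" gets the types 2, 3, 1, 2, 1.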
>>
>>    Thomas Harris wrote:
>>
>>        Hi Antoine,
>>
>>        Yes, that was/is a problem, and I tried something like this.
>>        But an even more fundamental problem is that the ps_seg_t*
>>        segment iterator you get from pocketsphinx doesn't correctly
>>        implement ps_seg_prob when the segment iterator comes from the
>>        N-best hypothesis iterator, even though it works fine if you get
>>        the segment iterator from the best_hyp function (or whatever
>>        that's called). I've sent David the code segment that
>>        illustrates this bug. I don't know of any workaround. For the
>>        most part, I guess, we've gotten multiple hypotheses by running
>>        multiple recognizers.
>>
>>        Thanks,
>>        -Thomas
>>
>>        On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
>>        <antoine.raux at gmail.com> wrote:
>>
>>           Hi all,
>>
>>           What exactly is the confidence computation problem? Is it that we
>>           cannot compute the LM backoff-type-based word confidence (see
>>           hyp_conf_slm in PocketsphinxEngine's main.cpp)?
>>           If that is the problem, one way to fix it might be to modify
>>           hyp_conf_slm to accept a ps_seg_t as an argument (instead of
>>           always getting seg_iter from ps_seg_iter):
>>
>>           float* hyp_conf_slm (bool useFixedScore = false, ps_seg_t *seg_iter = NULL)
>>           {
>>             const int MAX_TYPE_SIZE = 4096;
>>             int32 score, type[MAX_TYPE_SIZE];
>>             int32 k = 0;
>>
>>             // (antoine) no seg_iter was given, get the top segment iterator from ps
>>             if (seg_iter == NULL)
>>                 seg_iter = ps_seg_iter(psd, &score);
>>
>>             type[k++] = 3;  // use the trigram dummy for the first word
>>
>>             if (seg_iter != NULL) {
>>                 // the first segment is skipped; the dummy above stands in for it
>>                 while ((seg_iter = ps_seg_next(seg_iter)) != NULL) {
>>                     // keep room for the two trailing dummies; release the iterator
>>                     if (k >= MAX_TYPE_SIZE - 2) {
>>                         ps_seg_free(seg_iter);
>>                         return NULL;
>>                     }
>>
>>                     // the last output is the LM backoff order used for this word
>>                     int32 lscr, ascr;
>>                     ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>>                 }
>>             }
>>             type[k++] = 3; // (tk) dummy trigram after utterance
>>             type[k++] = 3; // (tk) sometimes there's no end token, in which case
>>                            // the last one was for the end token and this one is the dummy
>>
>>             // (antoine) allocate the array of confidence scores
>>             float* conf = (float*)malloc(k*sizeof(float));
>>             if (conf == NULL)
>>                 return NULL;
>>
>>             for (int32 i = 1; i < k-2; i++) {
>>                 if (!useFixedScore) {
>>                     // previous and current word count once, the next two twice;
>>                     // with backoff orders in 1..3, t lies in [6, 18]
>>                     int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2]) << 1);
>>                     conf[i-1] = (float)((double)(t-6)/12.0);  // maps t to [0, 1]
>>                 } else {
>>                     conf[i-1] = 0.7f;
>>                 }
>>             }
>>
>>             return conf;
>>           }
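To make the confidence formula in the non-fixed-score branch concrete, here is a standalone re-implementation of that loop over a plain vector. It is a sketch for illustration and not the PocketsphinxEngine code itself. The type array holds the leading dummy 3, one backoff order per word, and the two trailing dummies. Assuming every entry lies in 1..3, the weighted sum t lies in [6, 18], so (t - 6) / 12 lies in [0, 1].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Re-implements the non-fixed-score branch of hyp_conf_slm on a vector:
// `type` = [dummy 3, one backoff order per word..., dummy 3, dummy 3].
std::vector<float> backoff_confidence(const std::vector<int>& type) {
    std::vector<float> conf;
    if (type.size() < 3) return conf;  // malformed: fewer than the 3 dummies
    // Same bounds as the original loop: i runs from 1 while i < k - 2.
    for (std::size_t i = 1; i + 2 < type.size(); ++i) {
        assert(type[i] >= 1 && type[i] <= 3);
        // Previous and current word count once, the next two words twice.
        int t = type[i - 1] + type[i] + 2 * (type[i + 1] + type[i + 2]);
        // t lies in [6, 18], so (t - 6) / 12 lies in [0, 1].
        conf.push_back(static_cast<float>((t - 6) / 12.0));
    }
    return conf;
}
```

For the type array {3, 2, 3, 1, 2, 1, 3, 3} (five words), the last word gets t = 2 + 1 + 2 * (3 + 3) = 15 and therefore a confidence of 0.75. A single word with trigram types on all sides gets exactly 1.0.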
>>
>>           Then further down, you can modify the third version of
>>           fillPartialHypStruct by just adding the argument when it calls
>>           hyp_conf_slm:
>>
>>           // [2008-02-19] (antoine): this function takes a partial hypothesis and a
>>           //                        reference to a THypStruct and fills in the hyp struct
>>           void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs, int fromNBest) {
>>
>>             Log(STD_STREAM, "Filling partial hyp struct\n");
>>
>>             size_t h_len, ch_len;
>>             int n_words = 0, n_validwords, has_oov;
>>             char tmp[16384];
>>             float *lm_conf = NULL;
>>
>>             // Fill in confidence values for words in result and build filtered hypothesis
>>             // (useFixedScore is the first parameter, so the iterator goes second)
>>             if (slm)
>>                 lm_conf = hyp_conf_slm(false, curr_seg_iter);
>>             else
>>                 lm_conf = hyp_conf_slm(true, curr_seg_iter);
>>
>>           (...)
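One pitfall with these call sites is worth spelling out. useFixedScore is the first parameter of hyp_conf_slm, so the iterator has to be passed second. A call of the form hyp_conf_slm(curr_seg_iter) still compiles, because C++ converts a pointer to bool implicitly. The iterator then becomes useFixedScore (true for any non-null pointer), and seg_iter keeps its NULL default. The toy function below uses hypothetical names but the same parameter order, and it reports what the callee actually receives.

```cpp
#include <cassert>

struct Segment {};  // hypothetical stand-in for ps_seg_t

// Same parameter order as hyp_conf_slm: flag first, optional pointer second.
// Encodes what the callee received: bit 0 = flag set, bit 1 = pointer set.
int what_was_received(bool use_fixed_score = false, Segment* seg = nullptr) {
    return (use_fixed_score ? 1 : 0) | (seg != nullptr ? 2 : 0);
}
```

Passing the flag explicitly, as in hyp_conf_slm(false, curr_seg_iter), avoids this silent conversion.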
>>
>>           I don't really have any setup to test this, but if someone who has
>>           one could give it a shot and post the result to the mailing list...
>>           Now, it might be that I misunderstood the problem altogether (in
>>           which case I apologize for the spam)...
>>
>>           On a side note, the big commented-out block in getHypStructs (as
>>           sent by Blaise) is from my Cactus code (which I had sent to Blaise
>>           as an example), so it's irrelevant to Olympus and should be
>>           deleted (for clarity's sake).
>>
>>           antoine
>>
>>           Blaise Thomson wrote:
>>
>>               Hi Thomas / Alan,
>>
>>               I've now got some preliminary N-best list code working with
>>               PocketSphinx. With the help of some example code from Antoine,
>>               I've modified the PocketSphinx engine to produce a 1-best list
>>               for partial recognition results but an N-best list upon
>>               completion. I've also modified the AudioServer to be able to
>>               receive N-best lists from each of the recognizers (the N for
>>               each decoder is specified by an optional ":N" after the decoder
>>               definition in the config file). In case this is something you
>>               want to include in future versions of Olympus, I've attached my
>>               modified files.
>>
>>               Note, however, that the code still doesn't produce any
>>               confidence score information for the N-best list. For this
>>               reason, we will probably still be unable to use Olympus for our
>>               version of the LetsGo! system. If the PocketSphinx bugs you
>>               mentioned are fixed any time soon, or if anyone finds out how
>>               to get confidence scores with the N-best list, would you please
>>               let us know?
>>
>>               Many thanks,
>>               Blaise
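The message does not show the AudioServer config syntax, so the following is only a hypothetical sketch of how a decoder entry with an optional ":N" suffix might be parsed. It assumes that the decoder definition itself contains no colon. It also assumes that a missing, non-numeric, or zero suffix is treated as a 1-best list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical: splits "name:N" into the decoder name and the N-best size.
// Anything that is not a short run of digits after the last colon is kept
// as part of the name, and the list size falls back to 1.
std::pair<std::string, int> parse_decoder_spec(const std::string& spec) {
    const std::size_t colon = spec.rfind(':');
    if (colon == std::string::npos) return {spec, 1};

    const std::string suffix = spec.substr(colon + 1);
    const bool numeric = !suffix.empty() && suffix.size() <= 6 &&
                         suffix.find_first_not_of("0123456789") == std::string::npos;
    if (!numeric) return {spec, 1};

    int n = std::stoi(suffix);  // at most 6 digits, so no overflow
    if (n < 1) n = 1;           // treat ":0" like a 1-best list
    return {spec.substr(0, colon), n};
}
```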
>>
>>
>>
>>               Thomas Harris wrote:
>>
>>                   Hi Blaise,
>>
>>                   Thanks for looking into this. I hope we can include your
>>                   bug fixes. I've been looking into this as well, and there's
>>                   a more fundamental issue: it seems you can't get word
>>                   confidence metrics from the PocketSphinx segment iterators
>>                   when you've gotten the segment iterators from the n_best
>>                   hypothesis iterator. It smells like a PocketSphinx bug, but
>>                   I haven't seen any reference implementation of PocketSphinx
>>                   that uses those confidence metrics in an n_best setting, so
>>                   I'm not sure it isn't a problem with how the PocketSphinx API
>>                   is used. Until that issue is resolved, n_best lists won't
>>                   work in Olympus; too many downstream processes depend on
>>                   those confidence metrics.
>>
>>                   Thanks,
>>                   -Thomas
>>
>>                   On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>>                   <brmt2 at cam.ac.uk> wrote:
>>
>>                      Dear Olympus developers,
>>
>>                      I am trying to get the Olympus LetsGo! system to provide
>>                      an N-best list of speech recognition hypotheses. I found
>>                      the -n_best switch, which can be passed to the
>>                      PocketSphinxEngine and is supposed to enable this, but when
>>                      I set the switch to anything other than 0, the system
>>                      crashes immediately on any audio input. I remember you said
>>                      that the system had been built to provide N-best lists, so
>>                      I was wondering if you could give any advice on why it is
>>                      not working. Do you have a working N-best list system that
>>                      you could send me, so I can see how things are configured?
>>
>>                      In trying to solve the problem, I took a look at the
>>                      PocketSphinxEngine source code and noticed some possible
>>                      memory access bugs that may be contributing to this. These
>>                      were related to the way the iHypsGenerated variable was
>>                      used. I've fixed these and can send them if you would like
>>                      (I tried attaching them, but the mailing list won't let me).
>>                      The resulting code still crashes, but at a later stage.
>>                      After the fix, the log file shows a WARNING: "ngram_search.c",
>>                      line 1000:. I don't know if this might be the cause of the
>>                      problem. It is also possible that I simply have to add a
>>                      configuration variable to PocketSphinx itself. At the moment
>>                      I have only used the n_best switch on PocketSphinxEngine.
>>
>>                      Please do let me know if you have any ideas on how to get
>>                      this working or who else to contact.
>>
>>                      Thanks for all your help,
>>
>>                      Blaise
>>
>>
>

