[Olympus developers 222]: Re: N-best lists for PocketSphinx / Olympus

Thomas Harris tkharris at gmail.com
Tue Apr 13 13:18:22 EDT 2010


Good idea. That sounds doable. But we're also not getting acoustic model
scores. I don't know for sure: do later stages of Olympus (like Helios)
depend on word-level acoustic scores?

-Thomas

On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux <antoine.raux at gmail.com> wrote:

> Actually, rather than a bunch of ifs as I wrote, the (still temporary)
> solution is to get the ngram_model_t object from ps, and then use the
> sphinxbase functions (such as ngram_tg_score) to compute the backoff type
> (which is exactly what ps does at decoding time).
>
> antoine
>
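
A minimal sketch of what "computing the backoff type" means for one hypothesis, for readers who have not looked at the decoder. It does not call sphinxbase: ToyLM, backoffType and backoffTypes are hypothetical names, and the string-keyed n-gram set is only an assumed stand-in for the ngram_model_t object and for the n-gram order that a call such as ngram_tg_score would report.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy language model: the set of n-grams it contains, each stored as a
// space-joined string. A hypothetical stand-in for the decoder's ngram_model_t.
struct ToyLM {
    std::set<std::string> ngrams;
};

// Returns the n-gram order used to score w3 after the history (w1, w2):
// 3 if the trigram is listed, 2 if only the bigram is, otherwise 1.
// This imitates a backoff type; it is not the sphinxbase function.
int backoffType(const ToyLM& lm, const std::string& w1,
                const std::string& w2, const std::string& w3) {
    if (lm.ngrams.count(w1 + " " + w2 + " " + w3)) return 3;
    if (lm.ngrams.count(w2 + " " + w3)) return 2;
    return 1;
}

// Walks the words of one hypothesis (for example one N-best entry) and
// returns one backoff type per word, starting from a "<s> <s>" history.
std::vector<int> backoffTypes(const ToyLM& lm,
                              const std::vector<std::string>& words) {
    std::vector<int> types;
    std::string w1 = "<s>";
    std::string w2 = "<s>";
    for (const std::string& w : words) {
        int t = backoffType(lm, w1, w2, w);
        assert(t >= 1 && t <= 3);
        types.push_back(t);
        w1 = w2;  // slide the two-word history window
        w2 = w;
    }
    return types;
}
```

Because the types are recomputed from the word sequence alone, this route would not depend on ps_seg_prob working for segment iterators obtained from the N-best iterator.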
> Thomas Harris wrote:
>
>> Hi Antoine,
>>
>> Yes, that was/is a problem and I tried something like this. But an even more
>> fundamental problem is that the ps_seg_t* segment iterator that you
>> get from pocketsphinx doesn't correctly implement ps_seg_prob when the
>> segment iterator comes from the hypothesis iterator, even though it works
>> fine if you get the segment iterator from the best_hyp function (or whatever
>> that's called). I've sent David the code segment that illustrates this bug.
>> I don't know that there's any kind of workaround. For the most part we've
>> gotten multiple hypotheses by running multiple recognizers, I guess.
>>
>> Thanks,
>> -Thomas
>>
>> On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux <antoine.raux at gmail.com> wrote:
>>
>>    Hi all,
>>
>>    What exactly is the confidence computation problem? Is it that we
>>    cannot compute the LM backoff type-based word confidence (see
>>    hyp_conf_slm in PocketsphinxEngine's main.cpp)?
>>    If that is the problem, one way to fix this might be to modify
>>    hyp_conf_slm to accept a ps_seg_t as an argument (instead of
>>    always getting seg_iter from ps_seg_iter):
>>
>>    float* hyp_conf_slm (bool useFixedScore = false, ps_seg_t *seg_iter = NULL)
>>    {
>>      const int MAX_TYPE_SIZE = 4096;
>>      int32 score, type[MAX_TYPE_SIZE];
>>      int32 k = 0;
>>
>>      // (antoine) no seg_iter was given, get the top segment iterator from ps
>>      if (seg_iter == NULL)
>>          seg_iter = ps_seg_iter(psd, &score);
>>
>>      type[k++] = 3;                      // use the trigram dummy for first word
>>
>>      if (seg_iter != NULL) {
>>          while (seg_iter = ps_seg_next(seg_iter)) {
>>              if (k >= MAX_TYPE_SIZE - 2) return NULL; // leave room for the two trailing dummies
>>
>>              int32 lscr, ascr;
>>              ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>>          }
>>      }
>>      type[k++] = 3; // (tk) dummy trigram after utterance
>>      type[k++] = 3; // (tk) sometimes there's no end token, in which case
>>                     // the last one was for the end token and this one is the dummy
>>
>>      // (antoine) allocate the array of confidence scores
>>      float* conf = (float*)malloc(k*sizeof(float));
>>
>>      for (int32 i = 1; i < k-2; i++) {
>>          if(!useFixedScore) {
>>              int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>>              conf[i-1] = (float)((double)(t-6)/12.0);
>>          } else {
>>              conf[i-1] = 0.7f;
>>          }
>>      }
>>
>>      return conf;
>>    }
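
The arithmetic in the loop above is easier to follow in isolation. The following is a sketch only: confFromBackoffTypes is a hypothetical name, and it assumes every backoff type is 1, 2 or 3, padded with dummy trigrams exactly as in the quoted function. Under that assumption t ranges from 6 to 18, so (t-6)/12 maps each segment to a confidence between 0 and 1, with the two following types weighted double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the confidence arithmetic in the quoted hyp_conf_slm.
// `backoff` holds one n-gram order (1, 2 or 3) per segment; the padding
// with 3s mirrors the dummy trigrams used in the quoted code.
std::vector<float> confFromBackoffTypes(const std::vector<int>& backoff) {
    std::vector<int> type;
    type.push_back(3);                  // dummy trigram before the first segment
    for (std::size_t j = 0; j < backoff.size(); ++j) {
        assert(backoff[j] >= 1 && backoff[j] <= 3);
        type.push_back(backoff[j]);
    }
    type.push_back(3);                  // two dummy trigrams after the utterance
    type.push_back(3);

    std::vector<float> conf;
    for (std::size_t i = 1; i + 2 < type.size(); ++i) {
        // previous + current, plus twice the two following types
        int t = type[i - 1] + type[i] + 2 * (type[i + 1] + type[i + 2]);
        conf.push_back(static_cast<float>((t - 6) / 12.0));
    }
    return conf;
}
```

For example, a lone segment scored from a unigram and surrounded only by the dummies gives t = 3 + 1 + 2*(3 + 3) = 16, so its confidence is 10/12.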
>>
>>    Then further down, you can modify the third version of
>>    fillPartialHypStruct by just adding the argument when it calls
>>    hyp_conf_slm:
>>
>>    // [2008-02-19] (antoine): this function takes a partial hypothesis and a
>>    //                        reference to a THypStruct and fills in the hyp struct
>>    void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs, int fromNBest) {
>>
>>      Log(STD_STREAM, "Filling partial hyp struct\n");
>>
>>      size_t h_len, ch_len;
>>      int n_words = 0, n_validwords, has_oov;
>>      char tmp[16384];
>>      float *lm_conf = NULL;
>>
>>      // Fill in confidence values for words in result and build filtered hypothesis
>>      // (useFixedScore is the first parameter of hyp_conf_slm, seg_iter the second)
>>      if (slm)
>>          lm_conf = hyp_conf_slm(false, curr_seg_iter);
>>      else
>>          lm_conf = hyp_conf_slm(true, curr_seg_iter);
>>
>>    (...)
>>
>>    I don't really have any setup to test this but if someone who has
>>    could give it a shot and post the result to the mailing list...
>>    Now it might be that I misunderstood what the problem was
>>    altogether (in which case I apologize for the spam)...
>>
>>    On a side note, the big commented out block in getHypStructs (as
>>    sent by Blaise) is from my Cactus code (which I had sent to Blaise
>>    as an example), so it's irrelevant to Olympus and should be
>>    deleted (for clarity's sake).
>>
>>    antoine
>>
>>    Blaise Thomson wrote:
>>
>>        Hi Thomas / Alan,
>>
>>        I've now got some preliminary N-best list code to work with
>>        PocketSphinx. With the help of some example code from Antoine
>>        I've modified the pocketsphinx engine to produce a 1-best list
>>        for partial recognition results but an N-best list upon
>>        completion. I've also modified the AudioServer to be able to
>>        receive multiple N-best lists from each of the recognizers (the
>>        number for each decoder specified by an optional ":N" after
>>        the decoder definition in the config file). In case this may
>>        be something you want to include in future versions of Olympus
>>        I've attached my modified files.
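
One way the optional ":N" suffix could be read is sketched below. This is not the attached AudioServer code: DecoderSpec and parseDecoderSpec are hypothetical names, and the shape of a decoder entry (some text, optionally followed by a colon and a small positive integer) is an assumption, since the config file itself is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical result of parsing one decoder entry from the config file.
struct DecoderSpec {
    std::string definition;  // entry text without the optional ":N" suffix
    int nBest;               // hypotheses requested; 1 when no suffix is given
};

// Splits an assumed "definition[:N]" entry. Anything that is not a colon
// followed by one to four digits forming a positive number is treated as
// part of the definition, and nBest stays at 1.
DecoderSpec parseDecoderSpec(const std::string& entry) {
    DecoderSpec spec;
    spec.definition = entry;
    spec.nBest = 1;

    std::size_t colon = entry.rfind(':');
    if (colon == std::string::npos) return spec;

    const std::string suffix = entry.substr(colon + 1);
    if (suffix.empty() || suffix.size() > 4) return spec;
    for (std::size_t i = 0; i < suffix.size(); ++i) {
        if (suffix[i] < '0' || suffix[i] > '9') return spec;  // not a count
    }

    int n = std::atoi(suffix.c_str());
    if (n < 1) return spec;             // ":0" is ignored, keep 1-best

    spec.definition = entry.substr(0, colon);
    spec.nBest = n;
    assert(spec.nBest >= 1);
    return spec;
}
```

Defaulting to 1 keeps entries without a suffix behaving as 1-best decoders.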
>>
>>        Note, however, that the code still doesn't produce any
>>        confidence score information for the N-best list. For this
>>        reason we will still probably be unable to use Olympus for our
>>        version of the LetsGo! system. If the PocketSphinx bugs you
>>        mentioned are fixed any time soon or if anyone finds out how
>>        to get confidence scores with the N-best list would you please
>>        let us know?
>>
>>        Many thanks,
>>        Blaise
>>
>>
>>
>>        Thomas Harris wrote:
>>
>>            Hi Blaise,
>>
>>            Thanks for looking into this. I hope we can include your
>>            bugfixes. I've been looking into this as well, and there's
>>            a more fundamental issue. It seems like you can't get word
>>            confidence metrics from the PocketSphinx segment iterators
>>            when you've gotten the segment iterators from the n_best
>>            hypothesis iterator. It smells like a PocketSphinx bug,
>>            but I haven't seen any reference implementation of
>>            PocketSphinx that makes use of those confidence metrics in
>>            an n_best setting, so I'm not sure that it isn't a problem
>>            with how the PocketSphinx API is used. Until that issue is
>>            resolved, n_best lists won't work in Olympus, because too
>>            many downstream processes depend on those confidence metrics.
>>
>>            Thanks,
>>            -Thomas
>>
>>            On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
>>            <brmt2 at cam.ac.uk> wrote:
>>
>>               Dear Olympus developers,
>>
>>               I am trying to get the Olympus LetsGo! system to provide an N-best
>>               list of speech recognition hypotheses. I found the -n_best switch
>>               which can be passed to the PocketSphinxEngine, which is supposed to
>>               enable this, but when I set the switch to anything other than 0 the
>>               system crashes immediately on any audio input. I remember you said
>>               that the system had been built to provide N-best lists, so I was
>>               wondering if you could give any advice on why it is not working.
>>               Do you have a working N-best list system that you could send me to
>>               see how things are configured?
>>
>>               In trying to solve the problem I took a look at the
>>               PocketSphinxEngine source code and have noticed some possible
>>               memory access bugs which may be contributing to this. These were
>>               related to the way the iHypsGenerated variable was used. I've
>>               fixed these and can send them if you would like (I tried attaching
>>               them but the mailing list won't let me). The resulting code still
>>               crashes but at a later stage. After the fix, the log file
>>               generates a WARNING: "ngram_search.c", line 1000:. I don't know if
>>               this might be the cause of the problem. There is also a
>>               possibility that I simply have to add a configuration variable to
>>               PocketSphinx itself. At the moment I have only used the n_best
>>               switch on PocketSphinxEngine.
>>
>>               Please do let me know if you have any ideas of how to get this
>>               working or who else to contact.
>>
>>               Thanks for all your help,
>>
>>               Blaise
>>
>>
>>
>>
>>
>>
>>
>