I have seen statistics in the frame such as #words&gt;p, where p is some kind of confidence metric. It seems those statistics could only be derived from word-level confidence scores in the recognizer, but that&#39;s just my recollection right now. Someone should investigate, I guess.<br>

<br>Thanks,<br>-Thomas<br><br><div class="gmail_quote">On Tue, Apr 13, 2010 at 1:20 PM, Antoine Raux <span dir="ltr">&lt;<a href="mailto:antoine.raux@gmail.com">antoine.raux@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Hmmm... I don&#39;t know about word-level acoustic scores. I don&#39;t remember ever using these. Are these actually stored in the frame that PocketsphinxEngine and/or AudioServer send?<br>

<br>

antoine<br>

<br>

Thomas Harris wrote:<br>

<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im">

Good idea. That sounds doable. But we&#39;re also not getting acoustic model scores. I don&#39;t know for sure: do later stages of Olympus (like Helios) depend on word-level acoustic scores?<br>

<br>

-Thomas<br>

<br></div><div><div></div><div class="h5">

On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux &lt;<a href="mailto:antoine.raux@gmail.com" target="_blank">antoine.raux@gmail.com</a>&gt; wrote:<br>


<br>

    Actually, rather than a bunch of ifs as I wrote, the (still<br>

    temporary) solution is to get the ngram_model_t object from ps,<br>

    and then use the sphinxbase functions (such as ngram_tg_score) to<br>

    compute the backoff type (which is exactly what ps does at<br>

    decoding time).<br>
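<br>
A toy illustration of what "backoff type" means in this suggestion, as a minimal sketch only. It does not call sphinxbase: ToyLM and backoff_type are hypothetical names invented for this example, and the model is just a set of known bigrams and trigrams. The idea is that the type is the length of the longest n-gram the model actually has for a word in its context (3 = trigram, 2 = bigram, 1 = unigram).<br>

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a language model: only the n-grams it contains.
// This is NOT the sphinxbase ngram_model_t API.
struct ToyLM {
    std::set<std::pair<std::string, std::string>> bigrams;
    std::set<std::tuple<std::string, std::string, std::string>> trigrams;
};

// Returns 3 if the trigram (w1 w2 w3) is in the model, 2 if only the
// bigram (w2 w3) is, and 1 if the model has to back off to the unigram.
int backoff_type(const ToyLM& lm, const std::string& w1,
                 const std::string& w2, const std::string& w3) {
    if (lm.trigrams.count(std::make_tuple(w1, w2, w3)) > 0) return 3;
    if (lm.bigrams.count(std::make_pair(w2, w3)) > 0) return 2;
    return 1;
}
```

With a real decoder the lookup would go through the language model object obtained from ps rather than through a set of strings; the sketch only shows the decision being made.<br>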

<br>

    antoine<br>

<br>

    Thomas Harris wrote:<br>

<br>

        Hi Antoine,<br>

<br>

        Yes, that was/is a problem and I tried something like this.<br>

        But an even more fundamental problem is that the ps_seg_t*<br>

        segment iterator that you get from pocketsphinx doesn&#39;t<br>

        correctly implement ps_seg_prob when the segment iterator<br>

        comes from the hypothesis iterator even though it works fine<br>

        if you get the segment iterator from the best_hyp function (or<br>

        whatever that&#39;s called). I&#39;ve sent David the code segment that<br>

        illustrates this bug. I don&#39;t know that there&#39;s any kind of<br>

        workaround. For the most part we&#39;ve gotten multiple hypotheses<br>

        by running multiple recognizers, I guess.<br>

<br>

        Thanks,<br>

        -Thomas<br>

<br>

        On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux<br>

        &lt;<a href="mailto:antoine.raux@gmail.com" target="_blank">antoine.raux@gmail.com</a>&gt; wrote:<br>

<br>

           Hi all,<br>

<br>

           What exactly is the confidence computation problem? Is it<br>

        that we<br>

           cannot compute the LM backoff type-based word confidence (see<br>

           hyp_conf_slm in PocketsphinxEngine&#39;s main.cpp)?<br>

           If that is the problem, one way to fix this might be to modify<br>

           hyp_conf_slm to accept a ps_seg_t as an argument (instead of<br>

           always getting seg_iter from ps_seg_iter):<br>

<br>

           float* hyp_conf_slm (bool useFixedScore = false, ps_seg_t *seg_iter = NULL)<br>

           {<br>

             const int MAX_TYPE_SIZE = 4096;<br>

             int32 score, type[MAX_TYPE_SIZE];<br>

             int32 k = 0;<br>

<br>

             // (antoine) no seg_iter was given, get the top segment iterator from ps<br>

             if (seg_iter == NULL)<br>

                 seg_iter = ps_seg_iter(psd, &amp;score);<br>

<br>

             type[k++] = 3;                      // use the trigram dummy for first word<br>

<br>

             if (seg_iter != NULL) {<br>

                 while (seg_iter = ps_seg_next(seg_iter)) {<br>

                     // leave room for the two dummy entries appended after the loop<br>

                     if (k &gt;= MAX_TYPE_SIZE - 2) return NULL;<br>

<br>

                     int32 lscr, ascr;<br>

                     ps_seg_prob(seg_iter, &amp;ascr, &amp;lscr, &amp;type[k++]);<br>

                 }<br>

             }<br>

             type[k++] = 3; // (tk) dummy trigram after utterance<br>

             // (tk) sometimes there&#39;s no end token, in which case the last one<br>

             // was for the end token and this one is the dummy<br>

             type[k++] = 3;<br>

<br>

             // (antoine) allocate the array of confidence scores<br>

             float* conf = (float*)malloc(k*sizeof(float));<br>

<br>

             for (int32 i = 1; i &lt; k-2; i++) {<br>

                 if(!useFixedScore) {<br>

                     int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2])&lt;&lt;1); // (tk) wtf?<br>

                     conf[i-1] = (float)((double)(t-6)/12.0);<br>

                 } else {<br>

                     conf[i-1] = 0.7f;<br>

                 }<br>

             }<br>

<br>

             return conf;<br>

           }<br>

<br>

           Then further down, you can modify the third version of<br>

           fillPartialHypStruct by just adding the argument when it calls<br>

           hyp_conf_slm:<br>

<br>

           // [2008-02-19] (antoine): this function takes a partial hypothesis and a reference to a<br>

           //                        THypStruct and fills in the hyp struct<br>

           void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs, int fromNBest) {<br>

<br>

             Log(STD_STREAM, &quot;Filling partial hyp struct\n&quot;);<br>

<br>

             size_t h_len, ch_len;<br>

             int n_words = 0, n_validwords, has_oov;<br>

             char tmp[16384];<br>

             float *lm_conf = NULL;<br>

<br>

             // Fill in confidence values for words in result and build filtered hypothesis<br>

             // (useFixedScore is the first parameter of hyp_conf_slm, the iterator the second)<br>

             if (slm)<br>

                 lm_conf = hyp_conf_slm(false, curr_seg_iter);<br>

             else<br>

                 lm_conf = hyp_conf_slm(true, curr_seg_iter);<br>

<br>

           (...)<br>

<br>

           I don&#39;t really have any setup to test this but if someone<br>

        who has<br>

           could give it a shot and post the result to the mailing list...<br>

           Now it might be that I misunderstood what the problem was<br>

           altogether (in which case I apologize for the spam)...<br>

<br>

           On a side note, the big commented out block in<br>

        getHypStructs (as<br>

           sent by Blaise) is from my Cactus code (which I had sent to<br>

        Blaise<br>

           as an example), so it&#39;s irrelevant to Olympus and should be<br>

           deleted (for clarity&#39;s sake).<br>

<br>

           antoine<br>

<br>

           Blaise Thomson wrote:<br>

<br>

               Hi Thomas / Alan,<br>

<br>

               I&#39;ve now got some preliminary N-best list code to work with<br>

               PocketSphinx. With the help of some example code from<br>

        Antoine<br>

               I&#39;ve modified the pocketsphinx engine to produce a<br>

        1-best list<br>

               for partial recognition results but an N-best list upon<br>

               completion. I&#39;ve also modified the AudioServer to be<br>

        able to<br>

               receive multiple N-best lists from each of the<br>

        recognizer (the<br>

               number for each decoder specified by an optional &quot;:N&quot; after<br>

               the decoder definition in the config file). In case<br>

        this may<br>

               be something you want to include in future versions of<br>

        Olympus<br>

               I&#39;ve attached my modified files.<br>

<br>

               Note, however, that the code still doesn&#39;t produce any<br>

               confidence score information for the N-best list. For this<br>

               reason we will still probably be unable to use Olympus<br>

        for our<br>

               version of the LetsGo! system. If the PocketSphinx bugs you<br>

               mentioned are fixed any time soon or if anyone finds<br>

        out how<br>

               to get confidence scores with the N-best list would you<br>

        please<br>

               let us know?<br>

<br>

               Many thanks,<br>

               Blaise<br>

<br>

<br>

<br>

               Thomas Harris wrote:<br>

<br>

                   Hi Blaise,<br>

<br>

                   Thanks for looking into this. I hope we can include<br>

        your<br>

                   bugfixes. I&#39;ve been looking into this as well, and<br>

        there&#39;s<br>

                   a more fundamental issue. It seems like you can&#39;t<br>

        get word<br>

                   confidence metrics from the PocketSphinx segment<br>

        iterators<br>

                   when you&#39;ve gotten the segment iterators from the n_best<br>

                   hypothesis iterator. It smells like a PocketSphinx bug,<br>

                   but I haven&#39;t seen any reference implementation of<br>

                   PocketSphinx that makes use of those confidence<br>

        metrics in<br>

                   an n_best setting, so I&#39;m not sure that it isn&#39;t a<br>

        problem<br>

                   with how the PocketSphinx api is used. Until that<br>

        issue is<br>

                   resolved, n_best lists won&#39;t work in Olympus; too many<br>

                   downstream processes depend on those confidence metrics.<br>

<br>

                   Thanks,<br>

                   -Thomas<br>

<br>

                   On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson<br>

                   &lt;<a href="mailto:brmt2@cam.ac.uk" target="_blank">brmt2@cam.ac.uk</a>&gt; wrote:<br>

<br>

                      Dear Olympus developers,<br>

<br>

                      I am trying to get the Olympus LetsGo! system to<br>

                   provide an N-best<br>

                      list of speech recognition hypotheses. I found the<br>

                   -n_best switch<br>

                      which can be passed to the PocketSphinxEngine<br>

        which is<br>

                   supposed to<br>

                      enable this but when I set the switch to<br>

        anything other<br>

                   than 0 the<br>

                      system crashes immediately on any audio input. I<br>

                   remember you said<br>

                      that the system had been built to provide N-best<br>

        lists<br>

                   so I was<br>

                      wondering if you could give any advice on why it<br>

        is not<br>

                   working.<br>

                      Do you have a working N-best list system that<br>

        you could<br>

                   send me to<br>

                      see how things are configured?<br>

<br>

                      In trying to solve the problem I took a look at the<br>

                      PocketSphinxEngine source code and have noticed some<br>

                   possible<br>

                      memory access bugs which may be contributing to<br>

        this.<br>

                   These were<br>

                      related to the way the iHypsGenerated variable was<br>

                   used. I&#39;ve<br>

                      fixed these and can send them if you would like (I<br>

                   tried attaching<br>

                      them but the mailing list won&#39;t let me). The<br>

        resulting<br>

                   code still<br>

                      crashes but at a later stage. After the fix, the<br>

        log file<br>

                      generates a WARNING: &quot;ngram_search.c&quot;, line 1000:. I<br>

                   don&#39;t know if<br>

                      this might be the cause of the problem. There is<br>

        also a<br>

                      possibility that I simply have to add a<br>

        configuration<br>

                   variable to<br>

                      PocketSphinx itself. At the moment I have only<br>

        used the<br>

                   n_best<br>

                      switch on PocketSphinxEngine.<br>

<br>

                      Please do let me know if you have any ideas of<br>

        how to<br>

                   get this<br>

                      working or who else to contact.<br>

<br>

                      Thanks for all your help,<br>

<br>

                      Blaise<br>

<br>


</div></div></blockquote>

<br>

</blockquote></div><br>