[Olympus developers 227]: Re: N-best lists for PocketSphinx / Olympus
Antoine Raux
antoine.raux at gmail.com
Tue Apr 13 19:42:04 EDT 2010
Point taken. I posted a message about this on the Sphinx Help forum...
antoine
Thomas Harris wrote:
> These fixes, as described, could be applied directly to the
> PocketSphinx API, and could be a benefit to the larger community of
> PocketSphinx users, so I think that the issue of motivation could be
> raised there as well. This is especially the case since it's unclear
> whether the current PocketSphinx API accurately reflects the actual
> functionality of the system.
>
> Thanks,
> -Thomas
>
> On Tue, Apr 13, 2010 at 4:41 PM, Antoine Raux <antoine.raux at gmail.com> wrote:
>
> As far as I can tell from the code, PocketsphinxEngine
> does not send any word-level confidence information to
> AudioServer. That's good news. The bad news is that it computes
> utterance-level AM and LM scores by summing those of the words (so
> it needs word-level scores for that...). This seems to be the only
> way to get separate AM and LM scores from PocketSphinx (someone
> correct me if I'm wrong). You can get the overall score, though
> (the sum of the AM and LM scores), using ps_nbest_hyp. So we
> could recompute the LM score of the i-th hypothesis using
> Sphinxbase's ngram_tg_score routine (at the same time as getting
> the LM backoff type) and subtract the LM score from the total
> score to get the AM score (after potentially weighting by the LM
> weight, although the LM score might already include that, I'm not
> sure)...
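
A minimal sketch of the arithmetic described above, assuming the total score really is the plain sum of the AM and LM scores (the LM-weight question is left open here). HypScores and split_scores are hypothetical names used only for illustration; they are not part of PocketSphinx, Sphinxbase, or Olympus. The per-word LM scores would come from a lookup such as ngram_tg_score, which is not called in this sketch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for one N-best entry (illustration only).
struct HypScores {
    int32_t total;  // combined score reported for the hypothesis
    int32_t lm;     // LM score recomputed by summing per-word LM scores
    int32_t am;     // acoustic score derived as total - lm
};

// Assumes total == am + lm with no extra weighting applied.
HypScores split_scores(int32_t total,
                       const std::vector<int32_t>& word_lm_scores) {
    int32_t lm = 0;
    for (int32_t s : word_lm_scores) {
        lm += s;  // utterance-level LM score is the sum over words
    }
    return HypScores{total, lm, total - lm};
}
```

If the LM scores turn out to already include the LM weight, the subtraction stays the same; otherwise the weight would have to be applied to lm before subtracting.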
>
> All in all, it's not that hard but I'm not even sure the overall
> AM score is used anywhere (I think it can be used when computing
> confidence with Helios but that depends on your model. E.g., the
> LetsGoPublic model does not use that feature).
> The only question is: is there someone motivated enough to do this? ;)
>
> antoine
>
>
> Thomas Harris wrote:
>
> I have seen in the frame some statistics like #words>p where p
> is some kind of confidence metric. It would seem that those
> statistics can only be derived from word-level confidence
> scores in the recognizer. But that's just my recollection
> right now. Someone should investigate, I guess.
>
> Thanks,
> -Thomas
>
>
> On Tue, Apr 13, 2010 at 1:20 PM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
> Hmmm... I don't know about word-level acoustic scores. I don't
> remember ever using these. Are these actually stored in the frame
> that PocketsphinxEngine and/or AudioServer send?
>
> antoine
>
> Thomas Harris wrote:
>
> Good idea. That sounds doable. But we're also not getting
> acoustic model scores. I don't know for sure: do later stages
> of Olympus (like Helios) depend on word-level acoustic scores?
>
> -Thomas
>
> On Tue, Apr 13, 2010 at 1:12 PM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
> Actually, rather than a bunch of ifs as I wrote, the (still
> temporary) solution is to get the ngram_model_t object from ps,
> and then use the sphinxbase functions (such as ngram_tg_score) to
> compute the backoff type (which is exactly what ps does at
> decoding time).
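
The idea can be sketched without the decoder, assuming the LM lookup reports how many words of context it matched (as far as I know, ngram_tg_score does this through an output parameter). NUsedFn and backoff_types are hypothetical names, and the callback only stands in for the real Sphinxbase call; word IDs and the "no word" marker are likewise assumptions for the sake of the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for the LM lookup: given the current word and up to two
// history words, return the n-gram order that was matched (1, 2 or 3).
using NUsedFn =
    std::function<int32_t(int32_t word, int32_t prev1, int32_t prev2)>;

// Walk the hypothesis left to right and record the backoff type of
// each word. no_word marks a missing history slot (assumption).
std::vector<int32_t> backoff_types(const std::vector<int32_t>& wids,
                                   const NUsedFn& n_used,
                                   int32_t no_word = -1) {
    std::vector<int32_t> types;
    types.reserve(wids.size());
    for (std::size_t i = 0; i < wids.size(); ++i) {
        int32_t prev1 = (i >= 1) ? wids[i - 1] : no_word;
        int32_t prev2 = (i >= 2) ? wids[i - 2] : no_word;
        types.push_back(n_used(wids[i], prev1, prev2));
    }
    return types;
}
```

In the engine, the callback would wrap the LM object obtained from the decoder, and the resulting array would replace the values that ps_seg_prob fails to provide for N-best segment iterators.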
>
> antoine
>
> Thomas Harris wrote:
>
> Hi Antoine,
>
> Yes, that was/is a problem and I tried something like this.
> But an even more fundamental problem is that the ps_seg_t*
> segment iterator that you get from pocketsphinx doesn't
> correctly implement ps_seg_prob when the segment iterator
> comes from the hypothesis iterator, even though it works fine
> if you get the segment iterator from the best_hyp function (or
> whatever that's called). I've sent David the code segment that
> illustrates this bug. I don't know that there's any kind of
> workaround. For the most part we've gotten multiple hypotheses
> by running multiple recognizers, I guess.
>
> Thanks,
> -Thomas
>
> On Tue, Apr 13, 2010 at 11:58 AM, Antoine Raux
> <antoine.raux at gmail.com> wrote:
>
> Hi all,
>
> What exactly is the confidence computation problem? Is it that we
> cannot compute the LM backoff type-based word confidence (see
> hyp_conf_slm in PocketsphinxEngine's main.cpp)?
> If that is the problem, one way to fix this might be to modify
> hyp_conf_slm to accept a ps_seg_t as an argument (instead of
> always getting seg_iter from ps_seg_iter):
>
> float* hyp_conf_slm (bool useFixedScore = false,
>                      ps_seg_t *seg_iter = NULL)
> {
>     const int MAX_TYPE_SIZE = 4096;
>     int32 score, type[MAX_TYPE_SIZE];
>     int32 k = 0;
>
>     // (antoine) no seg_iter was given, get the top segment iterator
>     // from ps
>     if (seg_iter == NULL)
>         seg_iter = ps_seg_iter(psd, &score);
>
>     type[k++] = 3; // use the trigram dummy for first word
>
>     if (seg_iter != NULL) {
>         while ((seg_iter = ps_seg_next(seg_iter)) != NULL) {
>             // leave room for the two trailing dummies added below
>             if (k + 2 >= MAX_TYPE_SIZE) return NULL;
>
>             int32 lscr, ascr;
>             ps_seg_prob(seg_iter, &ascr, &lscr, &type[k++]);
>         }
>     }
>     type[k++] = 3; // (tk) dummy trigram after utterance
>     type[k++] = 3; // (tk) sometimes there's no end token, in which case
>                    // the last one was for the end token and this one
>                    // is the dummy
>
>     // (antoine) allocate the array of confidence scores
>     float* conf = (float*)malloc(k*sizeof(float));
>
>     for (int32 i = 1; i < k-2; i++) {
>         if(!useFixedScore) {
>             int32 t = type[i-1] + type[i] + ((type[i+1] + type[i+2])<<1); // (tk) wtf?
>             conf[i-1] = (float)((double)(t-6)/12.0);
>         } else {
>             conf[i-1] = 0.7f;
>         }
>     }
>
>     return conf;
> }
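
The windowed formula in the final loop can be checked in isolation. Assuming every backoff type is 1, 2 or 3, the weighted sum t (weights 1, 1, 2, 2 over the previous word, the word itself, and the next two) ranges from 6 to 18, so (t-6)/12 lands in [0, 1]. window_confidence below is a hypothetical stand-alone helper written for this check; it takes the type array with its one leading and two trailing dummies already in place.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// type = [leading dummy, word types..., trailing dummy, trailing dummy]
// Returns one confidence value per index i with 1 <= i < size - 2,
// mirroring the loop bounds of the function quoted above.
std::vector<float> window_confidence(const std::vector<int32_t>& type) {
    std::vector<float> conf;
    for (std::size_t i = 1; i + 2 < type.size(); ++i) {
        // previous and current word count once, the next two count twice
        int32_t t = type[i - 1] + type[i] + 2 * (type[i + 1] + type[i + 2]);
        conf.push_back((t - 6) / 12.0f);  // 6 -> 0.0, 18 -> 1.0
    }
    return conf;
}
```

A window of all trigram hits gives 1.0 and a window of all unigram backoffs gives 0.0, which is presumably the intent behind the constants 6 and 12.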
>
> Then further down, you can modify the third version of
> fillPartialHypStruct by just adding the argument when it calls
> hyp_conf_slm:
>
> // [2008-02-19] (antoine): this function takes a partial
> //                         hypothesis and a reference to a
> //                         THypStruct and fills in the hyp struct
> void fillPartialHypStruct(ps_seg_t* curr_seg_iter, THypStruct* phs,
>                           int fromNBest) {
>
>     Log(STD_STREAM, "Filling partial hyp struct\n");
>
>     size_t h_len, ch_len;
>     int n_words = 0, n_validwords, has_oov;
>     char tmp[16384];
>     float *lm_conf = NULL;
>
>     // Fill in confidence values for words in result and build
>     // filtered hypothesis
>     // (seg_iter is the second parameter of hyp_conf_slm)
>     if (slm)
>         lm_conf = hyp_conf_slm(false, curr_seg_iter);
>     else
>         lm_conf = hyp_conf_slm(true, curr_seg_iter);
>
>     (...)
>
> I don't really have any setup to test this but if someone who has
> could give it a shot and post the result to the mailing list...
> Now it might be that I misunderstood what the problem was
> altogether (in which case I apologize for the spam)...
>
> On a side note, the big commented out block in getHypStructs (as
> sent by Blaise) is from my Cactus code (which I had sent to Blaise
> as an example), so it's irrelevant to Olympus and should be
> deleted (for clarity's sake).
>
> antoine
>
> Blaise Thomson wrote:
>
> Hi Thomas / Alan,
>
> I've now got some preliminary N-best list code to work with
> PocketSphinx. With the help of some example code from Antoine
> I've modified the pocketsphinx engine to produce a 1-best list
> for partial recognition results but an N-best list upon
> completion. I've also modified the AudioServer to be able to
> receive multiple N-best lists from each of the recognizers (the
> number for each decoder specified by an optional ":N" after
> the decoder definition in the config file). In case this may
> be something you want to include in future versions of Olympus
> I've attached my modified files.
>
> Note, however, that the code still doesn't produce any
> confidence score information for the N-best list. For this
> reason we will still probably be unable to use Olympus for our
> version of the LetsGo! system. If the PocketSphinx bugs you
> mentioned are fixed any time soon or if anyone finds out how
> to get confidence scores with the N-best list would you please
> let us know?
>
> Many thanks,
> Blaise
>
>
>
> Thomas Harris wrote:
>
> Hi Blaise,
>
> Thanks for looking into this. I hope we can include your
> bugfixes. I've been looking into this as well, and there's
> a more fundamental issue. It seems like you can't get word
> confidence metrics from the PocketSphinx segment iterators
> when you've gotten the segment iterators from the n_best
> hypothesis iterator. It smells like a PocketSphinx bug,
> but I haven't seen any reference implementation of
> PocketSphinx that makes use of those confidence metrics in
> an n_best setting, so I'm not sure that it isn't a problem
> with how the PocketSphinx API is used. Until that issue is
> resolved n_best lists won't work in Olympus; too many
> downstream processes depend on those confidence metrics.
>
> Thanks,
> -Thomas
>
> On Wed, Mar 24, 2010 at 4:39 AM, Blaise Thomson
> <brmt2 at cam.ac.uk> wrote:
>
> Dear Olympus developers,
>
> I am trying to get the Olympus LetsGo! system to provide an N-best
> list of speech recognition hypotheses. I found the -n_best switch
> which can be passed to the PocketSphinxEngine which is supposed to
> enable this but when I set the switch to anything other than 0 the
> system crashes immediately on any audio input. I remember you said
> that the system had been built to provide N-best lists so I was
> wondering if you could give any advice on why it is not working.
> Do you have a working N-best list system that you could send me to
> see how things are configured?
>
> In trying to solve the problem I took a look at the
> PocketSphinxEngine source code and have noticed some possible
> memory access bugs which may be contributing to this. These were
> related to the way the iHypsGenerated variable was used. I've
> fixed these and can send them if you would like (I tried attaching
> them but the mailing list won't let me). The resulting code still
> crashes but at a later stage. After the fix, the log file
> generates a WARNING: "ngram_search.c", line 1000:. I don't know if
> this might be the cause of the problem. There is also a
> possibility that I simply have to add a configuration variable to
> PocketSphinx itself. At the moment I have only used the n_best
> switch on PocketSphinxEngine.
>
> Please do let me know if you have any ideas of how to get this
> working or who else to contact.
>
> Thanks for all your help,
>
> Blaise