[RavenclawDev 195] Re: testing dialog system

Svetlana Stenchikova svetastenchikova at gmail.com
Thu Nov 23 12:50:37 EST 2006


Thank you, Antoine,

It seems that batch testing of ASR is really essential. We are just now
starting to add ASR to our system, and we will definitely do something
similar for ASR testing.

We are also making the TTY interface capable of running in batch mode in
order to test the parser, DM, backend, and NLG, so that when something
changes we can quickly verify that existing functionality is not broken.
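Roughly, the harness we have in mind looks like the sketch below (in Python;
the "ttyi" binary name and the scripts/golden directory layout are
placeholders for illustration, not actual Olympus components): replay
scripted user turns through the text interface and diff the system's
responses against saved expected output.

    # Minimal sketch of a batch regression run over a text (TTY) interface.
    # Assumes a hypothetical "ttyi" executable that reads user turns on stdin
    # and writes system turns on stdout; the binary name, the scripts/ inputs,
    # and the golden/ expected outputs are illustrative placeholders.
    import subprocess
    import sys
    from pathlib import Path

    def run_script(script: Path) -> str:
        """Feed one scripted dialogue to the system and capture its output."""
        result = subprocess.run(
            ["./ttyi"],                      # hypothetical TTY front end
            input=script.read_text(),
            capture_output=True, text=True, timeout=120,
        )
        return result.stdout

    def main() -> int:
        failures = 0
        for script in sorted(Path("scripts").glob("*.txt")):
            golden = Path("golden") / script.name
            actual = run_script(script)
            if actual != golden.read_text():
                failures += 1
                print(f"FAIL {script.name}")
        print(f"{failures} failure(s)")
        return 1 if failures else 0

    if __name__ == "__main__":
        sys.exit(main())

Any mismatch against the saved output then points at the component (parser,
DM, backend, or NLG) whose behavior changed.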


Thanks,
Svetlana

On 11/20/06, Antoine Raux <antoine at cs.cmu.edu> wrote:
>
>  Hi Svetlana,
>
>
>
> It depends on what you want to test. If it's the DM or the whole system, at
> this point we don't have a batch testing setup.
>
> What I do first (during the development/debugging phase) is use the TTY
> interface to explore as many dialogue paths as possible (or specifically the
> one that crashed your system). Then, on major updates, we have a stress test
> on the telephone system: we get everyone from the Let's Go team to call as
> much as possible during one hour. That invariably uncovers more bugs… Then,
> of course, we still have bugs after that, which we only catch by monitoring
> the live system (that's one advantage of Let's Go: we get enough calls per
> day that we are usually able to catch a fatal bug within a couple of days)…
> As you see, it's not very automated… There has been talk of implementing
> some kind of regression testing (i.e., batch runs from previous hub logs or
> something like that), but at this point it's still at the stage of a future
> project…
>
>
>
> Now, if you want to test specific components like your ASR or parser, we
> have a way to run the input side of the system in batch. That is, we resend
> all the utterances from a corpus of past dialogues, recognize them, parse
> them, and annotate them with Helios. That's where it ends, though (because,
> of course, if we change the behavior of the system then the rest of the logs
> becomes irrelevant… for that to work we'd need a simulated-user kind of
> setup, but that's another story). We use that to compute our ASR performance
> when we update our language or acoustic models, or to retrain our confidence
> annotator on new data.
>
>
>
> Hope this helps…
>
>
>
> antoine
>
>
>  ------------------------------
>
> *From:* ravenclaw-developers-bounces at LOGANBERRY.srv.cs.cmu.edu
> [mailto:ravenclaw-developers-bounces at LOGANBERRY.srv.cs.cmu.edu]
> *On Behalf Of* Svetlana Stenchikova
> *Sent:* Monday, November 20, 2006 11:10 AM
> *To:* ravenclaw-developers at cs.cmu.edu
> *Subject:* [RavenclawDev 193] testing dialog system
>
>
>
> Hi,
>
> How do you usually test your dialog systems? Do you do any batch testing?
> I heard that it may be possible to do a "replay" from the hub log. Does
> anyone have information about it?
>
> Thank you,
> Svetlana
>
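For reference, the batch ASR scoring Antoine describes (re-decode a corpus of
past utterances, then compare the hypotheses against reference transcripts)
boils down to computing word error rate. Below is a self-contained sketch in
Python; the tab-separated "utt_id<TAB>text" file format is an assumption for
illustration, not the Olympus or hub log format.

    # Rough sketch: score batch recognition output against reference
    # transcripts with word error rate (WER). Input files hold one
    # "utt_id<TAB>text" line per utterance (an assumed format).

    def edit_distance(ref, hyp):
        """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
        d = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, d[0] = d[0], i
            for j, h in enumerate(hyp, 1):
                prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                       d[j - 1] + 1,    # insertion
                                       prev + (r != h)) # substitution / match
        return d[-1]

    def load(path):
        """Read utterances into a dict of utt_id -> list of lowercase words."""
        utts = {}
        with open(path) as f:
            for line in f:
                utt_id, _, text = line.rstrip("\n").partition("\t")
                utts[utt_id] = text.lower().split()
        return utts

    if __name__ == "__main__":
        refs, hyps = load("ref.txt"), load("hyp.txt")
        errors = sum(edit_distance(refs[u], hyps.get(u, [])) for u in refs)
        words = sum(len(r) for r in refs.values())
        print(f"WER: {100.0 * errors / words:.2f}% over {len(refs)} utterances")

The same harness extends naturally to the other uses Antoine mentions: hold
the waveforms and transcripts fixed, swap in the updated language or acoustic
models, and compare the scores before and after.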