[RavenclawDev 194] Re: testing dialog system
Antoine Raux
antoine at cs.cmu.edu
Mon Nov 20 11:39:31 EST 2006
Hi Svetlana,
It depends on what you want to test. If it's the DM or the whole system, at
this point we don't have batch testing set up.
What I do first (during the development/debugging phase) is use the TTY
interface to explore as many dialogue paths as possible (or specifically the
one that crashed your system). Then, on major updates, we run a stress test
on the telephone system: we get everyone from the Let's Go team to call as
much as possible for one hour. That invariably uncovers more bugs. Of course,
some bugs survive even that, and we only catch them by monitoring the live
system (that's one advantage of Let's Go: we get enough calls per day that we
can usually catch a fatal bug within a couple of days). As you can see, it's
not very automated. There has been talk of implementing some kind of
regression testing (i.e., batch runs over previous hub logs or something like
that), but at this point it's still a future project.
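Just to make that idea a bit more concrete, here is a rough sketch of what a
replay pass over an old hub log could look like. To be clear, none of this
exists today: parse_hub_log and send_to_dm are hypothetical hooks you would
have to write yourself (the first pulls user turns and logged system responses
out of a hub log, the second pushes a turn through the DM, e.g. via the TTY
interface, and returns its new response).

    # Purely an illustration of the "batch from previous hub logs" idea.
    # parse_hub_log() and send_to_dm() are hypothetical placeholders.

    def parse_hub_log(log_path):
        raise NotImplementedError("extract (user_turn, logged_response) pairs")

    def send_to_dm(user_turn):
        raise NotImplementedError("feed the turn to the DM, return its response")

    def replay_log(log_path):
        """Replay logged user turns and report responses that changed."""
        mismatches = 0
        for i, (user_turn, logged_response) in enumerate(parse_hub_log(log_path)):
            new_response = send_to_dm(user_turn)
            if new_response != logged_response:
                mismatches += 1
                print("turn %d changed:" % i)
                print("  logged: %s" % logged_response)
                print("  now:    %s" % new_response)
        print("%d turn(s) differ from the log" % mismatches)
        return mismatches

The hard part, of course, is the wiring (and the fact that once the system's
behavior changes, the rest of the log no longer matches), which is why this is
still only a future project.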
Now, if you want to test specific components like your ASR or parser, we do
have a way to run the input side of the system in batch. That is, we resend
all the utterances from a corpus of past dialogues, recognize them, parse
them, and annotate them with Helios. That's where it ends, though (because,
of course, if we change the behavior of the system, the rest of the logs
becomes irrelevant; for that to work we'd need a simulated-user kind of
setup, but that's another story). We use that to evaluate our ASR performance
when we update language or acoustic models, or to retrain our confidence
annotator on new data.
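If it helps, the scoring end of that kind of batch run boils down to word
error rate over the corpus: align each new recognition hypothesis against its
reference transcript and count the errors. A minimal sketch follows (this is
not one of our actual batch tools, and the example transcripts are made up;
how you produce the hypotheses is up to your own batch decoding setup):

    # Minimal corpus WER scorer. Assumes you already have, for each
    # utterance, a reference transcript and the hypothesis produced by
    # the re-run recognizer.

    def edit_distance(ref, hyp):
        """Word-level Levenshtein distance between two token lists."""
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)]

    def corpus_wer(pairs):
        """pairs: iterable of (reference_string, hypothesis_string)."""
        errors = words = 0
        for ref, hyp in pairs:
            ref_toks, hyp_toks = ref.split(), hyp.split()
            errors += edit_distance(ref_toks, hyp_toks)
            words += len(ref_toks)
        return float(errors) / max(words, 1)

    # Toy example: 1 substitution over 12 reference words -> ~8.3% WER
    print(corpus_wer([
        ("leaving from forbes and murray", "leaving from forbes and murray"),
        ("when is the next sixty one c", "when is the next sixty one b"),
    ]))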
Hope this helps.
antoine
_____
From: ravenclaw-developers-bounces at LOGANBERRY.srv.cs.cmu.edu
[mailto:ravenclaw-developers-bounces at LOGANBERRY.srv.cs.cmu.edu] On Behalf Of
Svetlana Stenchikova
Sent: Monday, November 20, 2006 11:10 AM
To: ravenclaw-developers at cs.cmu.edu
Subject: [RavenclawDev 193] testing dialog system
Hi,
How do you usually test your dialog systems? Do you do any batch testing?
I heard that it may be possible to do a "replay" from the hub log. Does
anyone have information about it?
Thank you,
Svetlana