Thank you, Antoine,<br>

<br>

It seems that testing ASR in batch is really essential. We are just now

starting to add ASR to our system, and we will definitely do

something similar for ASR testing.<br>

<br>

We are also making TTY capable of running in batch mode in order to

test the parser, DM, backend, and NLG, so that when something changes we

can quickly verify that existing functionality is not broken.<br>

<br>

<br>

Thanks,<br>

Svetlana<br>

<br><div><span class="gmail_quote">On 11/20/06, <b class="gmail_sendername">Antoine Raux</b> &lt;<a href="mailto:antoine@cs.cmu.edu">antoine@cs.cmu.edu</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div link="blue" vlink="purple" lang="FR">

<div>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">Hi Svetlana,</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;">&nbsp;</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">It depends on what you want

to test. If it's the DM or the whole system, at this point we don't

have a batch testing setup. </span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">What I do is first

(during the development/debugging phase) use the TTY interface to explore as many dialogue

paths as possible (or specifically the one that crashed your system). Then, on

major updates, we have a stress test on the telephone system: we get everyone

from the Let's Go team to call as much as possible during one hour. That

invariably uncovers more bugs… Then of course, we still have bugs after

that, which we only catch by monitoring the live system (that's one

advantage of Let's Go, we get enough calls per day that we are usually

able to catch a fatal bug in a couple of days)… As you can see, it's

not very automated… There has been talk of implementing some kind of

regression testing (i.e. a batch run from previous hub logs or something like that),

but at this point it's still at the stage of a future project…</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">&nbsp;</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">Now, if you want to test

specific components like your ASR or parser, we have a way to run the input

side of the system in batch. That is, we resend all the utterances from a

corpus of past dialogues, recognize them, parse them and annotate them with

Helios. That's where it ends though (because of course if we change the behavior

of the system then the rest of the logs becomes irrelevant… for that to

work, we'd need a simulated user kind of setup, but that's another

story). We use that to compute our ASR performance when we update language or acoustic

models, or retrain our confidence annotator on new data.</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">&nbsp;</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">Hope this helps…</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">&nbsp;</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">antoine</span></font></p>

<p><font color="navy" face="Arial" size="2"><span style="font-size: 10pt; font-family: Arial; color: navy;" lang="EN-US">&nbsp;</span></font></p>

<div>

<div style="text-align: center;" align="center"><font face="Times New Roman" size="3"><span style="font-size: 12pt;" lang="EN-US">

<hr align="center" size="2" width="100%">

</span></font></div>

<p><b><font face="Tahoma" size="2"><span style="font-size: 10pt; font-family: Tahoma; font-weight: bold;" lang="EN-US">From:</span></font></b><font face="Tahoma" size="2"><span style="font-size: 10pt; font-family: Tahoma;" lang="EN-US">

<a href="mailto:ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu" target="_blank">ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu</a>

[mailto:<a href="mailto:ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu" target="_blank">ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu</a>] <b><span style="font-weight: bold;">

On Behalf Of </span></b>Svetlana Stenchikova<br>

<b><span style="font-weight: bold;">Sent:</span></b> Monday, November 20, 2006

11:10 AM<br>

<b><span style="font-weight: bold;">To:</span></b>

<a href="mailto:ravenclaw-developers@cs.cmu.edu" target="_blank">ravenclaw-developers@cs.cmu.edu</a><br>

<b><span style="font-weight: bold;">Subject:</span></b> [RavenclawDev 193]

testing dialog system</span></font></p>

</div><div><span class="e" id="q_10f06416f8a9e1ff_1">

<p><font face="Times New Roman" size="3"><span style="font-size: 12pt;">&nbsp;</span></font></p>

<p><font face="Times New Roman" size="3"><span style="font-size: 12pt;">Hi,<br>

<br>

How do you usually test your dialog systems? Do you do any batch testing?<br>

I heard that it may be possible to do a &quot;replay&quot; from the hub log.

Does anyone have information about it?<br>

<br>

Thank you,<br>

Svetlana</span></font></p>

</span></div></div>

</div>

</blockquote></div><br>