<html>

<head>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">

<meta name=Generator content="Microsoft Word 11 (filtered)">

<style>

<!--

 /* Font Definitions */

 @font-face

        {font-family:PMingLiU;

        panose-1:2 2 3 0 0 0 0 0 0 0;}

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

@font-face

        {font-family:"\@PMingLiU";

        panose-1:2 2 3 0 0 0 0 0 0 0;}

 /* Style Definitions */

 p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman";}

a:link, span.MsoHyperlink

        {color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {font-family:Arial;

        color:navy;}

@page Section1

        {size:595.3pt 841.9pt;

        margin:70.85pt 70.85pt 70.85pt 70.85pt;}

div.Section1

        {page:Section1;}

-->

</style>

</head>

<body lang=FR link=blue vlink=purple>

<div class=Section1>

<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:

10.0pt;font-family:Arial;color:navy'>Hi Svetlana,</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:

10.0pt;font-family:Arial;color:navy'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>It depends on what you
want to test. If it&#8217;s the DM (dialogue manager) or the whole system, we
don&#8217;t have a batch testing setup at this point.</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>What I do first (during
the development/debugging phase) is use the TTY interface to explore as many
dialogue paths as possible (or specifically the one that crashed your system).
Then, on major updates, we run a stress test on the telephone system: we get
everyone from the Let&#8217;s Go team to call as much as possible for one hour.
That invariably uncovers more bugs&#8230; Of course, we still have bugs after
that, which we only catch by monitoring the live system (that&#8217;s one
advantage of Let&#8217;s Go: we get enough calls per day that we can usually
catch a fatal bug within a couple of days). As you can see, it&#8217;s not very
automated. There has been talk of implementing some kind of regression testing
(i.e. batch runs from previous hub logs or something like that), but at this
point it&#8217;s still a future project.</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>Now, if you want to test
specific components like your ASR or parser, we have a way to run the input
side of the system in batch. That is, we resend all the utterances from a
corpus of past dialogues, recognize them, parse them and annotate them with
Helios. That&#8217;s where it ends, though: if we change the behavior of the
system, the rest of the logs becomes irrelevant (for that to work, we&#8217;d
need a simulated-user kind of setup, but that&#8217;s another story). We use
that to compute our ASR performance when we update language or acoustic
models, or retrain our confidence annotator on new data.</span></font></p>
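
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>A minimal sketch of how
the output of such a batch run could be scored, assuming word error rate is the
metric of interest and that a reference transcript is available for each
utterance. The function names and sample utterances are made up for
illustration; this is not the actual Let&#8217;s Go tooling.</span></font></p>

<pre>
```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """Word error rate pooled over (reference, hypothesis) string pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    if total_words == 0:
        raise ValueError("corpus contains no reference words")
    return total_errors / total_words


# Hypothetical (transcript, recognizer output) pairs from a past-dialogue corpus.
pairs = [
    ("when is the next 61c", "when is the next 61c"),
    ("i want to go to the airport", "i want to go the airport"),
]
print("WER: {:.3f}".format(corpus_wer(pairs)))
```
</pre>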

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>Hope this helps&#8230;</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>antoine</span></font></p>

<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US

style='font-size:10.0pt;font-family:Arial;color:navy'>&nbsp;</span></font></p>

<div>

<div class=MsoNormal align=center style='text-align:center'><font size=3

face="Times New Roman"><span lang=EN-US style='font-size:12.0pt'>

<hr size=2 width="100%" align=center tabindex=-1>

</span></font></div>

<p class=MsoNormal><b><font size=2 face=Tahoma><span lang=EN-US

style='font-size:10.0pt;font-family:Tahoma;font-weight:bold'>From:</span></font></b><font

size=2 face=Tahoma><span lang=EN-US style='font-size:10.0pt;font-family:Tahoma'>

ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu

[mailto:ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu] <b><span

style='font-weight:bold'>On Behalf Of </span></b>Svetlana Stenchikova<br>

<b><span style='font-weight:bold'>Sent:</span></b> Monday, November 20, 2006

11:10 AM<br>

<b><span style='font-weight:bold'>To:</span></b>

ravenclaw-developers@cs.cmu.edu<br>

<b><span style='font-weight:bold'>Subject:</span></b> [RavenclawDev 193]

testing dialog system</span></font></p>

</div>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:

12.0pt'>&nbsp;</span></font></p>

<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:

12.0pt'>Hi,<br>

<br>

How do you usually test your dialog systems? Do you do any batch testing?<br>

I heard that it may be possible to do a &quot;replay&quot; from the hub log.

Does anyone have information about it?<br>

<br>

Thank you,<br>

Svetlana</span></font></p>

</div>

</body>

</html>