<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:PMingLiU;
        panose-1:2 2 3 0 0 0 0 0 0 0;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:"\@PMingLiU";
        panose-1:2 2 3 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {font-family:Arial;
        color:navy;}
@page Section1
        {size:595.3pt 841.9pt;
        margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=FR link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>Hi Svetlana,</span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'> </span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>It depends on what you want
to test. If it’s the DM or the whole system, we don’t have batch testing
set up at this point. </span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>What I do first
(during the development/debugging phase) is use the TTY interface to explore as
many dialogue paths as possible (or specifically the one that crashed your
system). Then, on major updates, we run a stress test on the telephone system:
everyone on the Let’s Go team calls as much as possible for one hour. That
invariably uncovers more bugs. Of course, we still have bugs after that, which
we only catch by monitoring the live system (one advantage of Let’s Go is
that we get enough calls per day that we can usually catch a fatal bug within a
couple of days). As you can see, it’s not very automated. There has been
talk of implementing some kind of regression testing (i.e. batch runs from
previous hub logs or something like that), but at this point it’s still a
future project.</span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'> </span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>Now, if you want to test
specific components like your ASR or parser, we have a way to run the input
side of the system in batch. That is, we resend all the utterances from a
corpus of past dialogues, recognize them, parse them and annotate them with
Helios. That’s where it ends, though: if we change the behavior of the
system, the rest of the logs become irrelevant (for that to work, we’d need
a simulated-user kind of setup, but that’s another story). We use that to
compute our ASR performance when we update language or acoustic models, or to
retrain our confidence annotator on new data.</span></font></p>
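<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>To give a rough idea of
the scoring step at the end of such a batch run, here is a minimal Python
sketch. It assumes the measure of interest is word error rate (WER) and that
the batch run yields pairs of reference transcript and recognizer hypothesis;
both are assumptions for illustration. The function names and the sample
utterances are hypothetical and are not taken from the actual batch scripts.</span></font></p>
<pre>
```python
# Illustrative sketch only: scores recognizer output against reference
# transcripts with word error rate (WER). All names and sample data here
# are hypothetical, not taken from the actual batch scripts.


def word_errors(reference, hypothesis):
    """Return the word-level edit distance between two utterances."""
    ref = reference.split()
    hyp = hypothesis.split()
    # previous[j] is the distance between the reference words consumed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def corpus_wer(pairs):
    """Compute WER over (reference, hypothesis) pairs from a corpus."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("corpus has no reference words")
    return errors / words


# Made-up utterances standing in for a replayed corpus.
sample = [
    ("when is the next bus to the airport", "when is the next bus to airport"),
    ("i want to leave from downtown", "i want to leave from downtown"),
]
print(corpus_wer(sample))
```
</pre>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>Running the same scoring
before and after a model update, on the same corpus, is what makes the two
numbers comparable.</span></font></p>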
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'> </span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>Hope this helps…</span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'> </span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>antoine</span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'> </span></font></p>
<div>
<div class=MsoNormal align=center style='text-align:center'><font size=3
face="Times New Roman"><span lang=EN-US style='font-size:12.0pt'>
<hr size=2 width="100%" align=center tabindex=-1>
</span></font></div>
<p class=MsoNormal><b><font size=2 face=Tahoma><span lang=EN-US
style='font-size:10.0pt;font-family:Tahoma;font-weight:bold'>From:</span></font></b><font
size=2 face=Tahoma><span lang=EN-US style='font-size:10.0pt;font-family:Tahoma'>
ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu
[mailto:ravenclaw-developers-bounces@LOGANBERRY.srv.cs.cmu.edu] <b><span
style='font-weight:bold'>On Behalf Of </span></b>Svetlana Stenchikova<br>
<b><span style='font-weight:bold'>Sent:</span></b> Monday, November 20, 2006
11:10 AM<br>
<b><span style='font-weight:bold'>To:</span></b>
ravenclaw-developers@cs.cmu.edu<br>
<b><span style='font-weight:bold'>Subject:</span></b> [RavenclawDev 193]
testing dialog system</span></font></p>
</div>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'> </span></font></p>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Hi,<br>
<br>
How do you usually test your dialog systems? Do you do any batch testing?<br>
I heard that it may be possible to do a "replay" from the hub log.
Does anyone have information about it?<br>
<br>
Thank you,<br>
Svetlana</span></font></p>
</div>
</body>
</html>