<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    The comments in reply to my original post (including some of the

    ones I have received offlist) are getting surreal.<br>

    <br>

    My main point was:  in the paper, humans were reported to have an

    error rate for speech recognition of one word in twenty.  If what we

    are talking about is ordinary, in-the-wild recognition of speech,

    that rate is transparently ridiculous.  Do you make a mistake

    recognizing every 20th word you hear?<br>

    <br>

    Clearly not.  The rate reported is for a human who is recognizing

    AND transcribing.<br>

    <br>

    Yes, I am appealing to common sense to make that point:  but am I

    really supposed to do a factor analysis to demonstrate that it is

    "transparently ridiculous" to suggest that humans have a 1-in-20

    error rate?   I suggested (that is all: suggested) that the real

    number for a pure recognition task was probably closer to 1 in

    20,000, or lower still.  Do we really have to have a debate about how

    accurate that suggestion was, or how irresponsible I am to make the

    suggestion?  It is clearly not 1 in 20, so I made a first stab at a

    better number.<br>

    <br>

    Secondly:  in BOTH the case of people naturally listening to speech

    (the lecture that I mentioned) and a transcriber trying to write

    down speech in an online task, there will be all kinds of high-level

    processing that makes a top-down contribution to the recognition

    task, so it makes no sense (in reply to Stefano Rovetta) to discount what I said

    about possible error rates of less than 1 in 20,000 when listening

    to a lecture.  Yes, you can help the recognition process by

    understanding the content of a lecture ... but so can the person

    doing transcription.<br>

    <br>

    Finally, my comments were not a specific accusation of fraud

    directed against Amodei et al., because I extended my target to "all

    the other deep learning speech recognition folks who overinflate

    claims on a regular basis".<br>

    <br>

    Here is the last paragraph of the conclusion to the Amodei et al.

    paper:<br>

    <br>

    "Overall, we believe our results confirm and exemplify the value of

    end-to-end Deep Learning methods for speech recognition in several

    settings. In those cases where our system is not already comparable

    to humans, the difference has fallen rapidly, largely because of

    application-agnostic Deep Learning techniques. We believe these

    techniques will continue to scale, and thus conclude that the vision

    of a single speech system that outperforms humans in most scenarios

    is imminently achievable."<br>

    <br>

    Everything about this paragraph shouts "speech system that

    outperforms humans".<br>

    <br>

    What it does not say is "speech system that outperforms humans ...

    but only if we are talking about humans who are being overloaded by

    the need to simultaneously perform the task of transcribing the

    results of recognition". <br>

    <br>

    The computer speech recognition system finds the transcription part

    utterly trivial; the human finds it crippling.  It doesn't take a

    rocket surgeon to figure that out.<br>

    <br>

    <br>

    Richard Loosemore<br>

    <br>

    <br>

    <br>


  </body>

</html>