Connectionists: Comparing speech recognition Word Error Rates is deceitful, please stop

Richard Loosemore rloosemore at susaro.com
Tue Dec 15 17:49:01 EST 2015


So, can I take it that no one disagrees with this?  :-)

I have received private emails from people who say they agree with this
analysis, but no one speaks out publicly (and that on a mailing list
with some ferociously opinionated correspondents, too!).

If so, is this not a little ... shocking?  That no one bats an eye when
leading researchers give the strong impression that their systems are 
"near or exceeding human performance" when in fact the truth is that 
they are ONE THOUSAND times worse than human performance?


Richard Loosemore




On 12/14/15, 11:20 AM, Richard Loosemore wrote:
>
> I just read "Deep Speech 2: End-to-End Speech Recognition in English
> and Mandarin" by Amodei et al. (http://arxiv.org/abs/1512.02595v1),
> and I have finally reached the end of my tether over the reporting of
> Word Error Rates (WER).
>
> These rates are being used to make a comparison with human performance 
> on TRANSCRIPTION of speech.  But transcription involves recognition 
> plus a complex pile of work like memory storage, time pressure, and 
> semantic paraphrasing.  And I would be willing to bet that almost all 
> the errors are in the non-recognition parts.
>
> But by using transcription error, the reported error rate for humans
> is supposedly about 5%, and on that basis Amodei et al. declare that
> their system is now better than human.
>
> That is ludicrous.  If I give an hour-long lecture I can cram in about 
> 20,000 words, and I would be willing to bet that not one of those 
> words would be misrecognized by any of the students in my audience who 
> were actually awake.  That would be an error rate that is three orders 
> of magnitude smaller than the one for transcription.
>
> Amodei et al. (and all the other deep learning speech recognition folks
> who overinflate claims on a regular basis):  your system is NOT
> outperforming humans, because your system should be compared with the 
> primal recognition rate in humans, and since humans are probably about 
> 1000 times better, you have a long way to go.
>
>
> Richard Loosemore
>
>
>
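
For anyone who wants to check the arithmetic in the quoted message, here is
a minimal sketch in Python. It assumes the usual definition of WER (word-level
edit distance divided by the number of reference words). The function name
word_error_rate and the constants are illustrative, and "not one word
misrecognized" is treated as an upper bound of one error in the 20,000-word
lecture. That bound is an assumption for the sake of the calculation, not a
measurement.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Figures taken from the post.
LECTURE_WORDS = 20_000      # words in an hour-long lecture
TRANSCRIPTION_WER = 0.05    # reported human transcription error rate (~5%)

# Assumption: at most one misrecognized word in the whole lecture.
listening_wer_bound = 1 / LECTURE_WORDS

# How many times larger the transcription figure is than that bound.
ratio = TRANSCRIPTION_WER / listening_wer_bound

if __name__ == "__main__":
    print(f"listening WER bound: {listening_wer_bound:.5%}")
    print(f"transcription / listening: about {ratio:.0f}x")
```

Under those assumptions the ratio comes out at about 1000, which is the
"three orders of magnitude" in the quoted message.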


