<div dir="ltr"><div style="text-align:center"><div style="text-align:center"><b><font size="4"><span style="font-family:times new roman,serif">An Old Machine Learning Fable Updated</span><br></font></b></div><b><font size="4"><span style="font-family:times new roman,serif"></span></font></b></div><span style="font-family:times new roman,serif"><br></span><div><span style="font-family:times new roman,serif">A cautionary tale¹ is told to novices in machine learning.<br></span></div><div><i><br><span style="font-family:times new roman,serif"></span></i></div><div><i><span style="font-family:times new roman,serif"></span></i></div><div style="margin-left:40px"><i><span style="font-family:times new roman,serif">In days of yore, long before VLSI, researchers trained a simple neural network to detect images containing enemy tanks. The US Army collected a corpus of labeled images. Proper methods were employed: cross validation, held-out test set, etc. Performance on both the training set and the unseen test set was amazing! It even worked on very difficult images, with just a bit of a tank poking out of foliage. But the military had collected all the with-tank images on one day, and the without-tank images on another. One day was sunny, the other overcast. The network had learned to classify based on the brightness of the sky. 
Despite all their careful hard work, the system was worthless.</span><br></i></div><div style="margin-left:40px"><i><span style="font-family:times new roman,serif"></span></i></div><span style="font-family:times new roman,serif"><br>This story has three lessons we hope to impress upon our students.<br></span><ul><li><span style="font-family:times new roman,serif">Machine-learning systems learn what's <b>really</b> in the data, not</span><span style="font-family:times new roman,serif"> what you <b>think</b> is there.</span><span style="font-family:times new roman,serif"></span></li><li><span style="font-family:times new roman,serif">The principle of garbage-in-garbage-out applies to machine learning.</span><span style="font-family:times new roman,serif"></span></li><li><span style="font-family:times new roman,serif">We must sweat the details: examine the innards of our systems, try to understand them, always with a skeptical mindset.</span><br><span style="font-family:times new roman,serif"></span></li></ul><span style="font-family:times new roman,serif">Some recent work on classifying images from EEG recordings of human<br>viewers is a modern example of the same issue. EEG recordings<br>were taken in blocks, and each block presented a succession of many<br>images of a single class. A map was then learned from the EEG trace<br>that followed the presentation of each image to that image's class<br>label.² The problem is precisely the same as in the “tanks” story:<br>just as the weather drifts slowly, making photographs taken on a sunny<br>day differ systematically from ones taken on a cloudy day, so too does<br>EEG drift. Stimuli have long-lasting effects, the subject grows<br>fatigued or changes posture, electrodes make better or worse<br>electrical contact with the scalp, heartbeat changes, the subject<br>grows chilly and shivers or warm and perspires, sources of external<br>electrical interference wax and wane, etc. Because each block holds a<br>single class, this drift is confounded with the class label, and a<br>classifier can score well by recognizing the block rather than the<br>image. 
Such temporal confounds<br>completely account for the published results; once they are controlled<br>for, test-set performance plummets to chance³ and remains near chance<br>even with vastly more data.⁴<br>________________<br>¹Perhaps somewhat apocryphal; see <a href="https://www.gwern.net/Tanks">https://www.gwern.net/Tanks</a><br><br>²Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela<br> Giordano, Mubarak Shah, and Nasim Souly, “Deep Learning Human Mind<br> for Automated Visual Classification”, CVPR 2017, URL<br> <a href="https://openaccess.thecvf.com/content_cvpr_2017/papers/Spampinato_Deep_Learning_Human_CVPR_2017_paper.pdf">https://openaccess.thecvf.com/content_cvpr_2017/papers/Spampinato_Deep_Learning_Human_CVPR_2017_paper.pdf</a><br> and a growing corpus of work using the same dataset.<br><br>³Ren Li, Jared S. Johansen, Hamad Ahmed, Thomas V. Ilyevsky, Ronnie B.<br> Wilbur, Hari M. Bharadwaj, and Jeffrey Mark Siskind, “The perils and<br> pitfalls of block design for EEG classification experiments”, IEEE<br> Transactions on Pattern Analysis and Machine Intelligence, in press.<br> See also “Training on the test set? An analysis of Spampinato et al.<br> [31]”, arXiv:1812.07697, Dec 2018.<br><br>⁴Hamad Ahmed, Ronnie B. Wilbur, Hari M. Bharadwaj, and Jeffrey Mark<br> Siskind, “Object classification from randomized EEG trials”,<br> arXiv:2004.06046, Apr 2020. Interestingly, even a modified design in<br> which training and test data are acquired in separate blocks (Nicolae<br> Cudlenco, Nirvana Popescu, and Marius Leordeanu, “Reading into the<br> mind’s eye: Boosting automatic visual recognition with EEG signals”,<br> Neurocomputing 386:281–92, Apr 2020, online Dec 2019, DOI:<br> 10.1016/j.neucom.2019.12.076) is shown to exhibit a similar problem<br> due to a related block-order confound.<br></span></div>