<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#ecca99">

    <p><font size="+1">Despite the comprehensive feel of this, it still
        appears to me to be too focused on back-propagation per se
        (except for that pesky Gauss/Legendre ref, which still baffles
        me, at least as to how it is related to a "neural network"), and
        at the same time it appears to be missing other, more general
        cases that are conceptually relevant to the epoch, say:</font></p>

    <p><font size="+1">Oliver Selfridge and his Pandemonium model, which
        was a hierarchical feature analysis system and was certainly in
        the air during the neural network learning heyday. In fact,
        Minsky cites Selfridge as one of his mentors.
        <br>
      </font></p>

    <p><font size="+1">Arthur Samuel: his checkers-playing system, which
        learned an evaluation function from a hierarchical search. <br>
      </font></p>

    <p><font size="+1">Rosenblatt's advisor was Egon Brunswik, a gestalt
        perceptual psychologist who introduced the concept that the
        world was stochastic and that the organism had to adapt to this
        variance somehow. He called it "probabilistic functionalism",
        which brought attention to learning, perception and decision
        theory, certainly all component parts of what we call neural
        networks.<br>
      </font></p>

    <p><font size="+1">There are many other such examples that
        influenced or provided context for the yeasty mix that was the
        1940s and 1950s, when neural networks first appeared, partly due
        to Pitts and McCulloch, whose work entangled the human brain
        with computation and with early computers themselves.</font></p>

    <p><font size="+1">I just don't see this as didactic, in the sense
        of a conceptual view of the multidimensional history of the
        field, as opposed to a one-dimensional exegesis of mathematical
        threads through various statistical algorithms.</font></p>

    <p><font size="+1">Steve<br>

      </font></p>

    <div class="moz-cite-prefix">On 12/30/21 1:03 PM, Schmidhuber

      Juergen wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:2293D07C-A5E3-4E66-9120-C14DE15239A7@supsi.ch">

      <pre class="moz-quote-pre" wrap="">Dear connectionists, 

in the wake of massive open online peer review, public comments on the connectionists mailing list [CONN21] and many additional private comments (some by well-known deep learning pioneers) helped to update and improve upon version 1 of the report. The essential statements of the text remain unchanged as their accuracy remains unchallenged. I'd like to thank everyone from the bottom of my heart for their feedback up until this point and hope everyone will be satisfied with the changes. Here is the revised version 2 with over 300 references:  

<a class="moz-txt-link-freetext" href="https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html">https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html</a>

In particular, Sec. II has become a brief history of deep learning up to the 1970s:

Some of the most powerful NN architectures (i.e., recurrent NNs) were discussed in 1943 by McCulloch and Pitts [MC43] and formally analyzed in 1956 by Kleene [K56] - the closely related prior work in physics by Lenz, Ising, Kramers, and Wannier dates back to the 1920s [L20][I25][K41][W45]. In 1948, Turing wrote up ideas related to artificial evolution [TUR1] and learning NNs. He failed to formally publish his ideas though, which explains the obscurity of his thoughts here. Minsky's simple neural SNARC computer dates back to 1951. Rosenblatt's perceptron with a single adaptive layer learned in 1958 [R58] (Joseph [R61] mentions an earlier perceptron-like device by Farley & Clark); Widrow & Hoff's similar Adaline learned in 1962 [WID62]. Such single-layer "shallow learning" actually started around 1800 when Gauss & Legendre introduced linear regression and the method of least squares [DL1-2] - a famous early example of pattern recognition and generalization from training data through a parameterized predictor is Gauss' rediscovery of the asteroid Ceres based on previous astronomical observations. Deeper multilayer perceptrons (MLPs) were discussed by Steinbuch [ST61-95] (1961), Joseph [R61] (1961), and Rosenblatt [R62] (1962), who wrote about "back-propagating errors" in an MLP with a hidden layer [R62], but did not yet have a general deep learning algorithm for deep MLPs (what's now called backpropagation is quite different and was first published by Linnainmaa in 1970 [BP1-BP5][BPA-C]). Successful learning in deep architectures started in 1965 when Ivakhnenko & Lapa published the first general, working learning algorithms for deep MLPs with arbitrarily many hidden layers (already containing the now popular multiplicative gates) [DEEP1-2][DL1-2]. A paper of 1971 [DEEP2] already described a deep learning net with 8 layers, trained by their highly cited method which was still popular in the new millennium [DL2], especially in Eastern Europe, where much of Machine Learning was born [MIR](Sec. 1)[R8]. LBH failed to cite this, just like they failed to cite Amari [GD1], who in 1967 proposed stochastic gradient descent [STO51-52] (SGD) for MLPs and whose implementation [GD2,GD2a] (with Saito) learned internal representations at a time when compute was billions of times more expensive than today (see also Tsypkin's work [GDa-b]). (In 1972, Amari also published what was later sometimes called the Hopfield network or Amari-Hopfield Network [AMH1-3].) Fukushima's now widely used deep convolutional NN architecture was first introduced in the 1970s [CNN1]. 

Jürgen

******************************

On 27 Oct 2021, at 10:52, Schmidhuber Juergen <a class="moz-txt-link-rfc2396E" href="mailto:juergen@idsia.ch">&lt;juergen@idsia.ch&gt;</a> wrote:

Hi, fellow artificial neural network enthusiasts!

The connectionists mailing list is perhaps the oldest mailing list on ANNs, and many neural net pioneers are still subscribed to it. I am hoping that some of them - as well as their contemporaries - might be able to provide additional valuable insights into the history of the field.

Following the great success of massive open online peer review (MOOR) for my 2015 survey of deep learning (now the most cited article ever published in the journal Neural Networks), I've decided to put forward another piece for MOOR. I want to thank the many experts who have already provided me with comments on it. Please send additional relevant references and suggestions for improvements for the following draft directly to me at <a class="moz-txt-link-abbreviated" href="mailto:juergen@idsia.ch">juergen@idsia.ch</a>:

<a class="moz-txt-link-freetext" href="https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html">https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html</a>

The above is a point-for-point critique of factual errors in ACM's justification of the ACM A. M. Turing Award for deep learning and a critique of the Turing Lecture published by ACM in July 2021. This work can also be seen as a short history of deep learning, at least as far as ACM's errors and the Turing Lecture are concerned.

I know that some view this as a controversial topic. However, it is the very nature of science to resolve controversies through facts. Credit assignment is as core to scientific history as it is to machine learning. My aim is to ensure that the true history of our field is preserved for posterity.

Thank you all in advance for your help! 

Jürgen Schmidhuber

</pre>

    </blockquote>


  </body>

</html>