Connectionists: Scientific Integrity, the 2021 Turing Lecture, etc.

Stephen Jose Hanson stephen.jose.hanson at rutgers.edu
Thu Jun 30 10:22:52 EDT 2022


So, as usual: clearly written, but without a lot of context.

Juergen, you are part of history, not a historian.

Now, on the one hand, I really do appreciate your attempts to revive my stochastic delta rule as a precursor to Dropout. And I know that Geoff spent some time explaining how he thought of Dropout at a train station, or somewhere with queues, and how it made him think of some sort of stochastic process at the hidden layer across the queues. It is also notable that he and I had a conversation at NIPS 1990 in Denver, where he asked me, "How did you come up with this algorithm?" I said it seemed like a nice compromise between backpropagation and a Boltzmann machine. But again, there can be convergent lines of invention.
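For readers unfamiliar with the two methods, here is a minimal sketch (illustrative only, not either original implementation; the tanh layer and noise levels are assumptions) of the family resemblance: Dropout randomly zeroes hidden activations, while the stochastic delta rule injects per-weight noise on every forward pass.

# Minimal illustrative sketch, not the original implementations.
# Dropout: randomly zero hidden activations during training.
# Stochastic delta rule (SDR): resample each weight around its mean every pass.
import numpy as np

rng = np.random.default_rng(0)

def hidden_with_dropout(x, W, p_drop=0.5):
    """Hidden layer where each unit is kept with probability 1 - p_drop."""
    h = np.tanh(x @ W)
    mask = rng.random(h.shape) > p_drop
    return h * mask / (1.0 - p_drop)          # inverted-dropout rescaling

def hidden_with_sdr(x, W_mean, W_std):
    """Hidden layer whose weights are resampled from N(W_mean, W_std) each pass."""
    W = W_mean + W_std * rng.standard_normal(W_mean.shape)
    return np.tanh(x @ W)

x = rng.standard_normal((4, 10))              # batch of 4 inputs, 10 features
W = 0.1 * rng.standard_normal((10, 6))        # 10 inputs -> 6 hidden units
print(hidden_with_dropout(x, W).shape)        # (4, 6)
print(hidden_with_sdr(x, W, 0.05 * np.ones_like(W)).shape)  # (4, 6)

In both cases the noise acts as a training-time regularizer; in SDR the per-weight standard deviations are themselves adapted and annealed, a detail omitted in this sketch.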

I think the thing that is important in all of this requires the author--with their own judgement (and reviewers) to determine-- what is really basic level to the research instead of superordinate..  as calculus is to Backprop..   or Sumerian counting is to calculus...etc... and fishing was to counting.. etc..

Steve

On 6/28/22 3:06 AM, Schmidhuber Juergen wrote:


After months of massive open online peer review, there is a revised version 3 of my report on the history of deep learning and on misattributions, supplementing my award-winning 2015 deep learning survey. The new version mentions (among many other things):

1. The non-learning recurrent architecture of Lenz and Ising (1920s)—later reused in Amari’s learning recurrent neural network (RNN) of 1972. After 1982, this was sometimes called the "Hopfield network."

2. Rosenblatt's MLP (around 1960) with non-learning randomized weights in a hidden layer and an adaptive output layer. This was much later rebranded as "Extreme Learning Machines" (a minimal sketch of this architecture follows this list).

3. Amari’s stochastic gradient descent for deep neural nets (1967). The implementation with his student Saito learned internal representations in MLPs at a time when compute was billions of times more expensive than today.

4. Fukushima’s rectified linear units (ReLUs, 1969) and his CNN architecture (1979).
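As an aside on item 2, here is a minimal sketch (illustrative only; the tanh nonlinearity and least-squares readout are assumed details) of that architecture: a fixed random hidden layer whose outputs feed an adaptive output layer.

# Minimal illustrative sketch of item 2: a hidden layer with fixed random
# (non-learning) weights and an output layer fit by least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))          # 200 samples, 5 input features
y = np.sin(X).sum(axis=1)                  # an arbitrary target to fit

W_hidden = rng.standard_normal((5, 64))    # random hidden weights, never trained
H = np.tanh(X @ W_hidden)                  # fixed random features

# Only the output layer adapts: solve H @ w_out ~= y in the least-squares sense.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.mean((H @ w_out - y) ** 2))       # training error of the linear readout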

The essential statements of the text remain unchanged, as their accuracy has not been challenged:

Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award

https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html

Jürgen





On 25 Jan 2022, at 18:03, Schmidhuber Juergen <juergen at idsia.ch> wrote:

PS: Terry, you also wrote: "Our precious time is better spent moving the field forward." However, it seems like in recent years much of your own precious time has gone to promulgating a revisionist history of deep learning (and writing the corresponding "amicus curiae" letters to award committees). For a recent example, your 2020 deep learning survey in PNAS [S20] claims that your 1985 Boltzmann machine [BM] was the first NN to learn internal representations. This paper [BM] neither cited the internal representations learnt by Ivakhnenko & Lapa's deep nets in 1965 [DEEP1-2] nor those learnt by Amari's stochastic gradient descent for MLPs in 1967-1968 [GD1-2]. Nor did your recent survey [S20] attempt to correct this, as good science should strive to do. On the other hand, it seems you celebrated your co-author's birthday in a special session while you were head of NeurIPS, instead of correcting these inaccuracies and celebrating the true pioneers of deep learning, such as Ivakhnenko and Amari.

Even your recent interview https://blog.paperspace.com/terry-sejnowski-boltzmann-machines/ claims: "Our goal was to try to take a network with multiple layers - an input layer, an output layer and layers in between - and make it learn. It was generally thought, because of early work that was done in AI in the 60s, that no one would ever find such a learning algorithm because it was just too mathematically difficult." You wrote this although you knew exactly that such learning algorithms were first created in the 1960s, and that they worked. You are a well-known scientist, head of NeurIPS, and chief editor of a major journal. You must correct this. We must all be better than this as scientists. We owe it to the past, present, and future scientists, as well as to those we ultimately serve.



The last paragraph of my report https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html quotes Elvis Presley: "Truth is like the sun. You can shut it out for a time, but it ain't goin' away." I wonder how the future will reflect on the choices we make now.

Jürgen




On 3 Jan 2022, at 11:38, Schmidhuber Juergen <juergen at idsia.ch> wrote:

Terry, please don't throw up smokescreens like that!

This is not about basic math such as calculus (actually first published by Leibniz; later Newton was also credited for his unpublished work; Archimedes already had special cases thereof over 2000 years ago; the Indian Kerala school made essential contributions around 1400). In fact, my report addresses such smokescreens in Sec. XII: "Some claim that 'backpropagation' is just the chain rule of Leibniz (1676) & L'Hopital (1696). No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this). It was not published until 1970 [BP1]."
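To make the "efficient way" concrete, here is a minimal sketch (illustrative only, not from the report or [BP1]) of reverse-mode gradient computation for a tiny two-layer net: one backward sweep yields the gradient with respect to every weight, instead of applying the chain rule separately per weight.

# Minimal illustrative sketch: backprop as reverse-mode application of the chain rule.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)                 # input
W1 = 0.1 * rng.standard_normal((4, 3))     # first-layer weights
W2 = 0.1 * rng.standard_normal((1, 4))     # second-layer weights
target = np.array([1.0])

# Forward pass, keeping the intermediates the backward pass needs.
z1 = W1 @ x
h1 = np.tanh(z1)
y = W2 @ h1
loss = 0.5 * np.sum((y - target) ** 2)

# Backward pass: one sweep produces the gradient for every weight.
dy = y - target                            # dL/dy
dW2 = np.outer(dy, h1)                     # dL/dW2
dh1 = W2.T @ dy                            # dL/dh1
dz1 = dh1 * (1.0 - np.tanh(z1) ** 2)       # dL/dz1, via the tanh derivative
dW1 = np.outer(dz1, x)                     # dL/dW1

print(round(loss, 4), dW1.shape, dW2.shape)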

You write: "All these threads will be sorted out by historians one hundred years from now." To answer that, let me just cut and paste the last sentence of my conclusions: "However, today's scientists won't have to wait for AI historians to establish proper credit assignment. It is easy enough to do the right thing right now."

You write: "let us be good role models and mentors" to the new generation. Then please do what's right! Your recent survey [S20] does not help. It's mentioned in my report as follows: "ACM seems to be influenced by a misleading 'history of deep learning' propagated by LBH & co-authors, e.g., Sejnowski [S20] (see Sec. XIII). It goes more or less like this: 'In 1969, Minsky & Papert [M69] showed that shallow NNs without hidden layers are very limited and the field was abandoned until a new generation of neural network researchers took a fresh look at the problem in the 1980s [S20].' However, as mentioned above, the 1969 book [M69] addressed a 'problem' of Gauss & Legendre's shallow learning (~1800) [DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method [DEEP1-2][DL2] (and then also by Amari's SGD for MLPs [GD1-2]). Minsky was apparently unaware of this and failed to correct it later [HIN] (Sec. I). ... deep learning research was alive and kicking also in the 1970s, especially outside of the Anglosphere."



Just follow ACM's Code of Ethics and Professional Conduct [ACM18] which states: "Computing professionals should therefore credit the creators of ideas, inventions, work, and artifacts, and respect copyrights, patents, trade secrets, license agreements, and other methods of protecting authors' works." No need to wait for 100 years.

Jürgen







On 2 Jan 2022, at 23:29, Terry Sejnowski <terry at snl.salk.edu> wrote:

We would be remiss not to acknowledge that backprop would not be possible without the calculus, so Isaac Newton should also have been given credit, at least as much credit as Gauss.

All these threads will be sorted out by historians one hundred years from now.
Our precious time is better spent moving the field forward.  There is much more to discover.

A new generation with better computational and mathematical tools than we had back in the last century has joined us, so let us be good role models and mentors to them.

Terry

-----

On 1/2/2022 5:43 AM, Schmidhuber Juergen wrote:


Asim wrote: "In fairness to Jeffrey Hinton, he did acknowledge the work of Amari in a debate about connectionism at the ICNN'97 .... He literally said 'Amari invented back propagation'..." when he sat next to Amari and Werbos. Later, however, he failed to cite Amari's stochastic gradient descent (SGD) for multilayer NNs (1967-68) [GD1-2a] in his 2015 survey [DL3], his 2021 ACM lecture [DL3a], and other surveys. Furthermore, SGD [STO51-52] (Robbins, Monro, Kiefer, Wolfowitz, 1951-52) is not even backprop. Backprop is just a particularly efficient way of computing gradients in differentiable networks, known as the reverse mode of automatic differentiation, due to Linnainmaa (1970) [BP1] (see also Kelley's precursor of 1960 [BPa]). Hinton did not cite these papers either, and in 2019 embarrassingly did not hesitate to accept an award for having "created ... the backpropagation algorithm" [HIN]. All references and more on this can be found in the report, especially in Sec. XII.



The deontology of science requires: If one "re-invents" something that was already known, and only becomes aware of it later, one must at least clarify it later [DLC], and correctly give credit in all follow-up papers and presentations. Also, ACM's Code of Ethics and Professional Conduct [ACM18] states: "Computing professionals should therefore credit the creators of ideas, inventions, work, and artifacts, and respect copyrights, patents, trade secrets, license agreements, and other methods of protecting authors' works." LBH didn't.

Steve still doesn't believe that linear regression of 200 years ago is equivalent to linear NNs. In a mature field such as math we would not have such a discussion. The math is clear. And even today, many students are taught NNs like this: let's start with a linear single-layer NN (activation = sum of weighted inputs). Now minimize mean squared error on the training set. That's good old linear regression (method of least squares). Now let's introduce multiple layers and nonlinear but differentiable activation functions, and derive backprop for deeper nets in 1960-70 style (still used today, half a century later).
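As a minimal sketch of that teaching progression (illustrative only, not from any cited work; the data and learning rate are arbitrary assumptions): a single linear layer trained to minimize mean squared error recovers ordinary least squares, whether fit in closed form or by gradient descent.

# Minimal illustrative sketch: a linear single-layer NN trained on MSE is
# linear regression; gradient descent converges to the least-squares solution.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                  # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.standard_normal(100)   # nearly linear targets

# Closed-form least squares (Gauss/Legendre style).
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same model viewed as a linear NN trained by gradient descent on MSE.
w = np.zeros(3)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(y)              # gradient of 0.5 * mean squared error
    w -= 0.1 * grad

print(np.allclose(w, w_ls, atol=1e-3))             # True: both recover least squares

Introducing multiple layers with nonlinear, differentiable activations is then exactly the step that requires backprop to compute the gradients, as in the reverse-mode sketch earlier in this thread.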

Sure, an important new variation of the 1950s (emphasized by Steve) was to transform linear NNs into binary classifiers with threshold functions. Nevertheless, the first adaptive NNs (still widely used today) are 1.5 centuries older except for the name.

Happy New Year!

Jürgen






