Connectionists: The unreasonable effectiveness of deep learning in artificial intelligence

Juergen Schmidhuber juergen at idsia.ch
Wed Mar 4 17:41:28 EST 2020


Sorry, Terry, but your article rehashes a misleading narrative from the 1980s that has since been debunked.

1. You write: "Deep learning was inspired by the massively parallel architecture found in brains and its origins can be traced to Frank Rosenblatt's perceptron (5) in the 1950s."

However, those perceptrons or "shallow neural nets" (1958) [R58] without hidden layers were essentially linear regressors of the kind introduced by Legendre & Gauss around 1800 (the method of least squares) [DL1]. In fact, one of the first famous examples of pattern recognition through "shallow learning" dates back over 200 years: the re-discovery of the asteroid Ceres around 1800 by Gauss, who took data points from previous observations and adjusted the parameters of a predictor, which essentially learned to generalise from the training data and correctly predicted the new location of Ceres. (In the early 1960s, Joseph [J61] and Rosenblatt [R62] also had a few cool ideas about deeper adaptive NNs but did not get very far.)
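(To make the analogy concrete, here is a minimal, purely illustrative sketch of such shallow learning by least squares in Python/NumPy: fit a linear predictor to a few noisy observations, then extrapolate to an unseen input. The data and parameters are made up, and Gauss's actual orbit determination for Ceres was of course far more sophisticated.)

import numpy as np

rng = np.random.default_rng(0)

# Toy training data: inputs t (e.g., observation times) and noisy targets y
# generated by an unknown linear rule.
t = np.linspace(0.0, 10.0, 20)
y = 3.0 * t + 2.0 + rng.normal(scale=0.5, size=t.shape)

# Design matrix with a bias column; least-squares fit of the two parameters.
X = np.column_stack([t, np.ones_like(t)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Generalisation": predict the target at an input not seen during training.
t_new = 12.0
print("learned parameters:", w, "prediction at t=12:", w[0] * t_new + w[1])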

2. You write: "The great expectations in the press (Fig. 3) were dashed by Minsky and Papert (7), who showed in their book Perceptrons that a perceptron can only represent categories that are linearly separable in weight space. Although at the end of their book Minsky and Papert considered the prospect of generalizing single- to multiple-layer perceptrons, one layer feeding into the next, they doubted there would ever be a way to train these more powerful multilayer perceptrons. Unfortunately, many took this doubt to be definitive, and the field was abandoned until a new generation of neural network researchers took a fresh look at the problem in the 1980s."

However, Minsky & Papert's 1969 book about the limitations of shallow nets addressed a "problem" that had already been solved four years earlier! Maybe Minsky did not even know; he should have known, though. Successful learning in deep architectures started in 1965 with Ivakhnenko & Lapa (in Ukraine, then part of the USSR). They published the first general, working learning algorithms for deep multilayer perceptrons with arbitrarily many layers (also with multiplicative gates) [DEEP1] [DL1]. A 1971 paper [DEEP2] already described a deep learning net with 8 layers, trained by their highly cited method, which remained popular in the new millennium, especially in Eastern Europe, where much of machine learning was born.

BTW, 5 years after Ivakhnenko & Lapa, modern backpropagation (the reverse mode of automatic differentiation) was published "next door" in Finland (1970) [BP1] [DL1]. 
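(For readers unfamiliar with the term: reverse-mode automatic differentiation records the forward computation and then applies the chain rule backwards from the output. The toy Python class below only illustrates that idea, with my own naming choices; it is not Linnainmaa's formulation or code.)

import math

class Var:
    def __init__(self, value, parents=()):
        self.value = value        # result of the forward computation
        self.parents = parents    # list of (parent Var, local derivative)
        self.grad = 0.0           # accumulated d(output)/d(this node)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def tanh(self):
        t = math.tanh(self.value)
        return Var(t, [(self, 1.0 - t * t)])

    def backward(self):
        # Visit nodes in dependency order, then apply the chain rule
        # backwards from the output node.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in v.parents:
                p.grad += v.grad * local

# Example: y = tanh(w*x + b); backward() fills in dy/dw, dy/dx, dy/db.
w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = (w * x + b).tanh()
y.backward()
print(y.value, w.grad, x.grad, b.grad)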

3. You write: "During the ensuing neural network revival in the 1980s, Geoffrey Hinton and I introduced a learning algorithm for Boltzmann machines proving that contrary to general belief it was possible to train multilayer networks (8)."

However, Ivakhnenko & Lapa already had functional deep learning multilayer perceptrons in 1965, and many have used their method throughout the decades since. See item 2.

[DEEP1] Ivakhnenko, A. G. and Lapa, V. G. (1965). Cybernetic Predicting Devices. CCM Information Corporation. 
[DEEP2] Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, (4):364-378.
[R58] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386.
[J61] Joseph, R. D. (1961). Contributions to perceptron theory. PhD thesis, Cornell Univ.
[R62] Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan, New York.
[BP1] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and FORTRAN code on pages 58-60. See also BIT 16, 146-160, 1976.
[DL1] J. Schmidhuber, 2015. Deep Learning in neural networks: An overview. Neural Networks, 61, 85-117.


Cheers,
Jürgen
http://people.idsia.ch/~juergen/2010s-our-decade-of-deep-learning.html





> On 04 Mar 2020, at 16:37, Terry Sejnowski <terry at salk.edu> wrote:
> 
> The unreasonable effectiveness of deep learning in artificial intelligence
> 
> https://www.pnas.org/content/early/2020/01/23/1907373117
> 
> Terry
> 
> -----



