Connectionists: Annotated History of Modern AI and Deep Learning

Andrzej Wichert andreas.wichert at tecnico.ulisboa.pt
Sat Jan 21 06:28:37 EST 2023


Dear Juergen,

Thank you for your email. Some small remarks...

"3. "Neocognitron performs invariant pattern recognition, a CNN does not." CNNs and Neocognitrons (1979) have the same basic architecture with alternating convolutional and downsampling layers. Not sure why someone changed the name! See Sec. 9: https://people.idsia.ch/~juergen/deep-learning-history.html#cnn”

The Neocognitron may have the same basic architecture as a CNN, but it performs a different task; see
Luis Sa-Couto and Andreas Wichert, Simple convolutional based models: are they learning the task or the data?, Neural Computation, 2021, https://doi.org/10.1162/neco_a_01446

6. Generally speaking, I have never really understood the distinction between “symbolic" and “subsymbolic” AI….
In mathematics, all calculations come down to symbol manipulation and logic.

However, I think the most important question in AI (modern AI) was posed by Douglas Hofstadter (Gödel, Escher, Bach and Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought):
- “What is a concept?” From this key question, many others follow. How do fluid boundaries come about? How do they give rise to generalization? What makes something similar to something else? For example, what makes an uppercase letter ‘A’ recognizable as such? What is the essence of ‘A’-ness? DL tries to find regularities in a labeled data set but cannot answer these questions.


Best Wishes,

Andrzej
---------------------------------------------------------------------------------------------------
Prof. Auxiliar Andreas Wichert   

http://web.tecnico.ulisboa.pt/andreas.wichert/
-
https://www.amazon.com/author/andreaswichert

Instituto Superior Técnico - Universidade de Lisboa
Campus IST-Taguspark 
Avenida Professor Cavaco Silva                 Phone: +351  214233231
2744-016 Porto Salvo, Portugal





> On 20 Jan 2023, at 16:28, Schmidhuber Juergen <juergen at idsia.ch> wrote:
> 
> Dear Andrzej, thanks again! A few answers:
> 
> 1. "What are the DL applications today in the industry besides some nice demos?" Alas, there are so many, on your smartphone and billions of other devices, some by the most famous companies, some of them mentioned in the survey. For example, it’s fair to say that DL has revolutionised image processing and world-wide communication across numerous languages. See, e.g., Sec. 16, or reference [DL4]: https://people.idsia.ch/~juergen/impact-on-most-valuable-companies.html
> 
> 2. "Why does a deep NN give better results than a shallow NN?” At least for RNNs that’s very clear. Compare Sec. 14 ff. https://people.idsia.ch/~juergen/deep-learning-history.html#unsupdl
> 
> 3. "Neocognitron performs invariant pattern recognition, a CNN does not." CNNs and Neocognitrons (1979) have the same basic architecture with alternating convolutional and downsampling layers. Not sure why someone changed the name! See Sec. 9: https://people.idsia.ch/~juergen/deep-learning-history.html#cnn
> 
> 4. "according to you (the title of your review) AI is today DL." No, it isn’t, as also mentioned in more detail in my reply to Gary: "many of the most famous modern AI applications actually combine deep learning and other cited techniques."
> 
> 5. "biologically plausible" deep learning: Sec. 10 mentions some of the proposals since the 1980s, plus recent work, but only briefly, because so far the impact on modern AI has been negligible. 
> 
> 6.  "you missed symbolical AI." Covered by many citations since the 1940s and famous surveys since the 1960s. Many successful modern RL applications actually combine NNs and old “symbolic” techniques such as Monte Carlo Tree Search and Planning. See also Sec. 17 and my answer to Gary on the "modern AI" focus. 
> 
> Generally speaking, I have never really understood the distinction between “symbolic" and “subsymbolic” AI. Our team is probably best known for “subsymbolic” deep learning and NNs, but for many decades we have also published stuff that many would consider “symbolic.” For example, the Gödel Machine (GM, 2003, https://arxiv.org/abs/cs/0309048) is a self-referential universal problem solver making provably optimal self-improvements. It will rewrite any part of its own code as soon as it has found a proof that the rewrite is useful, where the problem-dependent utility function and the hardware and the entire initial code are described by axioms encoded in an initial proof searcher which is also part of the initial code. 
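> 
> (To illustrate, a toy Python sketch of the GM control loop. This is a drastic simplification for this email only, not the 2003 construction; "code", "proof_searcher" and "env" are hypothetical stand-ins for components that the real GM encodes axiomatically:)
> 
>     def goedel_machine(code, proof_searcher, env):
>         # Work on the given problem while searching for a proof that
>         # some rewrite of the machine's own code increases utility.
>         while True:
>             code.step(env)                         # solve the problem
>             rewrite = proof_searcher.search(code)  # proof of a useful rewrite?
>             if rewrite is not None:
>                 code = rewrite                     # provably useful self-change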
> 
> Of course, any NN code can be injected into the GM’s initial self-referential code. Does that make the GM “subsymbolic”? Likewise, a GM can be implemented on a recurrent NN. Does that make the RNN “symbolic”? This also ties in with what Steve and Gary have been discussing.
> 
> 7. "Open problems" and future work: see, e.g., Sec 17: "The future of RL will be about learning/composing/planning with compact spatio-temporal abstractions of complex input streams—about commonsense reasoning [MAR15] and learning to think [PLAN4-5]. How can NNs learn to represent percepts and action plans in a hierarchical manner, at multiple levels of abstraction, and multiple time scales [LEC]? We published first answers to these questions in 1990-91: self-supervised neural history compressors [UN][UN0-3] learn to represent percepts at multiple levels of abstraction and multiple time scales (see above), while end-to-end differentiable NN-based subgoal generators [HRL3][MIR] learn hierarchical action plans through gradient descent (see above). More sophisticated ways of learning to think in abstract ways were published in 1997 [AC97][AC99][AC02] and 2015-18 [PLAN4-5]." Nevertheless, much remains to be done!
> 
> Cheers,
> Jürgen
> 
> 
> 
> 
>> On 16. Jan 2023, at 12:35, Andrzej Wichert <andreas.wichert at tecnico.ulisboa.pt> wrote:
>> 
>> Dear Juergen,
>> 
>> Again, you missed symbolical AI in your description, names like Douglas Hofstadter. Many of today’s applications are driven by symbol manipulation, like diagnostic systems, route planning (GPS navigation), timetable planning, object-oriented programming, symbolic integration and the solution of equations (Mathematica). 
>> What are the DL applications today in the industry besides some nice demos?
>> 
>> You do not indicate open problems in DL. DL is highly biologically implausible (back propagation, LSTM), requires a lot of energy (computing power), and requires huge training sets. Open problems include the black-art approach of DL, the failure of self-driving cars, and the question of why a deep NN gives better results than a shallow NN. Maybe the biggest mistake was to replace the biologically motivated algorithm of the Neocognitron by back propagation without understanding what a Neocognitron is doing. The Neocognitron performs invariant pattern recognition; a CNN does not. Transformers are biologically implausible and resulted from an engineering requirement.
>> 
>> My point is that when I was a student in the late eighties, I wanted to do a master thesis on NNs, and I was told that NNs do not belong to AI (not even to computer science). Today, if a student comes and asks to investigate problem solving by production systems, or biologically motivated ML, he will be told that this is not AI, since according to you (the title of your review) AI today is DL. In my view, DL stops progress in AI and NNs in the same way LISP and Prolog did in the eighties.
>> 
>> Best,
>> 
>> Andrzej
>> 
>> --------------------------------------------------------------------------------------------------
>> Prof. Auxiliar Andreas Wichert   
>> 
>> http://web.tecnico.ulisboa.pt/andreas.wichert/
>> -
>> https://www.amazon.com/author/andreaswichert
>> 
>> Instituto Superior Técnico - Universidade de Lisboa
>> Campus IST-Taguspark 
>> Avenida Professor Cavaco Silva                 Phone: +351  214233231
>> 2744-016 Porto Salvo, Portugal
>> 
>>> On 15 Jan 2023, at 21:04, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>>> 
>>> Thanks for these thoughts, Gary! 
>>> 
>>> 1. Well, the survey is about the roots of “modern AI” (as opposed to all of AI) which is mostly driven by “deep learning.” Hence the focus on the latter and the URL "deep-learning-history.html.” On the other hand, many of the most famous modern AI applications actually combine deep learning and other cited techniques (more on this below).
>>> 
>>> Any problem of computer science can be formulated in the general reinforcement learning (RL) framework, and the survey points to ancient relevant techniques for search & planning, now often combined with NNs:
>>> 
>>> "Certain RL problems can be addressed through non-neural techniques invented long before the 1980s: Monte Carlo (tree) search (MC, 1949) [MOC1-5], dynamic programming (DP, 1953) [BEL53], artificial evolution (1954) [EVO1-7][TUR1] (unpublished), alpha-beta-pruning (1959) [S59], control theory and system identification (1950s) [KAL59][GLA85],  stochastic gradient descent (SGD, 1951) [STO51-52], and universal search techniques (1973) [AIT7].
>>> 
>>> Deep FNNs and RNNs, however, are useful tools for _improving_ certain types of RL. In the 1980s, concepts of function approximation and NNs were combined with system identification [WER87-89][MUN87][NGU89], DP and its online variant called Temporal Differences [TD1-3], artificial evolution [EVONN1-3] and policy gradients [GD1][PG1-3]. Many additional references on this can be found in Sec. 6 of the 2015 survey [DL1]. 
>>> 
>>> When there is a Markovian interface [PLAN3] to the environment such that the current input to the RL machine conveys all the information required to determine a next optimal action, RL with DP/TD/MC-based FNNs can be very successful, as shown in 1994 [TD2] (master-level backgammon player) and the 2010s [DM1-2a] (superhuman players for Go, chess, and other games). For more complex cases without Markovian interfaces, …”
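>>> 
>>> (A minimal Python illustration of the TD idea with a lookup table; replacing the table V by an FNN trained on the same bootstrapped target gives the NN-based combination cited above. Variable names are mine, not from [TD1-3]:)
>>> 
>>>     def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
>>>         # Move the value estimate V(s) toward the bootstrapped
>>>         # target r + gamma * V(s'); V is a dict from states to floats.
>>>         target = r + gamma * V.get(s_next, 0.0)
>>>         V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))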
>>> 
>>> Theoretically optimal planners/problem solvers based on algorithmic information theory are mentioned in Sec. 19.
>>> 
>>> 2. Here a few relevant paragraphs from the intro:
>>> 
>>> "A history of AI written in the 1980s would have emphasized topics such as theorem proving [GOD][GOD34][ZU48][NS56], logic programming, expert systems, and heuristic search [FEI63,83][LEN83]. This would be in line with topics of a 1956 conference in Dartmouth, where the term "AI" was coined by John McCarthy as a way of describing an old area of research seeing renewed interest. 
>>> 
>>> Practical AI dates back at least to 1914, when Leonardo Torres y Quevedo built the first working chess end game player [BRU1-4] (back then chess was considered as an activity restricted to the realms of intelligent creatures). AI theory dates back at least to 1931-34 when Kurt Gödel identified fundamental limits of any type of computation-based AI [GOD][BIB3][GOD21,a,b].
>>> 
>>> A history of AI written in the early 2000s would have put more emphasis on topics such as support vector machines and kernel methods [SVM1-4], Bayesian (actually Laplacian or possibly Saundersonian [STI83-85]) reasoning [BAY1-8][FI22] and other concepts of probability theory and statistics [MM1-5][NIL98][RUS95], decision trees, e.g. [MIT97], ensemble methods [ENS1-4], swarm intelligence [SW1], and evolutionary computation [EVO1-7][TUR1]. Why? Because back then such techniques drove many successful AI applications.
>>> 
>>> A history of AI written in the 2020s must emphasize concepts such as the even older chain rule [LEI07] and deep nonlinear artificial neural networks (NNs) trained by gradient descent [GD’], in particular, feedback-based recurrent networks, which are general computers whose programs are weight matrices [AC90]. Why? Because many of the most famous and most commercial recent AI applications depend on them [DL4]."
>>> 
>>> 3. Regarding the future, you mentioned your hunch on neurosymbolic integration. While the survey speculates a bit about the future, it also says: "But who knows what kind of AI history will prevail 20 years from now?” 
>>> 
>>> Juergen
>>> 
> 
>> On 16. Jan 2023, at 16:18, Stephen José Hanson <jose at rubic.rutgers.edu> wrote:
>> 
>> Gary, 
>> 
>> "vast areas of AI such as planning, reasoning, natural language understanding, robotics and knowledge representation are treated very superficially here"
>> 
>> As usual you are distorting the point here. What Juergen is chronicling is WORKING AI (the big bang aside for a moment), and I think we do agree on some of the LLM nonsense that is in a hyperbolic loop at this point. 
>> 
>> But AI from the 70s frankly failed, including NN. Expert systems, the apex application... couldn't even suggest decent wines.
>> Language understanding, planning etc.: please point us to the working systems you are talking about. These things are broken. Why would we try to blend broken systems with a classifier that has human to superhuman classification accuracy? What would it do? Pick up that last 1% of error? Explain the VGG? We don't know how these DLs work in any case... good luck on that! (See comments on this topic from Yann and me in the recent WIAS series!)
>> 
>> Frankly, the last gasp of AI in the 70s was the US gov's 5th-generation response in Austin, Texas: MCC (launched in the early 80s), after shaking down hundreds of companies for $1M a year and plowing all the monies into reasoning, planning, and NL knowledge representation. Oh yeah, Doug Lenat, who predicted every year we went down there that CYC would become intelligent in 2001! Maybe 2010! I was part of the group from Bell Labs that was supposed to provide analysis and harvest the AI fiesta each year. There was nothing. What survived of CYC, and of the NL and reasoning breakthroughs? Nothing. Nothing survived this money party. 
>> 
>> So here we are, where NN comes back (just as CYC was about to burst into intelligence!) under rather unlikely and seemingly marginal tweaks to the NN backprop algo, and works pretty much daily with breakthroughs, ignoring LLMs for the moment, which I believe are likely to crash in on themselves.
>> 
>> Nonetheless, as you can guess, I am countering your claim: your prediction is not going to happen. There will be no merging of symbols and NN in the near or distant future, because it would be useless.
>> 
>> Best,
>> 
>> Steve
>>> 
>>>> On 14. Jan 2023, at 15:04, Gary Marcus <gary.marcus at nyu.edu> wrote:
>>>> 
>>>> Dear Juergen,
>>>> 
>>>> You have made a good case that the history of deep learning is often misrepresented. But, by parity of reasoning, a few pointers to a tiny fraction of the work done in symbolic AI do not in any way make this a thorough and balanced exercise with respect to the field as a whole.
>>>> 
>>>> I am 100% with Andrzej Wichert in thinking that vast areas of AI such as planning, reasoning, natural language understanding, robotics and knowledge representation are treated very superficially here. A few pointers to theorem proving and the like do not solve that. 
>>>> 
>>>> Your essay is a fine if opinionated history of deep learning, with a special emphasis on your own work, but of somewhat limited value beyond a few terse references in explicating other approaches to AI. This would be OK if the title and aspiration didn’t aim for the field as a whole; if you really want the paper to reflect the field as a whole, and the ambitions of the title, you have more work to do. 
>>>> 
>>>> My own hunch is that in a decade, maybe much sooner, a major emphasis of the field will be on neurosymbolic integration. Your own startup is heading in that direction, and the commercial desire to make LLMs reliable and truthful will also push in that direction. 
>>>> Historians looking back on this paper will see too little about the roots of that trend documented here.
>>>> 
>>>> Gary 
>>>> 
>>>>> On Jan 14, 2023, at 12:42 AM, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>>>>> 
>>>>> Dear Andrzej, thanks, but come on, the report cites lots of “symbolic” AI from theorem proving (e.g., Zuse 1948) to later surveys of expert systems and “traditional" AI. Note that Sec. 18 and Sec. 19 go back even much further in time (not even speaking of Sec. 20). The survey also explains why AI histories written in the 1980s/2000s/2020s differ. Here again the table of contents:
>>>>> 
>>>>> Sec. 1: Introduction
>>>>> Sec. 2: 1676: The Chain Rule For Backward Credit Assignment
>>>>> Sec. 3: Circa 1800: First Neural Net (NN) / Linear Regression / Shallow Learning
>>>>> Sec. 4: 1920-1925: First Recurrent NN (RNN) Architecture. ~1972: First Learning RNNs
>>>>> Sec. 5: 1958: Multilayer Feedforward NN (without Deep Learning)
>>>>> Sec. 6: 1965: First Deep Learning
>>>>> Sec. 7: 1967-68: Deep Learning by Stochastic Gradient Descent 
>>>>> Sec. 8: 1970: Backpropagation. 1982: For NNs. 1960: Precursor. 
>>>>> Sec. 9: 1979: First Deep Convolutional NN (1969: Rectified Linear Units) 
>>>>> Sec. 10: 1980s-90s: Graph NNs / Stochastic Delta Rule (Dropout) / More RNNs / Etc
>>>>> Sec. 11: Feb 1990: Generative Adversarial Networks / Artificial Curiosity / NN Online Planners
>>>>> Sec. 12: April 1990: NNs Learn to Generate Subgoals / Work on Command 
>>>>> Sec. 13: March 1991: NNs Learn to Program NNs. Transformers with Linearized Self-Attention
>>>>> Sec. 14: April 1991: Deep Learning by Self-Supervised Pre-Training. Distilling NNs
>>>>> Sec. 15: June 1991: Fundamental Deep Learning Problem: Vanishing/Exploding Gradients
>>>>> Sec. 16: June 1991: Roots of Long Short-Term Memory / Highway Nets / ResNets
>>>>> Sec. 17: 1980s-: NNs for Learning to Act Without a Teacher 
>>>>> Sec. 18: It's the Hardware, Stupid!
>>>>> Sec. 19: But Don't Neglect the Theory of AI (Since 1931) and Computer Science
>>>>> Sec. 20: The Broader Historic Context from Big Bang to Far Future
>>>>> Sec. 21: Acknowledgments
>>>>> Sec. 22: 555+ Partially Annotated References (many more in the award-winning survey [DL1])
>>>>> 
>>>>> Tweet: https://twitter.com/SchmidhuberAI/status/1606333832956973060?cxt=HHwWiMC8gYiH7MosAAAA 
>>>>> 
>>>>> Jürgen
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 13. Jan 2023, at 14:40, Andrzej Wichert <andreas.wichert at tecnico.ulisboa.pt> wrote:
>>>>>> Dear Juergen,
>>>>>> You make the same mistake that was made in the early 1970s. You identify deep learning with modern AI; the paper should instead be called "Annotated History of Deep Learning”.
>>>>>> Otherwise, you ignore symbolical AI, like search, production systems, knowledge representation, planning etc., as if it were not part of AI anymore (as suggested by your title).
>>>>>> Best,
>>>>>> Andreas
>>>>>> --------------------------------------------------------------------------------------------------
>>>>>> Prof. Auxiliar Andreas Wichert   
>>>>>> http://web.tecnico.ulisboa.pt/andreas.wichert/
>>>>>> -
>>>>>> https://www.amazon.com/author/andreaswichert
>>>>>> Instituto Superior Técnico - Universidade de Lisboa
>>>>>> Campus IST-Taguspark
>>>>>> Avenida Professor Cavaco Silva                 Phone: +351 214233231
>>>>>> 2744-016 Porto Salvo, Portugal
>>>>>>> On 13 Jan 2023, at 08:13, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>>>>>>> Machine learning is the science of credit assignment. My new survey credits the pioneers of deep learning and modern AI (supplementing my award-winning 2015 survey):
>>>>>>> https://arxiv.org/abs/2212.11279
>>>>>>> https://people.idsia.ch/~juergen/deep-learning-history.html
>>>>>>> This was already reviewed by several deep learning pioneers and other experts. Nevertheless, let me know under juergen at idsia.ch if you can spot any remaining error or have suggestions for improvements.
>>>>>>> Happy New Year!
>>>>>>> Jürgen



