Connectionists: Annotated History of Modern AI and Deep Learning

Sun Jan 15 16:04:11 EST 2023

Thanks for these thoughts, Gary! 

1. Well, the survey is about the roots of “modern AI” (as opposed to all of AI), which is mostly driven by “deep learning.” Hence the focus on the latter and the URL “deep-learning-history.html.” On the other hand, many of the most famous modern AI applications actually combine deep learning and other cited techniques (more on this below).

Any problem of computer science can be formulated in the general reinforcement learning (RL) framework, and the survey points to ancient relevant techniques for search & planning, now often combined with NNs:

"Certain RL problems can be addressed through non-neural techniques invented long before the 1980s: Monte Carlo (tree) search (MC, 1949) [MOC1-5], dynamic programming (DP, 1953) [BEL53], artificial evolution (1954) [EVO1-7][TUR1] (unpublished), alpha-beta-pruning (1959) [S59], control theory and system identification (1950s) [KAL59][GLA85],  stochastic gradient descent (SGD, 1951) [STO51-52], and universal search techniques (1973) [AIT7].

Deep FNNs and RNNs, however, are useful tools for _improving_ certain types of RL. In the 1980s, concepts of function approximation and NNs were combined with system identification [WER87-89][MUN87][NGU89], DP and its online variant called Temporal Differences [TD1-3], artificial evolution [EVONN1-3] and policy gradients [GD1][PG1-3]. Many additional references on this can be found in Sec. 6 of the 2015 survey [DL1]. 

When there is a Markovian interface [PLAN3] to the environment such that the current input to the RL machine conveys all the information required to determine a next optimal action, RL with DP/TD/MC-based FNNs can be very successful, as shown in 1994 [TD2] (master-level backgammon player) and the 2010s [DM1-2a] (superhuman players for Go, chess, and other games). For more complex cases without Markovian interfaces, …”

Theoretically optimal planners/problem solvers based on algorithmic information theory are mentioned in Sec. 19.

2. Here are a few relevant paragraphs from the intro:

"A history of AI written in the 1980s would have emphasized topics such as theorem proving [GOD][GOD34][ZU48][NS56], logic programming, expert systems, and heuristic search [FEI63,83][LEN83]. This would be in line with topics of a 1956 conference in Dartmouth, where the term "AI" was coined by John McCarthy as a way of describing an old area of research seeing renewed interest. 

Practical AI dates back at least to 1914, when Leonardo Torres y Quevedo built the first working chess end game player [BRU1-4] (back then chess was considered as an activity restricted to the realms of intelligent creatures). AI theory dates back at least to 1931-34 when Kurt Gödel identified fundamental limits of any type of computation-based AI [GOD][BIB3][GOD21,a,b].

A history of AI written in the early 2000s would have put more emphasis on topics such as support vector machines and kernel methods [SVM1-4], Bayesian (actually Laplacian or possibly Saundersonian [STI83-85]) reasoning [BAY1-8][FI22] and other concepts of probability theory and statistics [MM1-5][NIL98][RUS95], decision trees, e.g. [MIT97], ensemble methods [ENS1-4], swarm intelligence [SW1], and evolutionary computation [EVO1-7][TUR1]. Why? Because back then such techniques drove many successful AI applications.

A history of AI written in the 2020s must emphasize concepts such as the even older chain rule [LEI07] and deep nonlinear artificial neural networks (NNs) trained by gradient descent [GD’], in particular, feedback-based recurrent networks, which are general computers whose programs are weight matrices [AC90]. Why? Because many of the most famous and most commercial recent AI applications depend on them [DL4]."

3. Regarding the future, you mentioned your hunch on neurosymbolic integration. While the survey speculates a bit about the future, it also says: "But who knows what kind of AI history will prevail 20 years from now?” 

Juergen

> On 14. Jan 2023, at 15:04, Gary Marcus <gary.marcus at nyu.edu> wrote:
> 
> Dear Juergen,
> 
> You have made a good case that the history of deep learning is often misrepresented. But, by parity of reasoning, a few pointers to a tiny fraction of the work done in symbolic AI do not in any way make this a thorough and balanced exercise with respect to the field as a whole.
> 
> I am 100% with Andrzej Wichert in thinking that vast areas of AI such as planning, reasoning, natural language understanding, robotics and knowledge representation are treated very superficially here. A few pointers to theorem proving and the like do not solve that.
> 
> Your essay is a fine if opinionated history of deep learning, with a special emphasis on your own work, but of somewhat limited value beyond a few terse references in explicating other approaches to AI. This would be ok if the title and aspiration didn’t aim for the field as a whole; if you really want the paper to reflect the field as a whole, and the ambitions of the title, you have more work to do.
> 
> My own hunch is that in a decade, maybe much sooner, a major emphasis of the field will be on neurosymbolic integration. Your own startup is heading in that direction, and the commercial desire to make LLMs reliable and truthful will also push in that direction.
> Historians looking back on this paper will see too little about the roots of that trend documented here.
> 
> Gary 
> 
>> On Jan 14, 2023, at 12:42 AM, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>> 
>> Dear Andrzej, thanks, but come on, the report cites lots of “symbolic” AI from theorem proving (e.g., Zuse 1948) to later surveys of expert systems and “traditional” AI. Note that Sec. 18 and Sec. 19 go back even further in time (not even speaking of Sec. 20). The survey also explains why AI histories written in the 1980s/2000s/2020s differ. Here again is the table of contents:
>> 
>> Sec. 1: Introduction
>> Sec. 2: 1676: The Chain Rule For Backward Credit Assignment
>> Sec. 3: Circa 1800: First Neural Net (NN) / Linear Regression / Shallow Learning
>> Sec. 4: 1920-1925: First Recurrent NN (RNN) Architecture. ~1972: First Learning RNNs
>> Sec. 5: 1958: Multilayer Feedforward NN (without Deep Learning)
>> Sec. 6: 1965: First Deep Learning
>> Sec. 7: 1967-68: Deep Learning by Stochastic Gradient Descent 
>> Sec. 8: 1970: Backpropagation. 1982: For NNs. 1960: Precursor. 
>> Sec. 9: 1979: First Deep Convolutional NN (1969: Rectified Linear Units) 
>> Sec. 10: 1980s-90s: Graph NNs / Stochastic Delta Rule (Dropout) / More RNNs / Etc
>> Sec. 11: Feb 1990: Generative Adversarial Networks / Artificial Curiosity / NN Online Planners
>> Sec. 12: April 1990: NNs Learn to Generate Subgoals / Work on Command 
>> Sec. 13: March 1991: NNs Learn to Program NNs. Transformers with Linearized Self-Attention
>> Sec. 14: April 1991: Deep Learning by Self-Supervised Pre-Training. Distilling NNs
>> Sec. 15: June 1991: Fundamental Deep Learning Problem: Vanishing/Exploding Gradients
>> Sec. 16: June 1991: Roots of Long Short-Term Memory / Highway Nets / ResNets
>> Sec. 17: 1980s-: NNs for Learning to Act Without a Teacher 
>> Sec. 18: It's the Hardware, Stupid!
>> Sec. 19: But Don't Neglect the Theory of AI (Since 1931) and Computer Science
>> Sec. 20: The Broader Historic Context from Big Bang to Far Future
>> Sec. 21: Acknowledgments
>> Sec. 22: 555+ Partially Annotated References (many more in the award-winning survey [DL1])
>> 
>> Tweet: https://twitter.com/SchmidhuberAI/status/1606333832956973060?cxt=HHwWiMC8gYiH7MosAAAA
>> 
>> Jürgen
>> 
>>> On 13. Jan 2023, at 14:40, Andrzej Wichert <andreas.wichert at tecnico.ulisboa.pt> wrote:
>>> Dear Juergen,
>>> You make the same mistake that was made in the early 1970s. You identify deep learning with modern AI; the paper should instead be called "Annotated History of Deep Learning".
>>> Otherwise, you ignore symbolic AI, such as search, production systems, knowledge representation, planning, etc., as if it were no longer part of AI (as suggested by your title).
>>> Best,
>>> Andreas
>>> --------------------------------------------------------------------------------------------------
>>> Prof. Auxiliar Andreas Wichert   
>>> http://web.tecnico.ulisboa.pt/andreas.wichert/
>>> -
>>> https://www.amazon.com/author/andreaswichert
>>> Instituto Superior Técnico - Universidade de Lisboa
>>> Campus IST-Taguspark
>>> Avenida Professor Cavaco Silva                 Phone: +351  214233231
>>> 2744-016 Porto Salvo, Portugal
>>>> On 13 Jan 2023, at 08:13, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>>>> Machine learning is the science of credit assignment. My new survey credits the pioneers of deep learning and modern AI (supplementing my award-winning 2015 survey):
>>>> https://arxiv.org/abs/2212.11279
>>>> https://people.idsia.ch/~juergen/deep-learning-history.html
>>>> This was already reviewed by several deep learning pioneers and other experts. Nevertheless, let me know at juergen at idsia.ch if you spot any remaining errors or have suggestions for improvements.
>>>> Happy New Year!
>>>> Jürgen