Connectionists: Annotated History of Modern AI and Deep Learning

Sean Manion stmanion at gmail.com
Tue Jan 17 00:28:11 EST 2023


Thank you all for a great discussion, and of course Jürgen for your work on
the annotated history that has kicked it off.

For reasons tangential to all of this, I have recently been reviewing some
of the MIT Archives and found this invitation from Wiener, von Neumann, and
Aiken to several individuals for a sometimes historically overlooked two-day
meeting held at Princeton in January 1945 on a "...field of
effort, which as yet is not even named."

I thought some might find this of interest.

Cheers!

Sean


On Mon, Jan 16, 2023 at 11:51 PM Gary Marcus <gary.marcus at nyu.edu> wrote:

> Hi, Juergen,
>
> Thanks for your reply.  Restricting your title to “modern” AI as you did
> is a start, but I think still not enough. For example, from what I
> understand about NNAISANCE, through talking with you and Bas Steunebrink,
> there’s quite a bit of hybrid AI in what you are doing at your company, not
> well represented in the review. The related open-access book certainly
> draws heavily on both traditions (
> https://link.springer.com/book/10.1007/978-3-031-08020-3).
>
> Likewise, there is plenty of, e.g., symbolic planning in modern navigation
> systems, most robots, etc.; still plenty of use of symbolic trees in game
> playing; lots of people still use taxonomies and inheritance, etc.; and
> AFAIK nobody has built a trustworthy virtual assistant, even in a narrow
> domain, with only deep learning. And so on.
>
> In the end, it’s really a question about balance, which is what I think
> Andrzej was getting at; you go miles deep on the history of deep learning,
> which I respect, but just give relatively superficial pointers (not none!)
> outside that tradition. Having at least a few pointers is definitely better
> than having none, to be sure, and I would agree that the future is
> uncertain. I think you strike the right note there!
>
> As an aside, saying that everything can be formulated as RL is maybe no
> more helpful than saying that everything we (currently) know how to do can
> be formulated in terms of a Turing machine. True, but that doesn't carry you
> far enough in most real-world applications. I personally see RL as part of an
> answer, but most useful in (and here we might partly agree) the context of
> systems with rich internal models of the world.
>
> My own view is that we will get to more reliable AI only once the field
> more fully embraces the project of articulating how such models work and
> how they are developed.
>
> Which is maybe the one place where you (eg
> https://arxiv.org/pdf/1803.10122.pdf), Yann LeCun (eg
> https://openreview.net/forum?id=BZ5a1r-kVsf), and I (eg
> https://arxiv.org/abs/2002.06177) are most in agreement.
>
> Best,
> Gary
>
> On Jan 15, 2023, at 23:04, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>
> Thanks for these thoughts, Gary!
>
> 1. Well, the survey is about the roots of “modern AI” (as opposed to all
> of AI) which is mostly driven by “deep learning.” Hence the focus on the
> latter and the URL "deep-learning-history.html". On the other hand, many of
> the most famous modern AI applications actually combine deep learning and
> other cited techniques (more on this below).
>
> Any problem of computer science can be formulated in the general
> reinforcement learning (RL) framework, and the survey points to ancient
> relevant techniques for search & planning, now often combined with NNs:
>
> "Certain RL problems can be addressed through non-neural techniques
> invented long before the 1980s: Monte Carlo (tree) search (MC, 1949)
> [MOC1-5], dynamic programming (DP, 1953) [BEL53], artificial evolution
> (1954) [EVO1-7][TUR1] (unpublished), alpha-beta-pruning (1959) [S59],
> control theory and system identification (1950s) [KAL59][GLA85],
>  stochastic gradient descent (SGD, 1951) [STO51-52], and universal search
> techniques (1973) [AIT7].
>
> Deep FNNs and RNNs, however, are useful tools for _improving_ certain
> types of RL. In the 1980s, concepts of function approximation and NNs were
> combined with system identification [WER87-89][MUN87][NGU89], DP and its
> online variant called Temporal Differences [TD1-3], artificial evolution
> [EVONN1-3] and policy gradients [GD1][PG1-3]. Many additional references on
> this can be found in Sec. 6 of the 2015 survey [DL1].
>
> When there is a Markovian interface [PLAN3] to the environment such that
> the current input to the RL machine conveys all the information required to
> determine a next optimal action, RL with DP/TD/MC-based FNNs can be very
> successful, as shown in 1994 [TD2] (master-level backgammon player) and the
> 2010s [DM1-2a] (superhuman players for Go, chess, and other games). For
> more complex cases without Markovian interfaces, …”
>
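> To make this concrete, here is a minimal sketch of tabular dynamic
> programming (value iteration) and its online variant TD(0), run on a toy
> two-state Markovian world; all transition probabilities and rewards below
> are invented purely for illustration:
>
>   import random
>
>   # Toy 2-state Markov reward process: from either state, move to state
>   # 0 or 1 with equal probability; reward 1.0 whenever state 1 is entered.
>   gamma = 0.9  # discount factor
>
>   # Dynamic programming: iterate the Bellman backup to a fixed point.
>   V_dp = [0.0, 0.0]
>   for _ in range(200):
>       V_dp = [sum(0.5 * (float(s2 == 1) + gamma * V_dp[s2]) for s2 in (0, 1))
>               for _s in (0, 1)]  # backup is the same for both states here
>
>   # TD(0): the online variant, learning from sampled transitions instead.
>   alpha, V_td, s = 0.05, [0.0, 0.0], 0
>   for _ in range(50000):
>       s2 = random.choice([0, 1])                           # sampled move
>       r = 1.0 if s2 == 1 else 0.0                          # sampled reward
>       V_td[s] += alpha * (r + gamma * V_td[s2] - V_td[s])  # TD(0) update
>       s = s2
>
>   print(V_dp, V_td)  # the two estimates should roughly agree (about 5.0)
>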
> Theoretically optimal planners/problem solvers based on algorithmic
> information theory are mentioned in Sec. 19.
>
> 2. Here are a few relevant paragraphs from the intro:
>
> "A history of AI written in the 1980s would have emphasized topics such as
> theorem proving [GOD][GOD34][ZU48][NS56], logic programming, expert
> systems, and heuristic search [FEI63,83][LEN83]. This would be in line with
> topics of a 1956 conference at Dartmouth, where the term "AI" was coined by
> John McCarthy as a way of describing an old area of research seeing renewed
> interest.
>
> Practical AI dates back at least to 1914, when Leonardo Torres y Quevedo
> built the first working chess endgame player [BRU1-4] (back then chess was
> considered an activity restricted to the realms of intelligent
> creatures). AI theory dates back at least to 1931-34 when Kurt Gödel
> identified fundamental limits of any type of computation-based AI
> [GOD][BIB3][GOD21,a,b].
>
> A history of AI written in the early 2000s would have put more emphasis on
> topics such as support vector machines and kernel methods [SVM1-4],
> Bayesian (actually Laplacian or possibly Saundersonian [STI83-85])
> reasoning [BAY1-8][FI22] and other concepts of probability theory and
> statistics [MM1-5][NIL98][RUS95], decision trees, e.g. [MIT97], ensemble
> methods [ENS1-4], swarm intelligence [SW1], and evolutionary computation
> [EVO1-7][TUR1]. Why? Because back then such techniques drove many
> successful AI applications.
>
> A history of AI written in the 2020s must emphasize concepts such as the
> even older chain rule [LEI07] and deep nonlinear artificial neural networks
> (NNs) trained by gradient descent [GD’], in particular, feedback-based
> recurrent networks, which are general computers whose programs are weight
> matrices [AC90]. Why? Because many of the most famous and most commercial
> recent AI applications depend on them [DL4]."
>
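> To make the chain-rule remark concrete, here is a minimal sketch of
> backward credit assignment in a tiny 1-1-1 network; the weights, input,
> and target below are arbitrary illustration values:
>
>   import math
>
>   # Forward pass: x -> h = tanh(w1*x) -> y = w2*h -> squared error loss.
>   w1, w2, x, target = 0.5, -0.3, 1.0, 1.0
>   h = math.tanh(w1 * x)        # hidden activation
>   y = w2 * h                   # output
>   loss = 0.5 * (y - target) ** 2
>
>   # Backward pass: the chain rule assigns credit layer by layer.
>   dy = y - target              # dloss/dy
>   dw2 = dy * h                 # dloss/dw2
>   dh = dy * w2                 # dloss/dh
>   dw1 = dh * (1 - h ** 2) * x  # dloss/dw1, since tanh'(z) = 1 - tanh(z)^2
>   print(loss, dw1, dw2)        # gradients for one gradient-descent step
>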
> 3. Regarding the future, you mentioned your hunch on neurosymbolic
> integration. While the survey speculates a bit about the future, it also
> says: "But who knows what kind of AI history will prevail 20 years from
> now?”
>
> Juergen
>
>
> On 14. Jan 2023, at 15:04, Gary Marcus <gary.marcus at nyu.edu> wrote:
>
>
> Dear Juergen,
>
>
> You have made a good case that the history of deep learning is often
> misrepresented. But, by parity of reasoning, a few pointers to a tiny
> fraction of the work done in symbolic AI do not in any way make this a
> thorough and balanced exercise with respect to the field as a whole.
>
>
> I am 100% with Andrzej Wichert, in thinking that vast areas of AI such as
> planning, reasoning, natural language understanding, robotics and knowledge
> representation are treated very superficially here. A few pointers to
> theorem proving and the like do not solve that.
>
>
> Your essay is a fine if opinionated history of deep learning, with a
> special emphasis on your own work, but of somewhat limited value beyond a
> few terse references in explicating other approaches to AI. This would be
> ok if the title and aspiration didn’t aim for the field as a whole; if you
> really want the paper to reflect the field as a whole, and the ambitions of
> the title, you have more work to do.
>
>
> My own hunch is that in a decade, maybe much sooner, a major emphasis of
> the field will be on neurosymbolic integration. Your own startup is heading
> in that direction, and the commercial desire to make LLMs reliable and
> truthful will also push in that direction.
>
> Historians looking back on this paper will see too little about the roots
> of that trend documented here.
>
>
> Gary
>
>
> On Jan 14, 2023, at 12:42 AM, Schmidhuber Juergen <juergen at idsia.ch>
> wrote:
>
>
> Dear Andrzej, thanks, but come on, the report cites lots of “symbolic” AI
> from theorem proving (e.g., Zuse 1948) to later surveys of expert systems
> and “traditional” AI. Note that Sec. 18 and Sec. 19 go back much further in
> time (not even speaking of Sec. 20). The survey also explains
> why AI histories written in the 1980s/2000s/2020s differ. Here again the
> table of contents:
>
>
> Sec. 1: Introduction
>
> Sec. 2: 1676: The Chain Rule For Backward Credit Assignment
>
> Sec. 3: Circa 1800: First Neural Net (NN) / Linear Regression / Shallow
> Learning
>
> Sec. 4: 1920-1925: First Recurrent NN (RNN) Architecture. ~1972: First
> Learning RNNs
>
> Sec. 5: 1958: Multilayer Feedforward NN (without Deep Learning)
>
> Sec. 6: 1965: First Deep Learning
>
> Sec. 7: 1967-68: Deep Learning by Stochastic Gradient Descent
>
> Sec. 8: 1970: Backpropagation. 1982: For NNs. 1960: Precursor.
>
> Sec. 9: 1979: First Deep Convolutional NN (1969: Rectified Linear Units)
>
> Sec. 10: 1980s-90s: Graph NNs / Stochastic Delta Rule (Dropout) / More
> RNNs / Etc
>
> Sec. 11: Feb 1990: Generative Adversarial Networks / Artificial Curiosity
> / NN Online Planners
>
> Sec. 12: April 1990: NNs Learn to Generate Subgoals / Work on Command
>
> Sec. 13: March 1991: NNs Learn to Program NNs. Transformers with
> Linearized Self-Attention
>
> Sec. 14: April 1991: Deep Learning by Self-Supervised Pre-Training.
> Distilling NNs
>
> Sec. 15: June 1991: Fundamental Deep Learning Problem: Vanishing/Exploding
> Gradients
>
> Sec. 16: June 1991: Roots of Long Short-Term Memory / Highway Nets /
> ResNets
>
> Sec. 17: 1980s-: NNs for Learning to Act Without a Teacher
>
> Sec. 18: It's the Hardware, Stupid!
>
> Sec. 19: But Don't Neglect the Theory of AI (Since 1931) and Computer
> Science
>
> Sec. 20: The Broader Historic Context from Big Bang to Far Future
>
> Sec. 21: Acknowledgments
>
> Sec. 22: 555+ Partially Annotated References (many more in the
> award-winning survey [DL1])
>
>
> Tweet:
> https://twitter.com/SchmidhuberAI/status/1606333832956973060?cxt=HHwWiMC8gYiH7MosAAAA
>
>
> Jürgen
>
>
>
>
>
>
> On 13. Jan 2023, at 14:40, Andrzej Wichert <
> andreas.wichert at tecnico.ulisboa.pt> wrote:
>
> Dear Juergen,
>
> You make the same mistake that was made in the early 1970s: you identify
> deep learning with modern AI. The paper should instead be called “Annotated
> History of Deep Learning.”
>
> Otherwise, you ignore symbolic AI, such as search, production systems,
> knowledge representation, planning, etc., as if it were no longer part of AI
> (as your title suggests).
>
> Best,
>
> Andreas
>
>
> --------------------------------------------------------------------------------------------------
>
> Assistant Professor Andreas Wichert
>
>
> http://web.tecnico.ulisboa.pt/andreas.wichert/
>
> -
>
>
> https://www.amazon.com/author/andreaswichert
>
> Instituto Superior Técnico - Universidade de Lisboa
>
> Campus IST-Taguspark
>
> Avenida Professor Cavaco Silva
> Phone: +351 214233231
>
> 2744-016 Porto Salvo, Portugal
>
> On 13 Jan 2023, at 08:13, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>
> Machine learning is the science of credit assignment. My new survey
> credits the pioneers of deep learning and modern AI (supplementing my
> award-winning 2015 survey):
>
>
> https://arxiv.org/abs/2212.11279
>
>
> https://people.idsia.ch/~juergen/deep-learning-history.html
>
> This was already reviewed by several deep learning pioneers and other
> experts. Nevertheless, let me know at juergen at idsia.ch if you can spot
> any remaining errors or have suggestions for improvement.
>
> Happy New Year!
>
> Jürgen
>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Wiener Aiken Von Neumann invite 04 Dec 1944.jpg
Type: image/jpeg
Size: 97769 bytes
Desc: not available
URL: <http://mailman.srv.cs.cmu.edu/pipermail/connectionists/attachments/20230117/ee1dd2d8/attachment.jpg>

