Connectionists: Annotated History of Modern AI and Deep Learning: Neural models of attention, learning, and prediction for AI

Grossberg, Stephen steve at bu.edu
Mon Jan 16 09:14:53 EST 2023


Dear Andrzej, Juergen et al.,

While approaches to AI other than Deep Learning are being discussed, I would like to mention neural network models of attention, learning, classification, and prediction, as well as of perception, cognition, emotion, and goal-oriented action, that could profitably be considered part of modern AI.


My Magnum Opus about how our brains make our minds:



Conscious MIND, Resonant BRAIN: How Each Brain Makes a Mind


https://www.amazon.com/Conscious-Mind-Resonant-Brain-Makes/dp/0190070552



provides a self-contained overview of many of these contributions.


I am happy and honored to report that the book has won the 2022 PROSE book award in Neuroscience from the Association of American Publishers.



Among other things, the book explains how, where in our brains, and why from a deep computational perspective, we can consciously see, hear, feel, and know things about the world, and use these conscious representations to effectively plan and act to acquire valued goals.


These results hereby offer a rigorous solution of the classical mind-body problem.



More generally, the book provides a self-contained and non-technical synthesis of many of the processes whereby our brains make our minds, in both health and disease.



These neural models can equally well be combined to design autonomous adaptive intelligence into algorithms and mobile robots for engineering, technology, and AI. Many of them have already found their way into large-scale applications, notably applications that require fast incremental learning and prediction on non-stationary databases.


The book shows that Adaptive Resonance Theory, or ART, is currently the most advanced cognitive and neural theory that explains how humans learn to attend to, recognize, and predict objects and events in a changing world that is filled with unexpected events.


ART overcomes serious foundational problems of back propagation and Deep Learning, including the fact that they are unreliable (because they can experience catastrophic forgetting) and untrustworthy (because they are not explainable).


In particular, even if Deep Learning makes a successful prediction in one situation, one does not know why it did so, and cannot depend upon it making a successful prediction in related situations. It should therefore never be used in applications with life-or-death consequences, such as medical and financial applications.


Why should anyone believe in ART? There are several kinds of reasons:



All the foundational hypotheses of ART have been supported by subsequent psychological and neurobiological experiments.


ART has also provided principled and unifying explanations and predictions of hundreds of other experimental facts.


ART can, moreover, be derived from a THOUGHT EXPERIMENT about how ANY system can learn to autonomously correct predictive errors in a changing world that is filled with unexpected events.



The hypotheses on which this thought experiment is based are familiar facts that we all know about from daily life. They are familiar because they represent ubiquitous evolutionary pressures on the evolution of our brains. When a few such familiar facts are applied together, these mutual constraints lead uniquely to ART.



Nowhere during the thought experiment are the words mind or brain mentioned.


ART hereby proposes a UNIVERSAL class of solutions of the problem of autonomous error correction and prediction in a changing world that is filled with unexpected events.


The CogEM (Cognitive-Emotional-Motor) model of how cognition and emotion interact can also be derived from a thought experiment. CogEM proposes explanations of many data about cognitive-emotional interactions.


Combining ART and CogEM shows how knowledge and value-based costs can be combined to focus attention upon knowledge and actions that have a high probability of realizing valued goals.


Remarkably, the combination of ART and CogEM also leads to the results on consciousness, because these results naturally emerge from an analysis of how we can quickly LEARN about a changing world without experiencing catastrophic forgetting. I have called this a solution of the stability-plasticity dilemma: namely, the problem of how we can learn quickly (plasticity) without experiencing catastrophic forgetting (stability).


Back propagation and Deep Learning cannot solve any of these problems. One reason is that they are defined by a feedforward adaptive filter. They include no cell activations, or short-term memory (STM) traces, and no top-down learning or attentive matching.


In contrast, a good enough match of an ART learned top-down expectation with a bottom-up feature pattern triggers a bottom-up and top-down resonance that chooses and focuses attention upon the CRITICAL FEATURE PATTERNS that are sufficient to predict valued outcomes, while suppressing predictively irrelevant features. These critical features are the ones that are learned by bottom-up adaptive filters and top-down expectations.


The selectivity of attention and learning is how the stability-plasticity dilemma is solved.
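
For readers who would like to see this search cycle spelled out concretely, here is a minimal sketch in Python, written in the style of Fuzzy ART. It illustrates only the bottom-up choice, top-down matching, vigilance test, and reset-and-search steps, not the full model from my articles with Gail Carpenter; the class name and parameter names (rho for vigilance, alpha for the choice parameter, beta for the learning rate) are simply illustrative conventions.

import numpy as np

def complement_code(x):
    # Fuzzy ART inputs are complement coded: I = [x, 1 - x], with x in [0, 1].
    return np.concatenate([x, 1.0 - x])

class TinyFuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.w = []   # one prototype (learned critical feature pattern) per category

    def train(self, x):
        I = complement_code(np.asarray(x, dtype=float))
        # Bottom-up choice: rank committed categories by the choice function.
        T = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]
        for j in np.argsort(T)[::-1]:
            # Top-down match: how well does this category's expectation fit the input?
            match = np.minimum(I, self.w[j]).sum() / I.sum()
            if match >= self.rho:
                # Resonance: attend to, and learn, only the matched (critical) features.
                self.w[j] = self.beta * np.minimum(I, self.w[j]) + (1 - self.beta) * self.w[j]
                return j
            # Mismatch: reset this category and continue the memory search.
        # No committed category matched well enough: recruit a new one.
        self.w.append(I.copy())
        return len(self.w) - 1

In this simplified picture, a high vigilance rho forces many narrow categories, while a low rho allows fewer, coarser ones; that is the simplest way to see how the match criterion regulates the balance between plasticity and stability.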


The learned top-down expectations obey the ART Matching Rule. They are embodied by a top-down, modulatory on-center, off-surround network whose cells obey mass action, or shunting, laws. These laws model the membrane equations of neurophysiology.
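
Schematically, and with all details suppressed, one standard shunting form for the activity x_i of the i-th cell in such an on-center off-surround network is:

dx_i/dt = -A x_i + (B - x_i) [ I_i + f(x_i) ] - (x_i + C) [ J_i + sum_{k != i} f(x_k) ]

where I_i and J_i are the excitatory on-center and inhibitory off-surround inputs, f is a signal function, and the multiplicative, or shunting, factors (B - x_i) and (x_i + C) automatically keep x_i bounded between -C and B, much as a membrane potential is bounded by its reversal potentials. In the ART Matching Rule, the learned top-down expectation feeds the on-center but is modulatory: by itself it can prime, or sensitize, its target cells, and those cells fire vigorously only where top-down and bottom-up signals match.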


The ART Matching Rule has been supported by psychological, anatomical, neurophysiological, biophysical, and even biochemical data in multiple species, including bats.


The above resonance is called a feature-category resonance. My book summarizes six different resonances, with different functions, that occur in different parts of our brains:



TYPE OF RESONANCE               TYPE OF CONSCIOUSNESS

surface-shroud                  see visual object or scene
feature-category                recognize visual object or scene
stream-shroud                   hear auditory object or stream
spectral-pitch-and-timbre       recognize auditory object or stream
item-list                       recognize speech and language
cognitive-emotional             feel emotion and know its source

As the above Table suggests, the book also summarizes many results about speech and language learning, cognitive planning, and performance, notably the role of the prefrontal cortex in choosing, storing, learning, and controlling the event sequences that provide predictive contexts for many of the higher-order processes that together realize human intelligence.


When we compare AI with Natural Intelligence (NI), we might also hope that NI will shed some light on deeper aspects of the human condition. To this end, the book summarizes the following kinds of results:


The models clarify how normal brain dynamics can break down in specific and testable ways to cause behavioral symptoms of multiple mental disorders, including Alzheimer's disease, autism, amnesia, schizophrenia, PTSD, ADHD, visual and auditory agnosia and neglect, and disorders of slow-wave sleep.



Its exposition of how our brains consciously see enables the book to explain how many visual artists, including Matisse, Monet, and Seurat, as well as the Impressionists and Fauvists in general, achieved the aesthetic effects in their paintings, and how humans consciously see these paintings.



The book goes beyond such purely scientific topics to clarify how our brains support such vital human qualities as creativity, morality, and religion, and how so many people can persist in superstitious, irrational, and self-defeating behaviors in certain social environments.


Many other topics are discussed in the book's Preface and 17 chapters:


https://academic.oup.com/book/40038

That makes the book a flexible resource in many kinds of courses and seminars, as some of its reviewers have noted.

In case the above comments may interest some of you in learning more, let me add that I wrote the book to be self-contained and non-technical in a conversational style so that even people who know no science can enjoy reading parts of it, no less than students and researchers in multiple disciplines.

In fact, friends of mine who know no science have been reading it, including a rabbi, pastor, visual artist, gallery owner, social worker, and lawyer.

I also priced it to be affordable. Given that it is an almost 800 double-column page book with over 600 color figures, the book could have cost well over $100. Instead, it costs around $33 for the hard copy and around $19 for the Kindle version, because I subsidized the cost with thousands of dollars of my personal funds.

I did that so that faculty and students who might want to read it could afford to do so.

For people who want all the bells and whistles of this line of work up to the present time, there are videos of several of my keynote lectures and around 560 downloadable archival articles on my web page, sites.bu.edu/steveg.

If any of you do read parts of the book or research articles, please feel free to send along any comments or questions that may arise when you do.

Best wishes to all in the New Year,

Steve
________________________________
From: Connectionists <connectionists-bounces at mailman.srv.cs.cmu.edu> on behalf of Andrzej Wichert <andreas.wichert at tecnico.ulisboa.pt>
Sent: Monday, January 16, 2023 4:35 AM
To: Schmidhuber Juergen <juergen at idsia.ch>
Cc: connectionists at cs.cmu.edu <connectionists at cs.cmu.edu>
Subject: Re: Connectionists: Annotated History of Modern AI and Deep Learning

Dear Jurgen,

Again, you missed symbolic AI in your description, names like Douglas Hofstadter. Many of today’s applications are driven by symbol manipulation, like diagnostic systems, route planning (GPS navigation), timetable planning, object-oriented programming, symbolic integration, and solving equations (Mathematica).
What are the DL applications in industry today, besides some nice demos?

You do not indicate open problems in DL. DL is highly biologically implausible (back propagation, LSTM), requires a lot of energy (computing power), and requires huge training sets. There is also the black-art approach of DL, the failure of self-driving cars, and the question of why a deep NN gives better results than a shallow NN. Maybe the biggest mistake was to replace the biologically motivated algorithm of the Neocognitron by back propagation without understanding what the Neocognitron is doing. The Neocognitron performs invariant pattern recognition; a CNN does not. Transformers are biologically implausible and resulted from an engineering requirement.

My point is that when I was a student in the late eighties, I wanted to do a master's thesis on NNs, and I was told that NNs do not belong to AI (not even to computer science). Today, if a student says that he wants to investigate problem solving by production systems, or biologically motivated ML, he will be told that this is not AI since, according to you (the title of your review), AI today is DL. In my view, DL stops the progress in AI and NNs in the same way LISP and Prolog did in the eighties.

Best,

Andrzej

--------------------------------------------------------------------------------------------------
Prof. Auxiliar Andreas Wichert

http://web.tecnico.ulisboa.pt/andreas.wichert/
-
https://www.amazon.com/author/andreaswichert

Instituto Superior Técnico - Universidade de Lisboa
Campus IST-Taguspark
Avenida Professor Cavaco Silva                 Phone: +351  214233231
2744-016 Porto Salvo, Portugal

> On 15 Jan 2023, at 21:04, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>
> Thanks for these thoughts, Gary!
>
> 1. Well, the survey is about the roots of “modern AI” (as opposed to all of AI) which is mostly driven by “deep learning.” Hence the focus on the latter and the URL "deep-learning-history.html.” On the other hand, many of the most famous modern AI applications actually combine deep learning and other cited techniques (more on this below).
>
> Any problem of computer science can be formulated in the general reinforcement learning (RL) framework, and the survey points to ancient relevant techniques for search & planning, now often combined with NNs:
>
> "Certain RL problems can be addressed through non-neural techniques invented long before the 1980s: Monte Carlo (tree) search (MC, 1949) [MOC1-5], dynamic programming (DP, 1953) [BEL53], artificial evolution (1954) [EVO1-7][TUR1] (unpublished), alpha-beta-pruning (1959) [S59], control theory and system identification (1950s) [KAL59][GLA85],  stochastic gradient descent (SGD, 1951) [STO51-52], and universal search techniques (1973) [AIT7].
>
> Deep FNNs and RNNs, however, are useful tools for _improving_ certain types of RL. In the 1980s, concepts of function approximation and NNs were combined with system identification [WER87-89][MUN87][NGU89], DP and its online variant called Temporal Differences [TD1-3], artificial evolution [EVONN1-3] and policy gradients [GD1][PG1-3]. Many additional references on this can be found in Sec. 6 of the 2015 survey [DL1].
>
> When there is a Markovian interface [PLAN3] to the environment such that the current input to the RL machine conveys all the information required to determine a next optimal action, RL with DP/TD/MC-based FNNs can be very successful, as shown in 1994 [TD2] (master-level backgammon player) and the 2010s [DM1-2a] (superhuman players for Go, chess, and other games). For more complex cases without Markovian interfaces, …”
>
> Theoretically optimal planners/problem solvers based on algorithmic information theory are mentioned in Sec. 19.
>
> 2. Here a few relevant paragraphs from the intro:
>
> "A history of AI written in the 1980s would have emphasized topics such as theorem proving [GOD][GOD34][ZU48][NS56], logic programming, expert systems, and heuristic search [FEI63,83][LEN83]. This would be in line with topics of a 1956 conference in Dartmouth, where the term "AI" was coined by John McCarthy as a way of describing an old area of research seeing renewed interest.
>
> Practical AI dates back at least to 1914, when Leonardo Torres y Quevedo built the first working chess end game player [BRU1-4] (back then chess was considered as an activity restricted to the realms of intelligent creatures). AI theory dates back at least to 1931-34 when Kurt Gödel identified fundamental limits of any type of computation-based AI [GOD][BIB3][GOD21,a,b].
>
> A history of AI written in the early 2000s would have put more emphasis on topics such as support vector machines and kernel methods [SVM1-4], Bayesian (actually Laplacian or possibly Saundersonian [STI83-85]) reasoning [BAY1-8][FI22] and other concepts of probability theory and statistics [MM1-5][NIL98][RUS95], decision trees, e.g. [MIT97], ensemble methods [ENS1-4], swarm intelligence [SW1], and evolutionary computation [EVO1-7][TUR1]. Why? Because back then such techniques drove many successful AI applications.
>
> A history of AI written in the 2020s must emphasize concepts such as the even older chain rule [LEI07] and deep nonlinear artificial neural networks (NNs) trained by gradient descent [GD’], in particular, feedback-based recurrent networks, which are general computers whose programs are weight matrices [AC90]. Why? Because many of the most famous and most commercial recent AI applications depend on them [DL4]."
>
> 3. Regarding the future, you mentioned your hunch on neurosymbolic integration. While the survey speculates a bit about the future, it also says: "But who knows what kind of AI history will prevail 20 years from now?”
>
> Juergen
>
>
>> On 14. Jan 2023, at 15:04, Gary Marcus <gary.marcus at nyu.edu> wrote:
>>
>> Dear Juergen,
>>
>> You have made a good case that the history of deep learning is often misrepresented. But, by parity of reasoning, a few pointers to a tiny fraction of the work done in symbolic AI does not in any way make this a thorough and balanced exercise with respect to the field as a whole.
>>
>> I am 100% with Andrzej Wichert, in thinking that vast areas of AI such as planning, reasoning, natural language understanding, robotics and knowledge representation are treated very superficially here. A few pointers to theorem proving and the like does not solve that.
>>
>> Your essay is a fine if opinionated history of deep learning, with a special emphasis on your own work, but of somewhat limited value beyond a few terse references in explicating other approaches to AI. This would be ok if the title and aspiration didn’t aim for the field as a whole; if you really want the paper to reflect the field as a whole, and the ambitions of the title, you have more work to do.
>>
>> My own hunch is that in a decade, maybe much sooner, a major emphasis of the field will be on neurosymbolic integration. Your own startup is heading in that direction, and the commercial desire to make LLMs reliable and truthful will also push in that direction.
>> Historians looking back on this paper will see too little about the roots of that trend documented here.
>>
>> Gary
>>
>>> On Jan 14, 2023, at 12:42 AM, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>>>
>>> Dear Andrzej, thanks, but come on, the report cites lots of “symbolic” AI from theorem proving (e.g., Zuse 1948) to later surveys of expert systems and “traditional" AI. Note that Sec. 18 and Sec. 19 go back even much further in time (not even speaking of Sec. 20). The survey also explains why AI histories written in the 1980s/2000s/2020s differ. Here again the table of contents:
>>>
>>> Sec. 1: Introduction
>>> Sec. 2: 1676: The Chain Rule For Backward Credit Assignment
>>> Sec. 3: Circa 1800: First Neural Net (NN) / Linear Regression / Shallow Learning
>>> Sec. 4: 1920-1925: First Recurrent NN (RNN) Architecture. ~1972: First Learning RNNs
>>> Sec. 5: 1958: Multilayer Feedforward NN (without Deep Learning)
>>> Sec. 6: 1965: First Deep Learning
>>> Sec. 7: 1967-68: Deep Learning by Stochastic Gradient Descent
>>> Sec. 8: 1970: Backpropagation. 1982: For NNs. 1960: Precursor.
>>> Sec. 9: 1979: First Deep Convolutional NN (1969: Rectified Linear Units)
>>> Sec. 10: 1980s-90s: Graph NNs / Stochastic Delta Rule (Dropout) / More RNNs / Etc
>>> Sec. 11: Feb 1990: Generative Adversarial Networks / Artificial Curiosity / NN Online Planners
>>> Sec. 12: April 1990: NNs Learn to Generate Subgoals / Work on Command
>>> Sec. 13: March 1991: NNs Learn to Program NNs. Transformers with Linearized Self-Attention
>>> Sec. 14: April 1991: Deep Learning by Self-Supervised Pre-Training. Distilling NNs
>>> Sec. 15: June 1991: Fundamental Deep Learning Problem: Vanishing/Exploding Gradients
>>> Sec. 16: June 1991: Roots of Long Short-Term Memory / Highway Nets / ResNets
>>> Sec. 17: 1980s-: NNs for Learning to Act Without a Teacher
>>> Sec. 18: It's the Hardware, Stupid!
>>> Sec. 19: But Don't Neglect the Theory of AI (Since 1931) and Computer Science
>>> Sec. 20: The Broader Historic Context from Big Bang to Far Future
>>> Sec. 21: Acknowledgments
>>> Sec. 22: 555+ Partially Annotated References (many more in the award-winning survey [DL1])
>>>
>>> Tweet: https://twitter.com/SchmidhuberAI/status/1606333832956973060?cxt=HHwWiMC8gYiH7MosAAAA
>>>
>>> Jürgen
>>>
>>>
>>>
>>>
>>>
>>>> On 13. Jan 2023, at 14:40, Andrzej Wichert <andreas.wichert at tecnico.ulisboa.pt> wrote:
>>>> Dear Juergen,
>>>> You make the same mistake as was made in the early 1970s. You identify deep learning with modern AI; the paper should instead be called "Annotated History of Deep Learning”.
>>>> Otherwise, you ignore symbolic AI, like search, production systems, knowledge representation, planning, etc., as if it were not part of AI anymore (suggested by your title).
>>>> Best,
>>>> Andreas
>>>> --------------------------------------------------------------------------------------------------
>>>> Prof. Auxiliar Andreas Wichert
>>>> http://web.tecnico.ulisboa.pt/andreas.wichert/
>>>> -
>>>> https://www.amazon.com/author/andreaswichert
>>>> Instituto Superior Técnico - Universidade de Lisboa
>>>> Campus IST-Taguspark
>>>> Avenida Professor Cavaco Silva                 Phone: +351  214233231
>>>> 2744-016 Porto Salvo, Portugal
>>>>>> On 13 Jan 2023, at 08:13, Schmidhuber Juergen <juergen at idsia.ch> wrote:
>>>>> Machine learning is the science of credit assignment. My new survey credits the pioneers of deep learning and modern AI (supplementing my award-winning 2015 survey):
>>>>> https://arxiv.org/abs/2212.11279
>>>>> https://people.idsia.ch/~juergen/deep-learning-history.html
>>>>> This was already reviewed by several deep learning pioneers and other experts. Nevertheless, let me know under juergen at idsia.ch if you can spot any remaining error or have suggestions for improvements.
>>>>> Happy New Year!
>>>>> Jürgen
>
>

