Connectionists: Annotated History of Modern AI and Deep Learning

Thomas Trappenberg tt at
Sun Feb 5 03:28:50 EST 2023

My first question to ChatGPT

Thomas: what is heavier, 1kg of lead or 2000g of feathers?

ChatGPT: They both weigh the same, 1kg

This makes total sense for a high-dimensional non-causal frequency table.
Lead, feathers, heavier, and same are likely to be frequently correlated.
2000g (around 4.41 pounds for our American friends) does not really fit in.

Cheers, Thomas

On Sun, Feb 5, 2023, 2:36 a.m. Gary Marcus <gary.marcus at> wrote:

> A bit more history, on the possibility that it might be of use to future
> students of our contentious AI times, and in the spirit of the Elvis quote
> below:
> 2015: Gary Marcus writes, somewhat loosely, in a trade book (The Future of
> the Brain):
> “Hierarchies of features are less suited to challenges such as language,
> inference, and high-level planning. For example, as Noam Chomsky famously
> pointed out, language is filled with sentences you haven't seen before. Pure
> classifier systems don't know what to do with such sentences. The talent of
> feature detectors -- in identifying which member of some category
> something belongs to -- doesn't translate into understanding novel sentences,
> in which each sentence has its own unique meaning.”
> Sometime thereafter: Turing Award winner Geoff Hinton enshrines the quote
> on his own web page, with ridicule, as “My Favorite Gary Marcus quote”;
> people in the deep learning community circulate it on Facebook and Twitter,
> mocking Marcus.
> October 2019: Geoff Hinton, based perhaps primarily on the quote, warns a
> crowd of researchers at Toronto to not waste their time listening to
> Marcus. (Hinton’s email bounces, because it was sent from the wrong
> address). Hinton’s view is that language has been solved, by Google
> Translate; in his eyes, Marcus is a moron.
> [Almost three years pass; ridicule of Marcus continues on major social
> media]
> February 2023: Hinton’s fellow Turing Award winner Yann LeCun unleashes a
> Tweetstorm, saying that “LLMs such as ChatGPT can eloquently spew
> complete nonsense. Their grasp of reality is very superficial” and that
> “[LLM] make very stupid mistakes of common-sense that a 4 year-old, a chimp,
> a dog, or a cat would never make. LLMs have a more superficial
> understanding of the world than a house cat.”
> Marcus receives many emails wondering whether LeCun has switched sides. On
> Twitter, people ask whether Marcus has hacked LeCun’s Twitter account.
> The quote from Marcus, at the bottom of Hinton’s home page, remains.
> On Feb 3, 2023, at 02:15, Schmidhuber Juergen <juergen at> wrote:
> PS: the weirdest thing is that later Minsky & Papert published a famous
> book (1969) [M69] that cited neither Amari’s SGD-based deep learning
> (1967-68) nor the original layer-by-layer deep learning (1965) by
> Ivakhnenko & Lapa [DEEP1-2][DL2].
> Minsky & Papert's book [M69] showed that shallow NNs without hidden layers
> are very limited. Duh! That’s exactly why people like Ivakhnenko & Lapa and
> Amari had earlier overcome this problem through _deep_ learning with many
> learning layers.
> Minsky & Papert apparently were unaware of this. Unfortunately, even later
> they failed to correct their book [T22].
> Much later, others took this as an opportunity to promulgate a rather
> self-serving revisionist history of deep learning [S20][DL3][DL3a][T22]
> that simply ignored pre-Minsky deep learning.
> However, as Elvis Presley put it, "Truth is like the sun. You can shut it
> out for a time, but it ain't goin' away.” [T22]
> Juergen
> On 26. Jan 2023, at 16:29, Schmidhuber Juergen <juergen at> wrote:
> And in 1967-68, the same Shun-Ichi Amari trained multilayer perceptrons
> (MLPs) with many layers by stochastic gradient descent (SGD) in end-to-end
> fashion. See Sec. 7 of the survey:
> Amari's implementation [GD2,GD2a] (with his student Saito) learned
> internal representations in a five-layer MLP with two modifiable layers,
> which was trained to classify non-linearly separable pattern classes.
> Back then compute was billions of times more expensive than today.
> To my knowledge, this was the first implementation of learning internal
> representations through SGD-based deep learning.
> If anyone knows of an earlier one then please let me know :)
> Jürgen
> On 25. Jan 2023, at 16:44, Schmidhuber Juergen <juergen at> wrote:
> Some are not aware of this historic tidbit in Sec. 4 of the survey: half a
> century ago, Shun-Ichi Amari published a learning recurrent neural network
> (1972) which was later called the Hopfield network.
> Jürgen
> On 13. Jan 2023, at 11:13, Schmidhuber Juergen <juergen at> wrote:
> Machine learning is the science of credit assignment. My new survey
> credits the pioneers of deep learning and modern AI (supplementing my
> award-winning 2015 survey):
> This has already been reviewed by several deep learning pioneers and other
> experts. Nevertheless, let me know under juergen at if you can spot
> any remaining error or have suggestions for improvements.
> Happy New Year!
> Jürgen