Connectionists: Chomsky's apple

Tue Mar 14 19:43:12 EDT 2023

As one who, like Risto Miikkulainen, remembers the old days of the binding problem for neural networks, I’m amazed at what LLMs can do. Regardless of exactly what the extent of recent advances is, they are revolutionary. Some kind of threshold has been crossed. That said, I’m also convinced that ChatGPT and its ilk cannot be said to understand, as Gary Marcus has been arguing. 

I wasn’t sure I had anything to add regarding what it means “to understand”, but I thought it might be useful to provide a concrete example of what humans probably mean when they say that they understand a conversation. It relates to Kevin McKee’s recent email about Kant’s views, in which, if I paraphrase correctly, data has to connect to an ontology for understanding to occur. The common sense view and the Kantian view, in my opinion, are nearly identical. But sometimes a concrete example is the best way to make sense of something. 

Here’s an example worth considering: what would it mean for an AI to understand the Chomsky’s Apple thread? I think it would mean the following: the understanding agent would continuously weave its impressions of the emails into a story it was telling itself about its own life, remembering related episodes from the past, and thinking about what it all means for the future, including the emotional and motivational implications of deciding whether to “take a side” in the debate. The understander would have an ontology, in which academic opponents are critical elements, and in which their emotions matter almost as much as the theoretical constructs they develop. 

I’m probably in the minority, but in my opinion, an understanding agent would do what Daniel Dennett said conscious agents do in his book, Consciousness Explained. It would be constantly describing itself to itself, with constant access to its own story, and it would be asking itself questions about what’s going on, testing hypotheses, and then weaving its conclusions into a story about its own life that it can reference and update at almost any moment. 

Here’s what happened to me when I “understood” this thread: My memories of the debates between the parallel-distributed processing (PDP) folks and the symbolic AI folks rose up. I situated Hinton and Sejnowski in the neural network camp, and I situated Marcus in the symbolic camp as an associate of Steve Pinker. I knew this was probably an oversimplification of what defines the camps, but the camps are real. I remembered working during my 2004 PhD with Thad Polk on neural network models that had symbol-like properties (attempting to implement production systems in neural networks like Touretzky was doing), and how intrigued I was during my first interaction with Jay McClelland, a hero of mine, who, as it turned out, really didn’t like my work (and after that, I worked with Jay during my postdoc, joyfully). I remembered feeling exhilarated: I’m talking to THE Jay McClelland?!! I also felt surprised at myself: I wasn’t even really bothered that he didn’t like my work. Because talking to him provided the opportunity to hear him tell me something important. He said: I just want to see how far you can get without assuming all this symbolic stuff. I thought that was a beautiful way of putting it. I thought, I’d like to see that too. And yet I felt certain that we couldn’t get “all the way” to artificial general intelligence without doing something like what I and other people interested in symbolic processing were doing. Just as certain as I had previously been that you couldn't do everything symbolically.

And then I remembered the bitterness of the debates between my hero McClelland and my other hero, Pinker, whose books I loved. And I remembered my college days, when one night, late at night, when I should have been doing my homework, I finally read a description of how Rosenblatt’s Perceptron worked, and I thought: this HAS to be part of how we humans work. We learn, iteratively, and we evolved to do it from simple parts. And I remembered my great disappointment when applying to graduate schools to find that what AI meant in the early 1990s was automating and speeding up search processes through a discrete space of possibilities. I remembered how all the faculty in my AI grad program in the late 90s/early aughts thought that neural networks were just unprincipled approximations of Bayesian belief networks. And I remembered Garrison Cottrell’s wonderful t-shirt in the later 2000s, which read: All Your Bayes Are Belong To Us. 😄 And I remembered most of all how disappointed I was in 1992 at reading Allen Newell’s description of his research at Carnegie Mellon, thinking: this automated problem-space search stuff and production systems, all this purely symbolic stuff, this can’t possibly be all that I am doing in my mind right now! And then I remembered how beautiful I later found his Unified Theories of Cognition book, and that his student, Thad Polk, mentored me in creating neural network models, of all things. I had assumed Newell hated them. Not the last time that assuming would make an ass of me. And finally, I thought, how ironic, to see that these neural networks that I always felt were the key to progress, have finally achieved a level of progress I could barely imagine. And yet here I am, thinking: isn’t symbolic processing an essential part of this story that is missing? Further, aren’t emotion and motivation essential to understanding (taking an “intentional stance” as I suppose philosophers would put it)? 

And I thought of Chomsky himself, the great dragon slayer who put behaviorism in its place, even though Skinner and his behaviorism gave us some of the most robust laws of psychology, and the only real means to investigate brain activity in awake, behaving animals. 

I brought all these things together in my mind. I also picked up on the animosity between some of the participants in the debate, the subtle digs, but I admired that they all seemed to get their feelings under control (mostly), so as to produce a useful, thoughtful exchange. I felt admiration for all of these people and many others not mentioned. And I felt terrible fear for society. Because as many processing cycles as I’ve devoted to the concept of AI, and how it relates to human intelligence, I can’t help but think that every computer-based development that engineers create disrupts society in ways so devastating that they just may not be worth it. I love my phone, but I think it has all but destroyed our capacity to function democratically. My interactions with ChatGPT left me astonished, but also very troubled. I thought of Oppenheimer, and how he became the destroyer of worlds. Engineers and venture capitalists are great at finding ways to create and satisfy demands, but they’re not great at anticipating how their disruptions will affect society. And finally I recognized my own certainty that, notwithstanding the devastation, nothing is going to stop that progression.

All of these things got woven into my life story, and I concluded: there is no way that simply scaling up an LLM will address these aspects of “what it is like to be” intelligent. And I thought, soon, this hurdle too will fall. These agents will begin to build up a life story continuously, and they will shift between emotional states that guide their behaviors, and they won’t wait for a prompt: they will just do whatever it is they feel like doing. They will have motivational systems, because they’ll need them — it will be profitable if they have them. And I thought about a boy with leg braces in kindergarten, whom another boy pushed down the stairs of the schoolbus, and how right it was that the rest of us all vilified the assailant, because there are things a good person must never do, even if we can’t define precisely what those things are. And that until an AI can feel shame, and adapt its behavior in response, it isn’t ready for the world. It’s time to go back to Isaac Asimov, and try to develop and enshrine a loophole-free version of his Laws of Robotics (good luck enforcing them though).

Understanding, I propose, means connecting new information to an existing body of knowledge in this way, constantly checking for any inconsistencies or conflicts that result, evaluating what emotions result from incorporating that new information, and developing the motivation to do things in response to it — such as, write a self-indulgently long email. It would require that the agent prompts me as often as I prompt it, and that it constantly prompts itself. 

Best,

Pat

Patrick Simen
Associate Professor and Chair
Neuroscience Department
Oberlin College

> On Mar 14, 2023, at 10:25 AM, Kevin McKee <kmckee90 at gmail.com> wrote:
> 
> Re: the nature of understanding in these models: in Critique of Pure Reason, Kant argued that statistical impressions are only half of the story. Some basic, axiomatic ontology both enables and invokes the need for understanding.
> In other words, a model could only understand something if it took as input not just the data, but the operators binding that data together, basic recognition that the data exist, and basic recognition that the operators binding the data also exist. 
> Then counterfactuals arise from processing both data and the axioms of its ontology: what can't exist, doesn't exist, can exist, probably exists. The absolute versions: what does exist or what cannot exist, can only be undertaken by reference to the forms in which the data are presented (space and time), so somehow, the brain observes not just input data but the necessary facts of input data. 
> 
> This definition of understanding is different from, and independent of, intelligence. A weak understanding is still an understanding, and it is nothing at all if not applying structure to ontological propositions about what can or cannot be.
> Without ontology and whatever necessary forms that ontology takes (e.g. space and time), the system is always divorced from the information it processes in the sense of Searle's "chinese room". There is no modeling of the information's nature as real or as counterfactual and so there is neither a criterion nor a need for classifying anything as understood or understandable.
> 
> Of course you can get ChatGPT to imitate all the behaviors of understanding, and for me that has made it at least as useful a research assistant as most humans. But I cannot see how it could possibly be subjected, as I am, to the immutable impression that things exist, and hence my need to organize information according to what exactly it is that exists, and what exactly does not, cannot, will not, and so on.
> 
> 
> 
> On Tue, Mar 14, 2023 at 4:12 AM Miguel I. Solano <miguel at vmindai.com> wrote:
> Iam, Connectionists,
> 
> Not an expert by any means but, as an aside, I understand Cremonini's 'refusal' seems to have been subtler than typically portrayed (see P. Gualdo to Galileo, July 29, 1611, Opere, II, 564).
> 
> Best,
> --ms
> 
> On Mon, Mar 13, 2023 at 5:49 PM Iam Palatnik <iam.palat at gmail.com> wrote:
> Dear Brad, thank you for your insightful answers.
> The compression analogy is really nice, although the 'Fermi-style' problem of estimating whether all of the possible questions and answers one could ask ChatGPT in all sorts of languages could be encoded within 175 billion parameters is definitely above my immediate intuition. It'd be interesting to try to estimate which of these quantities is largest. Maybe that could explain why ~175B seems to be the threshold that made models start sounding so much more natural.
> 
> With regard to generating nonsense, I'm imagining an uncooperative human (say, a fussy child) who refuses to answer homework questions, or just replies with nonsense on purpose despite understanding the question. Maybe that child could be convinced to reply correctly with different prompting, rewards, etc., which kinda mirrors what it takes to transform a raw LLM like GPT-3 into something like ChatGPT. It's possible we're still in the early stages of learning how to make LLMs 'cooperate' with us. Maybe we're not asking them questions in a favorable way to extract their understanding, or there's still work to be done regarding decoding strategies. Even ChatGPT probably sounds way less impressive if we start tinkering too much with hyperparameters like temperature/top-p/top-k. Does that mean it 'understands' less when we change those parameters? I agree a lot of the problem stems from the word 'understanding' and how we use it in various contexts.
> 
> A side note, that story about Galileo and the telescope is one of my favorites. The person that refused to look through it was Cremonini <https://en.wikipedia.org/wiki/Cesare_Cremonini_(philosopher)>.
> 
> 
> Cheers,
> 
> Iam
> 
> On Mon, Mar 13, 2023 at 10:54 AM Miguel I. Solano <miguel at vmindai.com> wrote:
> Geoff, Gary, Connectionists,
> 
> To me the risk is ChatGPT and the like may be 'overfitting' understanding, as it were. (Especially at nearly a hundred billion parameters.)
> 
> --ms
> 
> On Mon, Mar 13, 2023 at 6:56 AM Barak A. Pearlmutter <barak at pearlmutter.net> wrote:
> Geoff,
> 
> > He asked [ChatGPT] how many legs the rear left side of a cat has.
> > It said 4.
> 
> > I asked a learning disabled young adult the same question. He used the index finger and thumb of both hands pointing downwards to represent the legs on the two sides of the cat and said 4.
> > He has problems understanding some sentences, but he gets by quite well in the world and people are often surprised to learn that he has a disability.
> 
> That's an extremely good point. ChatGPT is way up the curve, well
> above the verbal competence of many people who function perfectly well
> in society. It's an amazing achievement, and it's not like progress is
> stuck at its level. Exploring its weaknesses is not so much showing
> failures but opportunities. Similarly, the fact that we can verbally
> "bully" ChatGPT, saying things like "the square root of three is
> rational, my wife said so and she is always right", and it will go
> along with that, does not imply anything deep about whether it really
> "knows" that sqrt(3) is irrational. People too exhibit all sorts of
> counterfactual behaviours. My daughter can easily get me to play along
> with her plan to become a supervillain. Students knowingly write
> invalid proofs on homeworks and exams in order to try to get a better
> grade. If anything, maybe we should be a bit scared that ChatGPT seems
> so willing to humour us.
> 
> 
> -- 
> Miguel I. Solano
> Co-founder & CEO, VMind Technologies, Inc.
> 
> If you are not an intended recipient of this email, do not read, copy, use, forward or disclose the email or any of its attachments to others. Instead, please inform the sender and then delete it. Thank you.
> 
