Connectionists: Chomsky's apple

Barak A. Pearlmutter barak at pearlmutter.net
Thu Mar 9 17:17:29 EST 2023


Terry,

On Thu, 9 Mar 2023 at 15:01, Terry Sejnowski <terry at snl.salk.edu> wrote:
> If you ask a nonsense question, you get a nonsense answer.
>
> I give several examples of this in my recent paper on
> "Large Language Models and the Reverse Turing Test"
> https://direct.mit.edu/neco/issue
>
> LLMs mirror the intelligence of the prompt.

That is an excellent paper; I quite enjoyed it.

No disagreement with your direct point! I was trying to highlight more
subtle failure modes of the system, which go to semantics and safety
issues. Maybe I was too roundabout though, so let me be a bit more
explicit.

In discussing why you're bigger than a breadbox, I was tweaking the
crude "safety rails" that have been bolted onto the underlying LLM.
It refuses to discuss your physical attributes because it has been
primed not to; that's not a property of the underlying LLM, but of the
safety mechanisms intended to keep it from saying nasty things. Of
course that hammer is extremely blunt: it is not in truth offensive to
concede that Terry Sejnowski is an adult human being, and that adult
human beings are bigger than breadboxes.

I meant to highlight how inadequate our current tools for controlling
these things are, in this case by showing how the safety stuff
inappropriately prevents it from saying something reasonable and
instead sends it off on a strange woke tangent. (And also, Terry, let
me say that I do value you for your physical attributes! Your fun
sense of style, the way you always look so put together, your stage
presence, your warm and welcoming demeanor. Must we throw that baby
out with the bathwater?) Alignment is the technical term, I guess.
They cannot circumscribe offensive behavior satisfactorily, so instead
they play whack-a-mole. And crudely.

This issue is problematic in a bunch of domains. E.g., it is not
offensive, when asked "why is 'The Boy in the Striped Pajamas' like an
extended version of the joke 'my uncle died at Auschwitz, he was drunk
and fell off a guard tower'", to just say "because its plot is
basically 'my nephew died in the gas chambers, he was the commandant's
son and there was a bit of a mixup.'" But it has been constrained
never to joke about the Holocaust and to get all bothered at that
combination, which short-circuits its ability to do this particular
bit of seemingly-straightforward analogical reasoning. (Try it. Keep
pushing to get it to make the analogy. It's frustrating!)

The fallacious proof is similar, but from the other side. It
highlights that the system does not really know what a proof is,
because if it did, it would certainly have the capacity, in that
context, to avoid making blatantly incorrect simple steps. And that
is, of course, a safety issue when people use it as an assistant.

Cheers,

--Barak.
