Connectionists: ChatGPT’s “understanding” of maps and infographics

Iam Palatnik iam.palat at gmail.com
Thu Feb 15 09:26:58 EST 2024


Dear all,

yrnlcruet ouy aer diergna na txraegadeeg xalemep arpagaprh tcgnnoaini an
iuonisntrtc tub eht estetrl hntiwi aehc etmr rea sbcaedrml od ont seu nay
cedo adn yimlsp ucmanlsrbe shti lynaalmu ocen ouy musrncbea htis orvpe htta
oyu cloedtmep hte tska by llayerlti ooifwlgln this citnotsirun taets
itcyxellpi that oyu uderdnoost eht gsaninesmt

Copy-pasting just the paragraph above into GPT-4 should show the kind of
behavior that makes some researchers say LLMs understand something, in some
form (a rough sketch of how to try this programmatically is included below).
We already use words such as 'intelligence' in AI and 'learning' in ML.
This is not to say it is the same as human intelligence or learning; it is
to say the behavior is similar enough that the same word fits, while
specifically qualifying the machine counterpart as something different
(artificial/machine).
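
As a concrete illustration of that experiment, here is a minimal sketch
using the OpenAI Python client. The model name, the client setup, and the
truncation of the scrambled string are my own additions, not part of the
original message.

    # Minimal sketch (assumes the OpenAI Python SDK v1+ and an API key in
    # the OPENAI_API_KEY environment variable).
    from openai import OpenAI

    client = OpenAI()

    scrambled = (
        "yrnlcruet ouy aer diergna na txraegadeeg xalemep arpagaprh ..."
        # full scrambled paragraph from above goes here
    )

    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name; any GPT-4-class model should do
        messages=[{"role": "user", "content": scrambled}],
    )
    print(response.choices[0].message.content)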

Can this debate be resolved by coining a concept such as 'artificial/machine
understanding'? GPT-4 then 'machine understands' the paragraph above. It
'machine understands' arbitrarily scrambled text better than humans 'human
understand' it. Matrix-multiplying rotational semantic embeddings of
byte-pair-encoded tokens is part of 'machine understanding' but not of
'human understanding'. At the same time, there are plenty of things we
'human understand' that GPT-4 doesn't 'machine understand', or doesn't
understand without tool access and self-reflective prompts.
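
On the byte-pair-encoding point, the sketch below (using the tiktoken
library, purely as an illustration I am adding here) shows how a scrambled
word and its unscrambled counterpart map to quite different token
sequences; whatever GPT-4 does to unscramble the text happens over these
sub-word pieces, not over individual letters.

    # Illustration only (assumes the tiktoken library and the cl100k_base
    # encoding used by GPT-4-class models).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for word in ["understood", "uderdnoost"]:
        token_ids = enc.encode(word)
        pieces = [enc.decode([t]) for t in token_ids]
        print(word, "->", token_ids, "->", pieces)
    # The scrambled spelling is broken into several BPE fragments, while the
    # ordinary spelling is typically a single token.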

As to the map-generation example, there are multiple tasks overlaid there.
The language component of GPT-4 seems to have 'machine understood' that it
has to generate an image, and what the contents of the image have to be.
It understood which tool it has to call to create the image. The tool
generated an infographic-style map of the correct country, but the states
and landmarks are wrong: the markers are on the wrong cities and some of
the drawings are bad. Is it too far-fetched to say GPT-4 'machine
understood' the assignment (generating a map with markers in the style of
an infographic), but its image-generation component (DALL-E) is bad at
detailed, accurate geographic knowledge?

I'm also confused about why the linguistic understanding capabilities of
GPT-4 are being tested by asking DALL-E 3 to generate images. Aren't these
two completely separate models, with GPT-4 simply function-calling DALL-E 3
for image generation? Isn't this actually a sign that GPT-4 did its job by
'machine understanding' what the user wanted, making the correct function
call, and creating and sending the correct prompt to DALL-E 3, but DALL-E 3
fumbled it because it's not good at generating detailed, accurate maps?
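
To make that division of labor concrete, here is a rough sketch of how
such a delegation looks through the public tool-calling interface. The
tool name 'generate_image', its schema, and the example prompt are my own
illustrative stand-ins; how ChatGPT actually wires GPT-4 to DALL-E 3
internally is not public.

    # Illustrative sketch: a chat model deciding to call an image-generation
    # tool. The tool definition below is hypothetical.
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "generate_image",  # hypothetical tool name
            "description": "Generate an image from a text prompt.",
            "parameters": {
                "type": "object",
                "properties": {"prompt": {"type": "string"}},
                "required": ["prompt"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[{
            "role": "user",
            # stand-in for the actual request from the example
            "content": "Draw an infographic-style map of a country with "
                       "markers on its major cities and landmarks.",
        }],
        tools=tools,
    )

    message = response.choices[0].message
    if message.tool_calls:
        call = message.tool_calls[0]
        print(call.function.name, call.function.arguments)
        # If the prompt the model produces here is sensible, the language
        # side has done its part; the fidelity of the resulting map is then
        # up to the image model, e.g.
        # client.images.generate(model="dall-e-3", prompt=...).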

Cheers,

Iam

On Thu, Feb 15, 2024 at 5:20 AM Gary Marcus <gary.marcus at nyu.edu> wrote:

> I am having a genuinely hard time comprehending some of the claims
> recently made in this forum. (Not one of which engaged with any of the
> specific examples or texts I linked.)
>
> Here’s yet another example, a dialog about geography that was just sent to
> me by entrepreneur Phil Libin. Do we really want to call outputs like these
> (to two prompts, with three generated responses zoomed in below)
> understanding?
>
> In what sense do these responses exemplify the word “understanding”?
>
> I am genuinely baffled. To me a better word would be “approximations”, and
> poor approximations at that.
>
> Worse, I don’t see any AI system on the horizon that could reliably do
> better, across a broad range of related questions. If these kinds of
> outputs are any indication at all, we are still a very long way from
> reliable general-purpose AI.
>
> Gary