<div dir="ltr"><div>Dear Brad, </div><div>I see what you mean, and I agree that this is in some way about the 'cognitive biases' (for lack of a better term) that LLMs suffer from, and how they differ from our own cognitive biases.</div><div><br></div><div>It's not that they can't see things at the letter level, I think, because current tokenizers can split text into tokens all the way from sub-character pieces to combinations of full words. (For example, the English letter W gets 1 token, but Phoenician

 <span class="gmail-box">𐤔</span>


gets 4 tokens on the <a href="https://platform.openai.com/tokenizer">GPT-4 tokenizer</a>). But token-related weirdness probably does play a role in LLM behavior, and is likely part of why certain behaviors seem so non-intuitive. </div><div><br></div><div>But imagine some civilization from another planet, with very precise eyes, is testing us with the 

Müller-Lyer illusion example, and sometimes they prank us by making one of the lines actually just slightly longer. Is there any way for us to be sure, without tool usage? Using tools, like measuring the lines with, or overlaying them in, an image editor, makes the task trivial, whereas it might've been close to impossible otherwise. Would the testers be able to conclude something about our intelligence or understanding based on the tool-less version of the Müller-Lyer illusion test? We may be bad at that test in that format, but does that mean we don't understand length? So much of our species is built around tool usage.<br></div><div><br></div><div>Because the performance of the LLMs on some of these tests seems to depend so much on how the questions are formulated and what tools they are given to respond with, I still tend to think that they understand something. I'm OK with the idea that their understanding has room to become much deeper, too.<br></div><div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 19, 2024 at 1:50 PM Brad Wyble <<a href="mailto:bwyble@gmail.com">bwyble@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Iam, the difference is that while you may need an external source to remember all 50 states, for the ones that you have remembered/looked up, you are able to verify that they do or do not contain specific letters without reference to a resource, or writing some code to verify it.  It is even worse that, if you push them on their mistakes, they are still unable to correct them. <div><br></div><div>A better counterargument to the example Dave provides is that perhaps LLMs just cannot ever break things down at the letter level because of their reliance on tokens.  
Humans can do this, of course, but a good analogy for us might be the Müller-Lyer illusion, which is essentially impenetrable to our cognitive faculties.  That is, we are unable to force ourselves to see the lines as their true lengths on the page because the basis of our representations does not permit it.  This is perhaps similar to the way that LLM representations preclude them from accessing the letter level.   </div><div><br></div><div>However, I think a good counterpoint to this is that while people are unable to un-see the Müller-Lyer illusion, it is not that difficult to teach someone about this blind spot and get them to reason around it, with no external tools, just their own reasoning faculties.  LLMs seem unable to achieve this level of self-knowledge no matter how patiently things are explained.  They do not have the metacognitive faculty that allows them to even understand their blind spot about letters. </div><div><br></div><div><br></div><div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 19, 2024 at 10:06 AM Gary Marcus <<a href="mailto:gary.marcus@nyu.edu" target="_blank">gary.marcus@nyu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="ltr"></div><div dir="ltr">Correct; also tool integration has actually been less successful than some people believe: </div><div><br></div><a href="https://open.substack.com/pub/garymarcus/p/getting-gpt-to-work-with-external?r=8tdk6&utm_campaign=post&utm_medium=web" target="_blank">https://open.substack.com/pub/garymarcus/p/getting-gpt-to-work-with-external</a><div><div dir="ltr"><br><blockquote type="cite">On Feb 19, 2024, at 5:49 AM, Thomas Trappenberg <<a href="mailto:tt@cs.dal.ca" target="_blank">tt@cs.dal.ca</a>> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="auto">Good point, but Dave's point stands as the 
models he is referring to did not even comprehend that they made mistakes. <div dir="auto"><br></div><div dir="auto">Cheers, Thomas</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 19, 2024, 4:43 a.m.  <<a href="mailto:wuxundong@gmail.com" target="_blank">wuxundong@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">That can be attributed to the models' underlying text encoding and processing mechanisms, specifically tokenization that removes the spelling information from those words. If you use GPT-4 instead, it can process it properly by resorting to external tools.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 19, 2024 at 3:45 PM Dave Touretzky <<a href="mailto:dst@cs.cmu.edu" rel="noreferrer" target="_blank">dst@cs.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">My favorite way to show that LLMs don't know what they're talking about<br>

is this simple prompt:<br>

<br>

   List all the US states whose names don't contain the letter "a".<br>

<br>

ChatGPT, Bing, and Gemini all make a mess of this, e.g., putting "Texas"<br>

or "Alaska" on the list and leaving out states like "Wyoming" and<br>

"Tennessee".  And you can have a lengthy conversation with them about<br>

this, pointing out their errors one at a time, and they still can't<br>

manage to get it right.  Gemini insisted that all 50 US states have an<br>

"a" in their name.  It also claimed "New Jersey" has two a's.<br>

<br>

-- Dave Touretzky<br>

</blockquote></div>
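<div>For reference, the "external tool" route mentioned above might look something like the following. This is only a minimal sketch: the names <code>US_STATES</code> and <code>states_without_letter</code> are hypothetical, and nothing here is taken from what GPT-4 actually runs. It simply checks Dave's prompt with a case-insensitive substring test over the 50 state names.</div>

```python
# Hypothetical helper for checking Dave's prompt; not taken from any model's tooling.
US_STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def states_without_letter(letter, states=US_STATES):
    """Return the state names that do not contain `letter` (case-insensitive)."""
    letter = letter.lower()
    return [name for name in states if letter not in name.lower()]


no_a = states_without_letter("a")
print(len(no_a))
for name in no_a:
    print(name)
```

<div>The comparison is done on lowercased names so that a leading capital "A" (as in "Alaska") counts as containing the letter.</div>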

</blockquote></div>

</div></blockquote></div></div></blockquote></div><br clear="all"><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Brad Wyble<br>Professor<span style="font-size:12.8px"> of Psychology</span> <br>Penn State University<div><br></div></div></div></div></div></div>

</blockquote></div>
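<div>On the token counts quoted at the top of this thread (1 token for W, 4 for the Phoenician letter): one plausible reading, assuming the GPT-4 tokenizer is a byte-level BPE over UTF-8 text, is that a rare character with no learned merges falls back to roughly one token per byte. The sketch below does not call the tokenizer itself; it only counts UTF-8 bytes using the standard library, and the helper name <code>utf8_byte_count</code> is made up for illustration.</div>

```python
def utf8_byte_count(ch: str) -> int:
    """Number of bytes used to encode `ch` in UTF-8."""
    return len(ch.encode("utf-8"))


# The two characters compared in the message above.
for ch in ("W", "𐤔"):
    # Print the code point and the UTF-8 byte length of each character.
    print(f"U+{ord(ch):04X}: {utf8_byte_count(ch)} UTF-8 byte(s)")
```

<div>A byte count is only an upper bound on what a byte-level tokenizer needs for a single character, so this is consistent with the reported numbers rather than a reproduction of them.</div>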