<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr">Fredrick, Geoff and others,</div><div dir="ltr"><br></div><div dir="ltr">By the way, I don’t know about “average people,” but here is a great example of a bright journalist (not an AI expert or ML researcher) who recognizes perfectly clearly that GPT-4 cannot be trusted. It is also a reminder that, for all its alleged “understanding,” GPT-4 is utterly unconstrained by any internal process of fact-checking; that is, it cannot ground its text-pastiching process in reality, which is another diagnostic of discomprehension:</div><div dir="ltr"><br></div><div dir="ltr"><img src="cid:973395DA-3EAC-4784-9872-295B6F3671BE" alt="IMG_3988"></div><div dir="ltr"><br></div><div dir="ltr">For good measure, here are some subtle lies about SVB, also generated by GPT-4 (reported by Dileep George).</div><div dir="ltr"><br></div><div dir="ltr"><img src="cid:44F9D5E3-F7D2-4826-8844-21A10444DE4F" alt="IMG_5655"></div><div dir="ltr"><br><blockquote type="cite">On Mar 17, 2023, at 08:48, Gary Marcus &lt;gary.marcus@nyu.edu&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><meta http-equiv="content-type" content="text/html; charset=utf-8"><div dir="ltr"></div><div dir="ltr">Average people were fooled by a chatbot called Eugene Goostman that ultimately had exactly zero long-term impact on AI. I wrote about it and the trouble with the Turing Test here in 2014: <a href="https://www.newyorker.com/tech/annals-of-technology/what-comes-after-the-turing-test" style="font-family: Helvetica; font-size: 12px;">https://www.newyorker.com/tech/annals-of-technology/what-comes-after-the-turing-test</a> </div><div dir="ltr"><br><blockquote type="cite">On Mar 17, 2023, at 8:42 AM, Rothganger, Fredrick &lt;frothga@sandia.gov&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr">




<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0">

Noting the examples that have come up on this list over the last week, it's interesting that it takes some of the most brilliant AI researchers in the world to devise questions that break LLMs. Chatbots have always been able to fool some people some of the time, ever since ELIZA. But we now have systems that can fool a lot of people a lot of the time, and even the occasional expert who loses perspective and comes to believe the system is sentient. LLMs have either already passed the classic Turing test, or will pass it in the next generation.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0 ContentPasted1 ContentPasted2">

What does that mean exactly? Turing's expectation was that "the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted". The ongoing discussion here is an indication that we are approaching that threshold. For the average person, we've probably already passed it.<br>

</div>

<br>


</div></blockquote></div></blockquote></body></html>