[CMU AI Seminar] March 19 at 12pm (NSH 3305 & Zoom) -- Sachin Goyal (CMU) -- Think before you speak: Training Language Models With Pause Tokens -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Fri Mar 15 16:45:20 EDT 2024


Dear all,

We look forward to seeing you *this Tuesday (3/19)* from *12:00-1:00 PM
(U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*,
sponsored by SambaNova Systems <https://sambanova.ai/>. The seminar will be
held in NSH 3305 *with pizza provided* and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule,
please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.

On this Tuesday (3/19), *Sachin Goyal* (CMU) will be giving a talk
titled *"Think before you speak: Training Language Models With Pause Tokens"*.

*Title*: Think before you speak: Training Language Models With Pause Tokens

*Talk Abstract*: Transformer-based language models generate responses by
producing a series of tokens in immediate succession: the (K + 1)th token
is an outcome of manipulating K hidden vectors per layer, one vector per
preceding token. What if instead we were to let the model manipulate, say,
K + 10 hidden vectors before it outputs the (K + 1)th token?
In this talk, we will discuss how we can teach language models to use
additional tokens (say, pause tokens) to their advantage. Can the language
model use these extra tokens to perform additional computation before
committing to an answer? We will specifically explore whether this can be
done simply by finetuning an off-the-shelf language model, or whether it is
necessary to pretrain from scratch to elicit such new behaviours.
Finally, we will discuss a range of conceptual and practical future
research questions raised by our work, spanning new notions of
representation capacity beyond the parameter count and making delayed
next-token prediction a widely applicable paradigm.
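For readers who want a concrete picture of the K vs. K + 10 counting in the
abstract, here is a minimal, hypothetical sketch in plain Python (no real
model). It assumes a decoder-only transformer that keeps one hidden vector
per input position in every layer, and a made-up token name "<pause>".
Appending M pause tokens to a K-token prefix means the next real token is
read from the last position, computed over K + M vectors per layer. The
ignored_output_positions helper is also an assumption for illustration: it
lists the positions whose outputs would only "predict" pause tokens and so
would carry no answer. It is not presented as the speaker's training recipe.

```python
from typing import List

PAUSE = "<pause>"  # hypothetical pause-token string (assumption)


def append_pauses(prefix: List[str], num_pauses: int) -> List[str]:
    """Return the prefix followed by num_pauses copies of the pause token."""
    return prefix + [PAUSE] * num_pauses


def hidden_vectors_per_layer(sequence: List[str]) -> int:
    """Assuming one hidden vector per input position in each layer,
    the next token is computed from len(sequence) vectors per layer."""
    return len(sequence)


def prediction_position(sequence: List[str]) -> int:
    """Index whose output is taken as the next real token: the last position."""
    return len(sequence) - 1


def ignored_output_positions(k: int, m: int) -> List[int]:
    """Hypothetical helper: with k prefix tokens and m pauses, the output at
    position i predicts token i + 1, so positions k-1 .. k+m-2 would only
    predict pause tokens and carry no answer."""
    return list(range(k - 1, k + m - 1))


prefix = ["The", "answer", "is"]        # K = 3 preceding tokens
delayed = append_pauses(prefix, 10)     # K + 10 positions

print(hidden_vectors_per_layer(prefix))   # 3
print(hidden_vectors_per_layer(delayed))  # 13
print(prediction_position(delayed))       # 12
```

The only change from ordinary next-token prediction here is where the answer
is read from: the extra positions add per-layer vectors without adding any
parameters.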

*Speaker Bio:* Sachin Goyal is a PhD student in the Machine Learning
Department at CMU. He works on improving pretraining and robust finetuning
for foundation models.

*In person:* NSH 3305
*Zoom Link*:
https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman