[CMU AI Seminar] March 19 at 12pm (NSH 3305 & Zoom) -- Sachin Goyal (CMU) -- Think before you speak: Training Language Models With Pause Tokens -- AI Seminar sponsored by SambaNova Systems

Tue Mar 19 11:12:53 EDT 2024

Reminder: this is happening soon (NSH 3305)!

On Fri, Mar 15, 2024 at 4:45 PM Asher Trockman <ashert at cs.cmu.edu> wrote:

> Dear all,
>
> We look forward to seeing you *this Tuesday (3/19)* from *12:00-1:00 PM
> (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*,
> sponsored by SambaNova Systems <https://sambanova.ai/>. The seminar will
> be held in NSH 3305 *with pizza provided* and will be streamed on Zoom.
>
> To learn more about the seminar series or to see the future schedule,
> please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
>
> On this Tuesday (3/19), *Sachin Goyal* (CMU) will be giving a talk titled *"Think
> before you speak: Training Language Models With Pause Tokens"*.
>
> *Title*: Think before you speak: Training Language Models With Pause
> Tokens
>
> *Talk Abstract*: Transformer-based language models generate responses by
> producing a series of tokens in immediate succession: the (K + 1)th token
> is an outcome of manipulating K hidden vectors per layer, one vector per
> preceding token. What if instead we were to let the model manipulate, say,
> K + 10 hidden vectors before it outputs the (K + 1)th token?
> In this talk, we will discuss how we can teach language models to use
> additional tokens (say, pause tokens) to their advantage. Can the language
> model use these extra tokens to perform extra computation before
> committing to an answer? We will specifically explore whether this can be
> done just by finetuning an off-the-shelf language model or whether it is
> necessary to pretrain from scratch to elicit such new behaviours.
> Finally, we will discuss a range of conceptual and practical future
> research questions raised by our work, spanning new notions of
> representation capacity beyond the parametric count and making delayed
> next-token prediction a widely applicable paradigm.
>
> *Speaker Bio:* Sachin Goyal is a PhD student in the Machine Learning
> Department at CMU. He works on improving pretraining and robust finetuning
> for foundation models.
>
> *In person:* NSH 3305
> *Zoom Link*:
> https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09
>
> Thanks,
> Asher Trockman
>