[CMU AI Seminar] October 3 at 12pm (GHC 6115 & Zoom) -- Nikhil Ghosh (UC Berkeley) -- Hyperparameter Transfer for Finetuning Large-Scale Models -- AI Seminar sponsored by SambaNova Systems
Asher Trockman
ashert at cs.cmu.edu
Sat Sep 30 17:13:23 EDT 2023
Dear all,
We look forward to seeing you *this Tuesday (10/3)* from *12:00-1:00 PM
(U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*,
sponsored by SambaNova Systems <https://sambanova.ai/>. The seminar will be
held in GHC 6115 *with pizza provided* and will be streamed on Zoom.
To learn more about the seminar series or to see the future schedule,
please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
On this Tuesday (10/3), *Nikhil Ghosh* (UC Berkeley) will be giving a talk
titled *"Hyperparameter Transfer for Finetuning Large-Scale Models"*.
*Title*: Hyperparameter Transfer for Finetuning Large-Scale Models
*Talk Abstract*: Current models have become so large that most
practitioners are unable to effectively tune hyperparameters due to limited
computational resources, which results in suboptimal performance. In this
talk I will discuss ongoing work that aims to address this issue by
transferring the optimal learning rate from smaller models. This work
builds on the earlier work of Yang et al. (2022), which achieved
hyperparameter transfer for pretraining large models. In the current work,
we study the same problem, but in the finetuning setting. By reducing
the width of a pretrained model via random subsampling and rescaling
according to the muP parameterization of Yang et al., we obtain a smaller
proxy model that we can finetune with significantly fewer resources. In
certain settings, such as when finetuning using LoRA on large datasets, the
optimal learning rate is preserved under subsampling, which allows for
immediate transfer to larger models. In general, however, we find through
both experiments and theoretical calculations that the optimal learning
rate can display a rich variety of scaling behaviors. Characterizing the
scaling behavior requires understanding more fine-grained aspects of
training and generalization.
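For readers new to the idea, here is a minimal sketch of one plausible reading of the width-subsampling step, under loudly stated assumptions: a two-layer ReLU MLP f(x) = W2 relu(W1 x) with hidden width n, a uniformly random subset of m hidden units, and a readout rescaled by n/m in the spirit of muP's 1/width output scaling. The helper names (subsample_width, mlp) are hypothetical, and this is not the speaker's implementation; the procedure in the talk may rescale other layers or handle finetuning differently.

```python
import numpy as np

def subsample_width(W1, W2, m, rng):
    """Hypothetical helper: shrink the hidden width of f(x) = W2 @ relu(W1 @ x)
    from n to m units by random subsampling plus an assumed muP-style rescale.

    W1: (n, d_in) input-to-hidden weights
    W2: (d_out, n) hidden-to-output (readout) weights
    """
    n = W1.shape[0]
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    # Keep a uniformly random subset of m hidden units (no replacement).
    idx = rng.choice(n, size=m, replace=False)
    W1_small = W1[idx, :]
    # Assumption: multiply the readout by n/m. Each unit is kept with
    # probability m/n, so the proxy output is an unbiased estimate of
    # the full-width output over the random choice of units.
    W2_small = W2[:, idx] * (n / m)
    return W1_small, W2_small

def mlp(W1, W2, x):
    """Two-layer ReLU network used only for this illustration."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# Usage: build a toy "pretrained" model and an 8x narrower proxy.
rng = np.random.default_rng(0)
d_in, n, d_out = 8, 1024, 3
W1 = rng.standard_normal((n, d_in)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_out, n)) / n  # 1/n readout scale, muP-style
x = rng.standard_normal(d_in)

W1s, W2s = subsample_width(W1, W2, 128, rng)
print("full :", mlp(W1, W2, x))
print("proxy:", mlp(W1s, W2s, x))
```

The n/m factor is what keeps the narrow proxy's outputs on the same scale as the full model's; without it, the readout sum would shrink by roughly m/n and the proxy would not be a faithful stand-in for learning-rate sweeps.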
*Speaker Bio:* Nikhil Ghosh is a PhD student in the Statistics department
at UC Berkeley working with Bin Yu and Song Mei. His main interests are in
the theory of deep learning. Previously he studied computer science at
Caltech and has completed internships at Google and Microsoft Research.
*In person:* GHC 6115
*Zoom Link*:
https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09
Thanks,
Asher Trockman