[CMU AI Seminar] November 7 at 12pm (GHC 6115 & Zoom) -- Asher Trockman (CMU) -- Mimetic Initialization for Transformers and Convolutional Networks -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Sat Nov 4 14:25:15 EDT 2023


Dear all,

We look forward to seeing you *this Tuesday (11/7)* from *12:00-1:00 PM
(U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*,
sponsored by SambaNova Systems <https://sambanova.ai/>. The seminar will be
held in GHC 6115 *with pizza provided* and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule,
please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.

This Tuesday (11/7), *Asher Trockman* (CMU) will be giving a talk titled
*"Mimetic Initialization for Transformers and Convolutional Networks"*,
describing a new class of initialization techniques for deep networks and
how it was inspired by a very simple convolutional network architecture.

*Title*: Mimetic Initialization for Transformers and Convolutional Networks

*Talk Abstract*: While neural network weights are typically initialized
randomly from univariate distributions, pre-trained weights often have
visually discernible multivariate structure. In recent work, we propose a
technique called "mimetic initialization" that aims to replicate such
structures when initializing convolutional networks and Transformers. We
handcraft a class of multivariate Gaussian distributions to initialize
filters for depthwise convolutional layers, and we initialize the query and
key weights for self-attention layers such that their product approximates
the identity. Mimetic initialization substantially reduces training time
and increases final accuracy on various common benchmarks.
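
To make the two ideas above concrete, here is a minimal PyTorch sketch of
mimetic initialization. It is an illustration only, not the exact recipe
from the paper: the scale alpha, the Gaussian-kernel covariance over filter
taps, and the final normalization are assumptions made for the example.

import math
import torch
import torch.nn as nn

def mimetic_attention_init(attn: nn.MultiheadAttention, alpha: float = 0.7) -> None:
    # Initialize W_q and W_k so their product is roughly alpha * I.
    # Assumes the default layout where in_proj_weight stacks [W_q; W_k; W_v].
    d = attn.embed_dim
    # A d x d matrix z with i.i.d. N(0, 1/d) entries satisfies z @ z.T ~ I,
    # so reusing a scaled copy of z for both projections gives
    # W_q @ W_k.T ~ alpha * I.
    z = torch.randn(d, d) / math.sqrt(d)
    with torch.no_grad():
        attn.in_proj_weight[:d].copy_(math.sqrt(alpha) * z)       # W_q
        attn.in_proj_weight[d:2 * d].copy_(math.sqrt(alpha) * z)  # W_k

def mimetic_depthwise_init(conv: nn.Conv2d, sigma: float = 1.0) -> None:
    # Sample each depthwise filter from a multivariate Gaussian whose
    # covariance decays with spatial distance between filter taps
    # (an assumed covariance for illustration).
    k = conv.kernel_size[0]                      # assumes a square kernel
    coords = torch.stack(torch.meshgrid(torch.arange(k), torch.arange(k),
                                        indexing="ij"), dim=-1).reshape(-1, 2).float()
    dist2 = torch.cdist(coords, coords).pow(2)
    cov = torch.exp(-dist2 / (2 * sigma ** 2))   # (k*k, k*k) Gaussian-kernel covariance
    L = torch.linalg.cholesky(cov + 1e-4 * torch.eye(k * k))
    with torch.no_grad():
        n = conv.out_channels                    # depthwise: groups == channels
        samples = torch.randn(n, k * k) @ L.T    # rows have covariance cov
        conv.weight.copy_(samples.reshape(n, 1, k, k) / (k * k))  # arbitrary scale

For example, calling mimetic_attention_init on each nn.MultiheadAttention
module and mimetic_depthwise_init on each depthwise nn.Conv2d in a model
before training would replace the default random initialization with these
structured ones.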

Our technique enables us to almost close the gap between untrained and
pre-trained Vision Transformers on small datasets like CIFAR-10, achieving
up to a 6% gain in accuracy through initialization alone. For convolutional
networks like ConvMixer and ConvNeXt, we observe improvements in accuracy
and reductions in training time, even when convolutional filters are frozen
(untrained) after initialization. Overall, our findings suggest that the
benefits of pre-training can be separated into two components: serving as a
good initialization and storing transferable knowledge, with the former
being simple enough to (at least partially) capture by hand in closed form.

*Speaker Bio:* Asher Trockman is a PhD student at Carnegie Mellon
University advised by Zico Kolter. He researches deep learning for vision
and deep learning phenomena more generally.

*In person: *GHC 6115
*Zoom Link*:
https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman

