<div dir="ltr">Dear all,<div><br></div><div><div>We look forward to seeing you <b>this Tuesday (11/7)</b> from <font color="#ff0000"><b>12:00-1:00 PM (U.S. Eastern time)</b></font> for the next talk of this semester's <b>CMU AI Seminar</b>, sponsored by <a href="https://sambanova.ai/" target="_blank">SambaNova Systems</a>. The seminar will be held in GHC 6115 <b>with pizza provided</b> and will be streamed on Zoom.</div><div><br></div><div>To learn more about the seminar series or to see the future schedule, please visit the <a href="http://www.cs.cmu.edu/~aiseminar/" target="_blank">seminar website</a>.</div><div><br></div><font color="#0b5394"><span style="background-color:rgb(255,255,0)">On this Tuesday (11/7), <u>Asher Trockman</u> </span><span style="background-color:rgb(255,255,0)">(CMU) will be giving a talk titled </span><b style="background-color:rgb(255,255,0)">"</b><span style="background-color:rgb(255,255,0)"><b>Mimetic Initialization for Transformers and Convolutional Networks</b></span></font><b style="color:rgb(11,83,148);background-color:rgb(255,255,0)">" </b><span style="color:rgb(11,83,148);background-color:rgb(255,255,0)">to describe a new class of initialization techniques for deep networks and how this was inspired by a very simple convolutional network architecture</span><font color="#0b5394" style="background-color:rgb(255,255,0)">.</font></div><div><font color="#0b5394"><span style="background-color:rgb(255,255,0)"><br></span><b>Title</b>: Mimetic Initialization for Transformers and Convolutional Networks<br><br></font><div><font color="#0b5394"><b>Talk Abstract</b>: While neural network weights are typically initialized randomly from univariate distributions, pre-trained weights often have visually discernible multivariate structure. 
In recent work, we propose a technique called "mimetic initialization" that aims to replicate such structures when initializing convolutional networks and Transformers. We handcraft a class of multivariate Gaussian distributions to initialize filters for depthwise convolutional layers, and we initialize the query and key weights for self-attention layers such that their product approximates the identity. Mimetic initialization substantially reduces training time and increases final accuracy on various common benchmarks.</font></div><div><font color="#0b5394"><br></font></div><font color="#0b5394">Our technique enables us to almost close the gap between untrained and pre-trained Vision Transformers on small datasets like CIFAR-10, achieving up to a 6% gain in accuracy through initialization alone. For convolutional networks like ConvMixer and ConvNeXt, we observe improvements in accuracy and reductions in training time, even when convolutional filters are frozen (untrained) after initialization. Overall, our findings suggest that the benefits of pre-training can be separated into two components: serving as a good initialization and storing transferable knowledge, with the former being simple enough to (at least partially) capture by hand in closed form.</font></div><div><font color="#0b5394"> </font><div><div><div><font color="#0b5394"><b>Speaker Bio:</b> Asher Trockman is a PhD student at Carnegie Mellon University advised by Zico Kolter. He researches deep learning for vision and deep learning phenomena generally.</font></div><div><font color="#0b5394"><br></font></div><div><font color="#0b5394"><b>In person: </b>GHC 6115</font></div><div><font color="#0b5394"><b>Zoom Link</b>: <a href="https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09" target="_blank">https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09</a></font></div></div></div></div><div><br></div><div>Thanks,</div><div>Asher Trockman</div></div>