[CMU AI Seminar] December 5 at 12pm (NSH 3305 & Zoom) -- Elan Rosenfeld (CMU) -- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Sun Dec 3 17:40:09 EST 2023


Dear all,

We look forward to seeing you *this Tuesday (12/5)* from *12:00-1:00 PM
(U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*,
sponsored by SambaNova Systems <https://sambanova.ai/>. The seminar will be
held in NSH 3305 *with pizza provided* and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule,
please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.

On this Tuesday (12/5), *Elan Rosenfeld* (CMU) will be giving a talk titled
*"Outliers with Opposing Signals Have an Outsized Effect on Neural Network
Optimization"*.

*Title*: Outliers with Opposing Signals Have an Outsized Effect on Neural
Network Optimization

*Talk Abstract*: There is a growing list of intriguing properties of neural
network optimization, including specific patterns in their training
dynamics (e.g. simplicity bias, edge of stability, grokking) and the
unexplained effectiveness of various tools (e.g. batch normalization, SAM,
Adam). Extensive study of these properties has so far yielded only a
partial understanding of their origins—and their relation to one another is
even less clear. What is it about gradient descent on neural networks that
gives rise to these phenomena?

In this talk, I will present our recent experiments, which offer a new
perspective on many of these findings and suggest that they may have a
shared underlying cause. Our investigation identifies and explores the
significant influence of paired groups of outliers with what we call
Opposing Signals: large-magnitude features that dominate the network’s
output throughout most of training and cause large gradients pointing in
opposite directions.
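
For a rough sense of what "opposing signals" means here, a minimal toy
sketch (illustrative only, not from the paper; the data, sizes, and numbers
below are made up): two outliers share one large-magnitude feature but carry
opposite labels, so their gradients on that feature's weight point in
opposite directions and dominate the average update.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))   # ordinary examples with small features
    y = np.sign(X[:, 1])            # label depends on the second feature
    X[:2, 0] = 10.0                 # two outliers with a huge first feature...
    y[0], y[1] = 1.0, -1.0          # ...but opposing labels

    w = np.zeros(2)
    lr = 0.05
    for step in range(5):
        margins = y * (X @ w)
        # per-example logistic-loss gradients w.r.t. w
        grads = (-y / (1 + np.exp(margins)))[:, None] * X
        print(f"step {step}: outlier grads on w[0] = "
              f"{grads[0, 0]:+.2f} vs {grads[1, 0]:+.2f}")
        w -= lr * grads.mean(axis=0)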

Though our experiments shed some light on these outliers’ influence, we
lack a complete understanding of their precise effect on network training
dynamics. Instead, I’ll share our working hypothesis via a high-level
explanation, and I’ll describe initial experiments which verify some of its
qualitative predictions. We hope a deeper understanding of this phenomenon
will enable future principled improvements to neural network optimization.

*Speaker Bio:* Elan Rosenfeld <https://www.cs.cmu.edu/~elan/> is a
final-year PhD student in CMU's Machine Learning Department, advised by
Profs. Andrej Risteski and Pradeep Ravikumar. His research focuses on
principled approaches to understanding and improving robustness,
representation learning, and generalization in deep learning.

*In person: *NSH 3305
*Zoom Link*:
https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman

