[CMU AI Seminar] Nov 2 at 12pm (Zoom) -- Jeremy Cohen (CMU MLD) -- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability -- AI Seminar sponsored by Morgan Stanley

Shaojie Bai shaojieb at cs.cmu.edu
Tue Nov 2 12:02:03 EDT 2021


Hi all,

*Jeremy Cohen*'s talk on the surprising dynamics of full-batch
GD on deep neural nets is starting in a few minutes!

Zoom link:
https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09

Best,
Shaojie

On Mon, Nov 1, 2021 at 3:32 PM Asher Trockman <ashert at cs.cmu.edu> wrote:

> Hi all,
>
> Just a reminder that the CMU AI Seminar
> <http://www.cs.cmu.edu/~aiseminar/> is tomorrow *12pm-1pm*:
> https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09.
>
> *Jeremy Cohen (CMU MLD)* will be giving a talk on the surprising dynamics
> of full-batch gradient descent on neural networks.
>
> Thanks,
> Asher
>
>
> On Wed, Oct 27, 2021 at 10:56 AM Asher Trockman <ashert at cs.cmu.edu> wrote:
>
>> Dear all,
>>
>> We look forward to seeing you *next Tuesday (11/2)* from *12:00-1:00
>> PM (U.S. Eastern time)* for the next talk of our *CMU AI Seminar*,
>> sponsored by Morgan Stanley
>> <https://www.morganstanley.com/about-us/technology/>.
>>
>> To learn more about the seminar series or see the future schedule, please
>> visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
>>
>> On 11/2, *Jeremy Cohen* (CMU MLD) will be giving a talk on "*Gradient
>> Descent on Neural Networks Typically Occurs at the Edge of Stability*".
>>
>> *Title:* Gradient Descent on Neural Networks Typically Occurs at the
>> Edge of Stability
>>
>> *Talk Abstract:* Neural networks are trained using optimization
>> algorithms. While we sometimes understand how these algorithms behave in
>> restricted settings (e.g. on quadratic or convex functions), very little is
>> known about the dynamics of these optimization algorithms on real neural
>> network objective functions. In this paper, we take a close look at the simplest
>> optimization algorithm—full-batch gradient descent with a fixed step
>> size—and find that its behavior on neural networks is both (1) surprisingly
>> consistent across different architectures and tasks, and (2) surprisingly
>> different from that envisioned in the "conventional wisdom."
>>
>> In particular, we empirically demonstrate that during gradient descent
>> training of neural networks, the maximum Hessian eigenvalue (the
>> "sharpness") always rises all the way to the largest stable value, which is
>> 2/(step size), and then hovers just *above* that numerical value for the
>> remainder of training, in a regime we term the "Edge of Stability." (Click
>> here <https://twitter.com/deepcohen/status/1366881479175847942> for a 1m
>> 17s animation.) At the Edge of Stability, the sharpness is still "trying"
>> to increase further—and that's what happens if you drop the step size—but
>> is somehow being actively restrained from doing so, by the implicit
>> dynamics of the optimization algorithm. Our findings have several
>> implications for the theory of neural network optimization. First, whereas
>> the conventional wisdom in optimization says that the sharpness ought to
>> determine the step size, our paper shows that in the topsy-turvy world of
>> deep learning, the reality is precisely the opposite: the *step size*
>> wholly determines the *sharpness*. Second, our findings imply that
>> convergence analyses based on L-smoothness, or on ensuring monotone
>> descent, do not apply to neural network training.
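>>
>> (A minimal, illustrative sketch, not from the paper, of the classical
>> stability threshold behind the phrase "Edge of Stability": for gradient
>> descent on a one-dimensional quadratic f(x) = (a/2) x^2, whose sharpness
>> is the constant a, the iterates contract exactly when a < 2/(step size)
>> and blow up once a exceeds it. The function name and the particular step
>> size below are assumptions made for illustration.)
>>
>>     def gd_on_quadratic(a, lr, steps=50, x0=1.0):
>>         # One GD step on f(x) = (a/2) * x^2 is
>>         #   x <- x - lr * f'(x) = (1 - lr * a) * x,
>>         # so |x| shrinks iff |1 - lr * a| < 1, i.e. iff a < 2 / lr.
>>         x = x0
>>         for _ in range(steps):
>>             x = x - lr * a * x
>>         return x
>>
>>     lr = 0.1                  # fixed step size
>>     threshold = 2.0 / lr      # largest stable sharpness: 2/(step size)
>>     print(gd_on_quadratic(a=threshold - 1.0, lr=lr))  # ~5e-3: converges
>>     print(gd_on_quadratic(a=threshold + 1.0, lr=lr))  # ~1e+2: diverges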
>>
>> *Speaker Bio: *Jeremy Cohen is a PhD student in the Machine Learning
>> Department at CMU, co-advised by Zico Kolter and Ameet Talwalkar. His
>> research focus is "neural network plumbing": how to initialize and
>> normalize neural networks so that they train quickly and generalize well.
>>
>> *Zoom Link:*
>> https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09
>>
>> Thanks,
>> Asher Trockman
>>
>