2022 Program for Women and Mathematics: The Mathematics of Machine Learning

Stochastic Gradient Descent: where optimization meets machine learning

Abstract: Stochastic Gradient Descent (SGD) is the de facto optimization algorithm for training neural networks in modern machine learning, thanks to its unique scalability to problems where the number of data points and the number of free parameters to optimize are on the scale of billions. On the one hand, many of the mathematical foundations of stochastic gradient descent were developed decades before the advent of modern deep learning, from stochastic approximation to the randomized Kaczmarz algorithm for solving linear systems. On the other hand, the omnipresence of stochastic gradient descent in modern machine learning, and the resulting importance of optimizing the performance of SGD in practical settings, has motivated new algorithmic designs and mathematical breakthroughs. In this talk, we recall some of the history and the state-of-the-art convergence theory for SGD that is most relevant to the modern applications where it is used. We discuss recent breakthroughs in adaptive gradient variants of stochastic gradient descent, which go a long way toward addressing one of SGD's weakest points: its sensitivity to, and reliance on, hyperparameter tuning.
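For readers unfamiliar with the algorithms named in the abstract, the following is a minimal sketch (not taken from the talk) contrasting a plain SGD step with an AdaGrad-style adaptive-gradient step on a least-squares problem; the specific learning rates, problem sizes, and function names are illustrative assumptions.

```python
# Minimal sketch: plain SGD vs. an AdaGrad-style adaptive variant on least squares.
# All parameter values and names here are illustrative, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))          # data matrix (rows = data points)
x_true = rng.normal(size=20)
b = A @ x_true                           # consistent linear system A x = b

def sgd(A, b, lr=0.01, n_iters=5000):
    """Plain SGD on f(x) = (1/2n) * ||A x - b||^2, sampling one row per step."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]     # stochastic gradient from row i
        x -= lr * g                      # fixed step size: requires tuning
    return x

def adagrad(A, b, lr=0.5, eps=1e-8, n_iters=5000):
    """AdaGrad-style variant: per-coordinate steps scaled by accumulated gradients."""
    n, d = A.shape
    x = np.zeros(d)
    G = np.zeros(d)                      # running sum of squared gradients
    for _ in range(n_iters):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]
        G += g ** 2
        x -= lr * g / (np.sqrt(G) + eps) # step size adapts per coordinate
    return x

print("SGD error:    ", np.linalg.norm(sgd(A, b) - x_true))
print("AdaGrad error:", np.linalg.norm(adagrad(A, b) - x_true))
```

With one row sampled per step, the plain SGD update on this consistent linear system is closely related to the randomized Kaczmarz iteration mentioned in the abstract; the adaptive variant illustrates how per-coordinate scaling reduces the dependence on a hand-tuned step size.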

Date & Time

May 26, 2022 | 11:45am – 12:45pm

Location

Simonyi 101 and Remote Access

Affiliation

University of Texas at Austin
