Theoretical Machine Learning Seminar

What Noisy Convex Quadratics Tell Us about Neural Net Training

I’ll discuss the Noisy Quadratic Model (NQM), the toy problem of minimizing a convex quadratic function with noisy gradient observations. While the NQM is simple enough to have closed-form dynamics for a variety of optimizers, it gives a surprising amount of insight into neural net training phenomena. First, we’ll look at the problem of adapting learning rates using meta-descent (i.e., differentiating through the training dynamics). The NQM illuminates why short-horizon meta-descent objectives vastly underestimate the optimal learning rate. Second, we’ll study how the behavior of various neural net optimizers depends on the batch size. When is it beneficial to use preconditioning? Momentum? Parameter averaging? Learning rate schedules? The NQM can generate predictions in seconds that seem to capture the qualitative behavior of large-scale classification conv nets and transformers.
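
To make the setup concrete, here is a minimal sketch (not the speaker's code) of how the NQM's closed-form dynamics can be evaluated. It assumes the standard diagonal formulation: curvatures h_i, per-coordinate gradient-noise variance c_i / B at batch size B, and plain SGD with step size alpha; the function name and the choice c = h are illustrative assumptions, not part of the talk.

```python
# Hypothetical sketch of the Noisy Quadratic Model (NQM).
# Loss: L(theta) = 1/2 * sum_i h_i * theta_i^2, observed with gradient noise
# of per-coordinate variance c_i / B at batch size B. Under SGD with step
# size alpha, the expected squared parameters evolve in closed form:
#   E[theta_i^2]  <-  (1 - alpha * h_i)^2 * E[theta_i^2] + alpha^2 * c_i / B
import numpy as np

def nqm_expected_loss(h, c, theta0_sq, alpha, batch_size, steps):
    """Closed-form expected-risk trajectory of SGD on the noisy quadratic."""
    theta_sq = np.asarray(theta0_sq, dtype=float).copy()
    losses = []
    for _ in range(steps):
        losses.append(0.5 * np.sum(h * theta_sq))
        theta_sq = (1 - alpha * h) ** 2 * theta_sq + alpha ** 2 * c / batch_size
    return np.array(losses)

# Example: an ill-conditioned spectrum spanning several orders of magnitude.
h = np.logspace(-3, 0, 100)   # curvature eigenvalues (assumed values)
c = h.copy()                  # one common modeling choice: noise matches curvature
losses = nqm_expected_loss(h, c, theta0_sq=1.0 / h,
                           alpha=1.0, batch_size=64, steps=1000)
```

Because the recursion is exact in expectation, sweeping batch sizes, learning rates, or schedules requires no stochastic simulation, which is what makes predictions available "in seconds."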

Date & Time

January 28, 2020 | 12:00pm – 1:30pm

Location

Dilworth Room

Affiliation

University of Toronto; Member, School of Mathematics