Theoretical Machine Learning Seminar

Scalable natural gradient training of neural networks

Natural gradient descent holds the potential to speed up training of neural networks by correcting for the problem geometry and achieving desirable invariance properties. I’ll present Kronecker-Factored Approximate Curvature (K-FAC), a scalable natural gradient optimizer for neural nets that exploits the structure of the backprop computations to obtain tractable but effective curvature approximations. By fitting a probabilistic model to the curvature online during training, it aggregates curvature information over the entire dataset, thereby avoiding the pitfalls of matrix-vector-product approaches. The K-FAC updates have low overhead (~50% over SGD) and yield 3-5x wall-clock speedups on image classification, RNN, and RL benchmarks. K-FAC is invariant to affine reparameterizations of the activations of a neural net. I’ll briefly show how the K-FAC Fisher matrix approximation can be used to approximate posterior uncertainty in Bayesian neural nets.
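For readers unfamiliar with K-FAC, the core curvature approximation (following Martens & Grosse, 2015) can be sketched as follows; the notation below is illustrative rather than taken from the talk. For a fully connected layer with weights $W_\ell$, input activations $a_{\ell-1}$, pre-activations $s_\ell = W_\ell a_{\ell-1}$, and back-propagated gradients $g_\ell = \nabla_{s_\ell} \mathcal{L}$, the corresponding block of the Fisher matrix is approximated by a Kronecker product of two small matrices estimated online during training:

\[
F_\ell \;\approx\; A_{\ell-1} \otimes G_\ell,
\qquad
A_{\ell-1} = \mathbb{E}\!\left[a_{\ell-1} a_{\ell-1}^\top\right],
\qquad
G_\ell = \mathbb{E}\!\left[g_\ell g_\ell^\top\right].
\]

Because $(A_{\ell-1} \otimes G_\ell)^{-1} = A_{\ell-1}^{-1} \otimes G_\ell^{-1}$, applying the approximate inverse Fisher to the gradient only requires inverting these two factors, giving a per-layer update of the form

\[
\Delta W_\ell \;\propto\; G_\ell^{-1} \left(\nabla_{W_\ell} \mathcal{L}\right) A_{\ell-1}^{-1}.
\]

This is what makes the natural gradient step tractable at scale: the factors are of the size of a single layer's input and output dimensions rather than the full parameter count.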

Date & Time

November 05, 2018 | 12:15 pm – 1:45 pm

Location

Princeton University, CS 302

Affiliation

University of Toronto