Privacy as Stability, for Generalization

Many data analysis pipelines are adaptive: the choice of which analysis to run next depends on the outcome of previous analyses. Common examples include variable selection for regression problems and hyper-parameter optimization in large-scale machine learning problems: in both cases, common practice involves repeatedly evaluating a series of models on the same dataset. Unfortunately, this kind of adaptive re-use of data invalidates many traditional methods of avoiding overfitting and false discovery, and has been blamed in part for the recent flood of non-reproducible findings in the empirical sciences. An exciting line of work beginning with Dwork et al. in 2015 establishes the first formal model and first algorithmic results providing a general approach to mitigating the harms of adaptivity, via a connection to the notion of differential privacy.

In this talk, we'll explore the notion of differential privacy and gain some understanding of how and why it provides protection against adaptivity-driven overfitting. Many interesting questions in this space remain open.

Joint work with: Christopher Jung (UPenn), Seth Neel (Harvard), Aaron Roth (UPenn), Saeed Sharifi-Malvajerdi (UPenn), and Moshe Shenfeld (HUJI). This talk will draw on work that appeared at NeurIPS 2019 and ITCS 2020.

Date

April 12, 2021

Speakers

Katrina Ligett

Affiliation

Hebrew University

Computer Science and Discrete Mathematics (CSDM)

School of Mathematics