Theoretical Machine Learning Seminar

Learning probability distributions: what can, what can't be done

A possible high-level description of statistical learning is that it aims to learn about some unknown probability distribution ("environment") from the samples it generates ("training data"). In its most general form, assuming no prior knowledge and asking for accurate approximations of the data-generating distribution, there can be no success guarantee. In this talk I will discuss two major directions for relaxing that overly hard problem. First, I will address the situation under a common prior-knowledge assumption: I will describe settling the question of the sample complexity of learning mixtures of Gaussians. Second, I will address what can be learnt about unknown distributions when no prior knowledge is assumed. I will describe a surprising result, namely, the independence from set theory of a basic statistical learnability problem. As a corollary, I will show that there can be no combinatorial dimension that characterizes the families of random variables that can be reliably learnt (in contrast with the known VC-dimension-like characterizations of common supervised learning tasks). Both parts of the talk use novel notions of sample compression schemes as key components. The first part is based on joint work with Hassan Ashtiani, Nicholas Harvey, Christopher Liaw, Abbas Mehrabian, and Yaniv Plan, and the second part on work with Shay Moran, Pavel Hrubeš, Amir Shpilka, and Amir Yehudayoff.
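For context, the bound settled in the first part of the talk (established in the underlying paper by Ashtiani, Ben-David, Harvey, Liaw, Mehrabian, and Plan, and not stated in the abstract itself) can be summarized as follows: learning a mixture of k arbitrary Gaussians in R^d to within total variation distance \varepsilon requires

    \widetilde{\Theta}\!\left( \frac{k d^2}{\varepsilon^2} \right)

samples, where \widetilde{\Theta} hides factors polylogarithmic in k, d, and 1/\varepsilon.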

Date & Time

May 07, 2020 | 3:00pm – 4:30pm

Location

Remote Access Only - see link below

Speakers

Shai Ben-David

Affiliation

University of Waterloo

Notes

Please note: interested participants will need to fill out the Google form linked below in advance to obtain access to this seminar. Once approved, you will receive the login details. Members of the IAS community do not need to fill out this form; the login details will be emailed to you.

https://forms.gle/vxPMgdiURWpRqrV8A