Interpretability for Everyone

In this talk, I would like to share some reflections on the progress made in the field of interpretable machine learning. We will reflect on where we are heading as a field and what we need to be aware of in order to make progress. From that perspective, I will then discuss some of my work on 1) sanity-checking popular methods and 2) developing interpretability methods that are friendlier to laypeople. I will also share some open theoretical questions that may help us move forward. I hope this talk offers a new angle on how to make progress in this field.

A bit more on 2): most recent interpretability methods are designed by machine learning experts, for machine learning experts. I will introduce a different family of methods designed to help laypeople: people who may lack ML expertise but have expertise in another domain (e.g., medicine). Testing with Concept Activation Vectors (TCAV) is a post-training interpretability method for complex models such as neural networks. It interprets a neural network's internal state in terms of human-friendly, high-level concepts instead of low-level input features. In other words, the method can “speak” the user’s language rather than the computer’s. I will also discuss more recent extensions of this work toward automatically discovering high-level concepts, as well as discovering ‘complete’ and causal concepts.
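As a rough illustration of the TCAV idea, here is a minimal NumPy sketch. It assumes that a layer's activations, and the gradients of a class logit with respect to those activations, have already been extracted from the model. In the TCAV formulation, a concept activation vector (CAV) is the unit normal of a linear classifier that separates activations of concept examples from activations of random examples. The TCAV score for a class is the fraction of that class's inputs whose directional derivative along the CAV is positive. The function names, the plain gradient-descent logistic regression, the synthetic data, and the toy class head `sum(u * tanh(a))` are all hypothetical stand-ins, not the reference implementation. The published method also compares against CAVs trained on several random sets to test statistical significance; that step is omitted here.

```python
import numpy as np


def learn_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a logistic-regression classifier (concept = 1, random = 0) with plain
    gradient descent and return its unit-norm weight vector: the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * X.T @ (p - y) / len(y)        # gradient of mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient w.r.t. the bias
    return w / np.linalg.norm(w)                # points toward the concept examples


def tcav_score(logit_grads, cav):
    """Fraction of class inputs whose logit gradient (w.r.t. the layer's
    activations) has a positive directional derivative along the CAV."""
    sensitivities = logit_grads @ cav
    return float(np.mean(sensitivities > 0))


def head_grads(acts, u):
    """Gradient of the hypothetical toy class head h(a) = sum(u * tanh(a))."""
    return u * (1.0 - np.tanh(acts) ** 2)


# Hypothetical toy setup: 5-dimensional "layer activations".
rng = np.random.default_rng(0)
concept_acts = rng.normal(size=(100, 5)) + np.array([3.0, 0, 0, 0, 0])  # concept shifts dim 0
random_acts = rng.normal(size=(100, 5))
cav = learn_cav(concept_acts, random_acts)

class_acts = rng.normal(size=(50, 5))   # activations for inputs of the class being explained
u_pos = np.array([1.0, 0, 0, 0, 0])     # toy logit increases along dim 0
u_neg = -u_pos                          # toy logit decreases along dim 0
print(tcav_score(head_grads(class_acts, u_pos), cav))  # 1.0
print(tcav_score(head_grads(class_acts, u_neg), cav))  # 0.0
```

Because the toy head's gradient along dimension 0 is always positive when `u[0] > 0`, every input has a positive sensitivity and the score is 1.0; flipping the sign of `u` gives 0.0. With a real network, the informative cases are scores between these extremes.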



Been Kim