Interpretability for Everyone
A bit more on 2): most recent interpretability methods are designed by machine learning experts, for machine learning experts. I will introduce a different family of methods designed to help laypeople, i.e., people whose expertise lies outside ML (e.g., in medicine). Testing with Concept Activation Vectors (TCAV) is a post-training interpretability method for complex models, such as neural networks. It interprets a neural net's internal state in terms of human-friendly, high-level concepts rather than low-level input features. In other words, the method "speaks" the user's language rather than the computer's. I will also discuss more recent extensions of this work toward automatically discovering high-level concepts, as well as discovering 'complete' and causal concepts.
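To make the idea concrete, here is a minimal sketch of the core TCAV computation. It is an illustration under simplifying assumptions, not the reference implementation: a concept activation vector (CAV) is learned as the normal to a linear boundary separating activations of concept examples from activations of random counterexamples, and the TCAV score is the fraction of class examples whose class logit increases in the concept direction. The activation and gradient arrays below are synthetic placeholders; in practice they would be extracted from a chosen layer of the model under inspection.

```python
# Minimal TCAV-style sketch using NumPy and scikit-learn.
# All arrays here are placeholder (random) data standing in for real
# layer activations and gradients of a class logit w.r.t. those activations.
import numpy as np
from sklearn.linear_model import LogisticRegression


def learn_cav(concept_acts, random_acts):
    """Learn a concept activation vector: the direction normal to a linear
    boundary separating concept-example activations from random ones."""
    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)


def tcav_score(class_grads, cav):
    """Fraction of class examples whose logit increases when the layer's
    activations are nudged in the concept direction."""
    sensitivities = class_grads @ cav  # directional derivatives
    return float(np.mean(sensitivities > 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 512  # width of the inspected layer (placeholder)
    concept_acts = rng.normal(0.5, 1.0, size=(100, d))  # e.g., "striped" images
    random_acts = rng.normal(0.0, 1.0, size=(100, d))   # random counterexamples
    class_grads = rng.normal(0.1, 1.0, size=(200, d))   # d(logit) / d(activations)

    cav = learn_cav(concept_acts, random_acts)
    print(f"TCAV score: {tcav_score(class_grads, cav):.2f}")
```

In practice the score is compared against CAVs trained on multiple random splits to test statistical significance, so that only concepts with consistently high scores are reported to the user.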