Solving the Mysteries of Deep Learning

Sanjeev Arora is Distinguished Visiting Professor in the School of Mathematics at the Institute for Advanced Study. Specializing in the theory of deep learning, with an interest in natural language processing and privacy, Arora directed the Institute’s special program in “Optimization, Statistics, and Theoretical Machine Learning” in academic year 2019–20. He also co-organized several workshops, including a workshop on the “Social and Ethical Challenges of Machine Learning” with the School of Social Sciences. Here, he speaks with IAS Distinguished Journalism Fellow Joanne Lipman about the promise of deep learning and navigating ethical issues of bias and privacy. This conversation was conducted on April 28, 2020. It has been edited for length and clarity.


Joanne Lipman: Let’s start with, what is deep learning?

Sanjeev Arora: Deep learning is a form of machine learning that was loosely inspired by a simplistic 1940s model of how the brain works. This model is called a neural net: you have a large number of very simple units, interconnected by wires, kind of like a network. Each unit computes by taking inputs from other units, summing them up, doing some simple computation, and passing the result on to other units. The answer comes out at the end from a designated output unit.
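To make that concrete, here is a minimal Python sketch of one such unit; the particular inputs, weights, and choice of nonlinearity are illustrative assumptions, not taken from any real system.

# A toy "unit" of the kind described above: it sums its weighted inputs,
# applies a simple computation (here a common nonlinearity), and passes
# the result on.
def unit(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU, one common simple choice

# Two units feed a designated output unit; its value is "the answer."
inputs = [0.5, -1.0]
hidden = [unit(inputs, [0.8, 0.2], 0.1), unit(inputs, [-0.3, 0.6], 0.0)]
output = unit(hidden, [1.0, -1.0], 0.0)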

This model has been around since the 1940s, and suddenly it became very, very popular, influential, and successful about eight years ago. And it’s a big mystery why these models work so well. Much current work treats neural nets as a black box, and we’re trying to open the black box and understand their mathematical properties.

JL: If they are so simple, then what is the mystery?

SA: The simplest analogy I can give you from real life is, think about the economy. The world has seven billion people, and all of us, in terms of economics, are not that complicated, right? We have some preferences and demands (what we like) and we have some money. Then we buy. So ultimately, the math that describes an individual’s economic behavior is not very complicated.

But then you put together seven billion of these in the economy, and it’s very difficult to know what that global economy is doing, and how it will behave a year from now. That’s even leaving aside the natural uncertainties, like suddenly the coronavirus gets dropped on us. The mystery of neural nets is analogous to that: very simple units are communicating with each other, but in modern models there are hundreds of millions or even billions of units, and so the aggregate behavior is not clear to mathematics.

JL: Can you give our readers examples of how deep learning is used?

SA: A lot of the recent progress in recognizing objects in images is driven by this. So when you upload a picture on Facebook or social media and they know who the ten people are in that picture and draw circles around them, that’s done by deep learning. Translation from one language to another has really shot up in accuracy in the last five years or so, and that’s driven by deep learning. The ability of computers to play games at superhuman levels, especially Go, is driven by deep learning. Self-driving cars use deep learning.

JL: Is there a place for machine learning in developing Covid therapies?

SA: Machine learning is ubiquitous in all life sciences—biology, neuroscience. I am not an expert in any of this, but I imagine at every level you’re probably using machine learning. First, they’re doing the imaging of the coronavirus, right? The goal is to come up with a more detailed physical description—the various nubs and spines on it—and how to design drugs that attach to those and so on.

Machine learning is probably used in many of these investigations. Finding a drug that works against a virus also requires searching through lots of possibilities. It’s not humanly possible with a handful of experts to go through a million things in a few days, and machine learning plays an important role.

JL: Putting aside medical applications, will your research—trying to open up the “black box” of how deep learning works—change the way the technology is used?

SA: One offshoot that has arisen is new ideas for ensuring privacy. You have all these devices picking up your data from your normal activities, and then it’s being fed into the tech companies’ machine-learning algorithms, which are deep-learning algorithms.

Obviously, tech companies improve our lives by training these algorithms, Siri and Alexa and so on. But at the same time, I wouldn’t want that process to suck up my information and have it sitting on the cloud somewhere. So can you train this deep-learning model on my data, without knowing my data?

And we think that’s possible, and so that’s a very exciting new discovery. It came out of these efforts to look inside the black box, combined with some phenomena that were already known; we sort of put it all together into this method.
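One widely known flavor of this idea is federated-style training, where devices send only model updates and the raw data never leaves the device. The sketch below is a generic illustration of that pattern, not necessarily the specific method Arora is describing, and on its own it is not a complete privacy guarantee.

import numpy as np

def local_update(weights, X, y, lr=0.1):
    # Each device nudges the shared model using its own data (here, one
    # gradient step on a tiny linear model) and reports only the new weights.
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def aggregate(updates):
    # The server averages the updates; it never sees the raw data itself.
    return np.mean(updates, axis=0)

# Toy run with three "devices," each holding its own private examples.
rng = np.random.default_rng(0)
weights = np.zeros(2)
for _ in range(100):
    updates = []
    for _ in range(3):
        X = rng.normal(size=(8, 2))
        y = X @ np.array([1.0, -2.0])  # private data stays on the device
        updates.append(local_update(weights, X, y))
    weights = aggregate(updates)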

JL: Is this a way of anonymizing data? Like when tech firms say, “We’re looking at the aggregate. We’re not looking at you individually.”

SA: I believe you are referring to something called differential privacy, which is a method of adding some noise to an individual’s data, which allows it to be used in machine learning while preserving some of their privacy. But that doesn’t protect privacy completely, because the firms still have the data, right? It’s on their servers. There’s no way for them to train the model without having your data.
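For readers curious what “adding some noise” looks like, here is a minimal sketch of the standard Laplace mechanism applied to a single averaged query; the parameter values and data are illustrative assumptions.

import numpy as np

def noisy_average(values, epsilon=0.5, value_range=1.0):
    # Values are assumed to lie in [0, value_range]; one person can shift
    # the true average by at most value_range / n (the "sensitivity").
    true_avg = float(np.mean(values))
    sensitivity = value_range / len(values)
    # Laplace noise with scale sensitivity / epsilon gives epsilon-differential
    # privacy for this one query.
    return true_avg + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a noisy estimate of the fraction of users who clicked an ad.
estimate = noisy_average([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])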

JL: So your insight is, they may not need your actual private data?

SA: I realize this sounds impossible, but yes.

JL: That’s extraordinary, since the heart of the technological ethical question is privacy.

SA: It’s a new technique, and it seems to work in some settings, and the full implications are still being worked out. That’s why I’m not claiming too much, but, yes, that’s something that came out of efforts to understand the black box.

JL: One of the privacy concerns emerging now involves contact tracing, which is how we’re supposed to end the Covid contagion. Some countries have used technology that is very invasive. Do you think we should use it too?

SA: Again, I am no expert, but I personally feel we should avoid any kneejerk things in that sphere. I mean, there’s already some anxiety about where democracy’s going in many countries. We better not set up a system we may end up regretting.

JL: If you had to predict five years from now, will we have solved the privacy issues?

SA: That’s a difficult question. It’s not clear that’s just a technical question. It’s also a legal and societal question. We don’t even know where our government is going in five years, so it’s very hard to say that. But I do think that just in general, mathematically and technically, we should be able to come up with much better methods.

As I indicated, I’m working on some of them to preserve your privacy when the data is used for machine learning. So it may be technically feasible to allow corporations to innovate while keeping individuals’ data private. But that doesn’t change the fact that there’s a huge economic incentive for corporations and all kinds of actors to collect my data and keep it on their servers, and sell it for pennies on the market. Those incentives will still exist. It’s a societal problem, and it’s unclear how to control that or redirect it in some way.

So on the technical side I’m hopeful that there are solutions, and that they’ll continue to improve. But because the companies that control that technology themselves have a strong incentive to hold onto your data and use it, the situation is not clear at all.

JL: This ties into the ethics of machine learning, an area you have been involved in. Can you explain what’s behind that conversation?

SA: The ethical issues arise from some of the technical issues I was mentioning. You train a network to do some task, let’s say detecting spam comments on a newspaper website, or approving or denying loans, et cetera. Since its workings are a black box to us, how can we make sure it’s not discriminating against disadvantaged groups? It’s learning from data and past decisions, and maybe those were biased. How can we program fairness into it? How can we make sure its decisions can’t be swayed by people who know the decision-making algorithm?

JL: We’ve all heard the example of facial recognition that doesn’t recognize black faces as accurately as white faces, and the Amazon example of machine learning that screened out women’s resumes because of previous hiring patterns. How do we overcome those issues of bias?

SA: That’s going to be tough. In machine learning, the paradigm is that you train on data; you don’t question the data. So if the data is garbage, you are going to learn garbage. The process of preparing data to input into the machine-learning algorithm is not part of machine learning. Maybe it should be.

JL: Do you think that machine learning would ever be able to recognize bias?

SA: For starters, you have to mathematically define what that problem is. What is bias? If bias is a mathematical property, then you can try to train a machine-learning algorithm to detect it, yes, and people are trying to do that.
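As one example of such a mathematical definition, here is a short sketch of “demographic parity,” the gap in positive-decision rates between groups; the groups and decisions below are purely illustrative.

def demographic_parity_gap(decisions, groups):
    # decisions: 0/1 outcomes (e.g., loan approvals); groups: group labels.
    by_group = {}
    for d, g in zip(decisions, groups):
        by_group.setdefault(g, []).append(d)
    rates = {g: sum(v) / len(v) for g, v in by_group.items()}
    return max(rates.values()) - min(rates.values())

# Example: group "A" is approved 80% of the time, group "B" only 40%.
gap = demographic_parity_gap([1, 1, 1, 1, 0, 1, 0, 0, 1, 0],
                             ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
# gap is 0.4 here; an auditor might flag any gap above a chosen threshold.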

JL: What are other social and ethical implications of machine learning?

SA: A handful of experts are worrying about the doomsday scenario. As machines take on more and more functions in the economy and in our lives, how do we prevent the kind of doomsday or apocalyptic scenarios that are talked about in science fiction? That’s sort of the more extreme worry, maybe farther into the future.

In my opinion it is not something to worry about in the near future.

JL: An apocalyptic scenario would be what?

SA: Well, take any science-fiction movie, right? That’s what I’m talking about. It’s not clear how current technology gets there, but some people are already thinking about that.

JL: You mean HAL in 2001, who takes over the spaceship, kind of scenario?

SA: Or even more. I mean, that was 1968. There’ve been much more dire movies. Terminator, Blade Runner, Minority Report.

JL: Terminator, that’s a good one.

SA: We’re showing our age, right, by thinking of movies from the ’80s!

JL: (Laughs.) Right! How do you feel about the machine learning in our own lives, like Alexa and Siri? Do you use those technologies?

SA: I don’t too much.

JL: Why not?

SA: The quickest answer is that just generationally, I’m not used to using those technologies. But also, being somewhat knowledgeable, I find the tech world very invasive of privacy. Maybe in 20 years people will think nothing of it, but to me it feels invasive. I turn off a lot of invasive things in my browser, for example. I worry about the privacy implications of all this.

JL: I find this is true of a lot of people who are well-versed in technology. It seems like the more you know, the more on guard you are about privacy invasions in particular.

SA: It reminds me of when I was an undergrad at MIT, and one day I was standing waiting for an elevator next to a professor of mine. He was older than me, obviously, a fair bit. The elevator was about to leave, and I stuck my hand in, to open the door again.

He looked at me, and he said, “I’m an engineer. I would never do that.”

JL: When you know too much, right?

SA: Yes, and you know that all these systems can fail.