Mathematics and Computation
Published here are three slightly edited excerpts from "Mathematics and Computation," a new book by Avi Wigderson, Herbert H. Maass Professor in the School of Mathematics, soon to be published by Princeton University Press (online draft available here). These excerpts are chosen to briefly introduce the reader to the Theory of Computation, Computational Complexity Theory, and the new breed of machine learning algorithms discussed in the related article on the Theoretical Machine Learning program at IAS.
The Theory of Computation (ToC) is the study of the formal foundations of computer science and technology. This dynamic and rapidly expanding field straddles mathematics and computer science. Both sides naturally emerged from Turing's seminal 1936 paper, "On computable numbers, with an application to the Entscheidungsproblem." This paper formally defined an algorithm in the form of what we call today the "Turing machine." On the one hand, the Turing machine is a formal, mathematical model of computation, enabling the rigorous definition of computational tasks, the algorithms to solve them, and the resources these require. On the other hand, the definition of the Turing machine allowed its simple, logical design to be readily implemented in hardware and software. These theoretical and practical sides influenced the dual nature of the field. On the mathematical side, the abstract notion of computation revealed itself as an extremely deep and mysterious mathematical notion, and in pursuing it ToC progresses like any other mathematical field. Its researchers prove theorems, and they follow standard mathematical culture to generalize, simplify, and follow their noses based on esthetics and beauty. From the practical side, the universal applicability of automated computation fueled the rapid development of computer technology, which now dominates our life. This rapidly evolving world of computer science and industry continuously creates new models, modes, and properties of computation, which need theoretical understanding and directly impact the mathematical evolution of ToC, while ideas and techniques generated there feed back into the practical world.
The Theory of Computation is a vast field. Many of its subfields are directly connected with applications. These develop formal, mathematical foundations and theories of programming languages, operating systems, hardware, networking, databases, security, robotics, and artificial intelligence, as well as numerous efficient algorithms and algorithmic techniques to solve the basic problems arising in these fields. These models and algorithms contribute directly to the amazing variety of ways in which computers affect our lives. Many other subfields of ToC develop computational models and theories of various natural and manmade processes, interacting with scientific disciplines, including biology, physics, economics, and social sciences. Indeed, once such processes are viewed as information processes, the computational lens shines a powerful new light, which in all these disciplines revives and revises basic questions, beliefs, and theories. These growing interactions lead to better understanding of the natural world.
A central core area of ToC is computational complexity theory, which is often motivated by but is typically furthest removed from the applications and is the most intimately connected with mathematics. Complexity theory's initial charge was the understanding of efficient computation in the most general sense: what are the minimal amounts of natural resources, like time, memory, and others, needed to solve natural computational tasks by natural computational models. In response, it developed a powerful toolkit of algorithmic techniques and ways to analyze them, as well as a classification system of computational problems in complexity classes. It has also formulated natural long-term goals for the field.
With time, and with internal and external motivations, computational complexity theory has greatly expanded its goals. It took on the computational modeling and understanding of a variety of central notions, some studied for centuries by great minds, including secret, proof, learning, knowledge, randomness, interaction, evolution, game, strategy, coordination, synchrony, and others. This computational lens often resulted in completely new meanings of these old concepts. Moreover, in some of these cases the resulting theories predated, and indeed enabled, significant technological advances. Also, some of these theories form the basis of interactions with other sciences.
Thus, starting with the goal of understanding what can be efficiently computed, a host of natural long-term goals of deep conceptual meaning emerged. What can be efficiently learned? What can be efficiently proved? Is verifying a proof much easier than finding one? What does a machine know? What is the power of randomness in algorithms? Can we tap into natural sources of randomness and use this power? What is the power of quantum mechanical computers? Can we utilize quantum phenomena in algorithms? In settings where different computational entities have different (possibly conflicting) incentives, what can be achieved jointly? Privately? Can computers efficiently simulate nature, or the brain?
So, it is pretty certain that algorithms will rule the Earth, but which algorithms? For decades, computer scientists have been designing and analyzing algorithms, which underlie the computer revolution that affects every aspect of our lives. These are gems of human invention. But recently, a combination of remarkable computing power, important optimization techniques, and massive data, has enabled a new breed of algorithms, which arise in machine learning. In short, algorithms as technological products. Tech and medical companies, as well as governments, invest enormous funds and effort in these new algorithms.
In stark contrast to the elegant, concise algorithmic gems, which were man-made, many new algorithms simply "create themselves," with relatively little intervention from humans, mainly through interaction with massive data. This contrast with "classic" algorithm design is heightened as these new algorithms are typically (and perhaps necessarily) gigantic, and very poorly understood by humans. I am referring of course to the "deep networks," a common name to heuristics modeled after networks of neurons.
These self-taught algorithms attempt to solve numerous problems that are often hard to formally define, such as finding "significant signals" in huge data sets that are often extremely noisy—be they financial, astrophysical, biological, or the Internet. The structure one may want to extract/uncover may be clusters, correlations, geometric or numerical patterns, etc., or completely unexpected ones. These may represent, e.g., familiar faces or objects in pictures, groups of friends and interests in social media, market performance of companies in the buying and selling of stocks, effects of treatments or genetic defects in biological data, and illuminating interactions of matter and energy in physical observations.
This is no place to give a proper description of the deep nets and their training process. It suffices to say that today their size can get to millions of gates and wires between gates. Each connection has a strength parameter that the training process attempts to optimize. In short, training is a huge optimization problem with an ill-specified goal and numerous degrees of freedom in the heuristics driving the attempted optimization. This "black magic" works extremely well only in relatively few cases so far. But in a growing number of cases it greatly outperforms any human-designed algorithm. It is extremely interesting when such a self-taught program can label by name the essential content of arbitrary pictures taken by humans, nearly as well as humans would. It is very impressive that another such program, after playing against itself a billion games of Go, can now beat the world's best human players. Great progress by such programs is made on human language understanding and translation, and perhaps their fastest growth is felt in "Data Science," where these programs play ever more important roles in actual scientific discovery. Indeed, one wonders at what point they will be able to better articulate the new scientific laws uncovered and pursue their consequences like medical drugs and treatments, food and energy supply, etc., also without human help.
The main point I find fascinating and challenging on this front, and for which the theory of computation can and should contribute, is a theoretical understanding of such algorithms. This understanding should include ways to make their training more principled and efficient, and developing models and tools to assess their performance and output quality. A major challenge is to understand why and for what tasks are deep networks successful, and what are their limitations. Further, it is of crucial importance given their growing prevalence in systems humans crucially depend on, to explore how to make them fair (and what this means), how susceptible they are to adversarial data and how to protect against it. Of course, the size and complexity of deep nets may put limits to how well they can be theoretically understood (much like massive creations of nature as the brain and other biological systems). Indeed, it is not unlikely that the theoretical study of deep nets (on which, unlike animals, experimentation is free from the Declaration of Helsinki guidelines) will help in better understanding biological systems and vice versa.
Given our current ignorance (namely, gaps between upper and lower bounds) regarding many well-posed important problems, I have no doubt that there is plenty more room for the discovery of algorithmic gems which we can easily understand, appreciate, and use. To drive this point home, let me note in conclusion that no deep-nets would exist without a few great algorithmic gems embedded in practically all of them, including the extremely efficient back propagation and gradient descent.