Computer Science/Discrete Mathematics Seminar II
The Communication Complexity of Distributed Estimation
We propose an extension of Yao's standard two-party communication model in which Alice and Bob hold probability distributions p and q, respectively, over inputs to a function f, rather than single inputs. Their goal is to estimate E[f(x,y)] to within additive error eps, where x is drawn from p and y is drawn (independently) from q. We refer to this as the distributed estimation problem for f. We motivate this problem by showing that communication problems previously studied in sketching, databases, and learning are instantiations of distributed estimation for various functions f. Our goal is to understand how the required communication scales with the communication complexity of f and the error parameter eps.
The naive sampling protocol achieves communication that scales as O(1/eps^2). We design a new debiasing protocol for arbitrary bounded functions that requires communication only linear in 1/eps, and gives better variance reduction than random sampling. We develop a novel spectral technique to show lower bounds for distributed estimation, and use it to show that the Equality function is the easiest full-rank Boolean function for distributed estimation. This technique yields tight lower bounds for most functions, with set-disjointness being the notable exception.
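To make the O(1/eps^2) baseline concrete, here is a minimal sketch of the naive sampling protocol: Alice draws m samples from p and sends them to Bob, who pairs each with an independent draw from q and reports the empirical mean of f. By Hoeffding's inequality, m = O(1/eps^2) samples suffice for additive error eps with constant probability. All names and the failure-probability parameter below are illustrative assumptions, not details from the talk.

```python
import math
import random

def naive_sampling_estimate(f, sample_p, sample_q, eps, fail_prob=0.05):
    """Estimate E[f(x, y)] for x ~ p, y ~ q to additive error eps.

    f        -- bounded function f(x, y) taking values in [0, 1]
    sample_p -- draws one sample from Alice's distribution p
    sample_q -- draws one sample from Bob's distribution q
    """
    # Hoeffding bound: m >= ln(2/fail_prob) / (2 eps^2) samples suffice.
    m = math.ceil(math.log(2.0 / fail_prob) / (2.0 * eps ** 2))
    # Alice's message: m samples from p (communication scales with m,
    # i.e., O(1/eps^2) input descriptions).
    alice_msgs = [sample_p() for _ in range(m)]
    # Bob pairs each with a fresh draw from q and averages f.
    return sum(f(x, sample_q()) for x in alice_msgs) / m
```

For example, taking f to be Equality on uniform random bits (so the true value of E[f(x,y)] is 1/2), the estimate concentrates around 1/2 as eps shrinks. The debiasing protocol from the talk improves on this by reducing the communication to linear in 1/eps.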
Based on joint work with Raghu Meka (UCLA), Prasad Raghavendra (UC Berkeley), Mihir Singhal (UC Berkeley) and Avi Wigderson (IAS).