Approximating large frequency moments with pick-and-drop sampling

Given data stream \(D = \{p_1,p_2,...,p_m\}\) of size \(m\) of numbers from \(\{1,..., n\}\), the frequency of \(i\) is defined as \(f_i = |\{j: p_j = i\}|\). The \(k\)-th frequency moment of \(D\) is defined as \(F_k = \sum_{i=1}^n f_i^k\). We consider the problem of approximating frequency moments for \(k\ge 3\). For any constant \(c\) we show an \(O(n^{1-2/k}\log(n)\log^{(c)}(n))\) upper bound on the space complexity of the problem. Here \(\log^{(c)}(n)\) is the iterative \(\log\) function. We make the following assumptions: \(n\) and \(m\) are polynomially far; approximation error \(\epsilon\) and parameter \(k\) are constants. We observe a natural bijection between streams and special matrices. Our main technical contribution is a non-uniform sampling method on matrices. We call our method a *pick-and-drop sampling*; it samples a heavy element (i.e., element \(i\) with frequency \(\Omega(F_k)\)) with probability \(\Omega(1/n^{1-2/k})\) and gives approximation \(\tilde{f_i} \ge (1-\epsilon)f_i\). In addition, the estimations never exceed the real values, that is \( \tilde{f_j} \le f_j\) for all \(j\). As a result, we reduce the space complexity of finding a heavy element to \(O(n^{1-2/k}\log(n))\) bits. We apply our method of recursive sketches and resolve the problem with \(O(n^{1-2/k}\log(n)\log^{(c)}(n))\) bits. This is joint work with Rafail Ostrovsky.

Date

Speakers

Vladimir Braverman

Affiliation

Johns Hopkins University