Beating CountSketch for Heavy Hitters in Insertion Streams
November 02, 2015 Β· Declared Dead Β· π Symposium on the Theory of Computing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, David P. Woodruff
arXiv ID
1511.00661
Category
cs.DS: Data Structures & Algorithms
Citations
48
Venue
Symposium on the Theory of Computing
Last Checked
3 months ago
Abstract
Given a stream $p_1, \ldots, p_m$ of items from a universe $\mathcal{U}$, which, without loss of generality we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \geq Ξ΅\sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream, and $F_2 = \sum_{i \in [n]} f_i^2$. Such a guarantee is considerably stronger than the $\ell_1$-guarantee, which finds those $j$ for which $f_j \geq Ξ΅m$. In 2002, Charikar, Chen, and Farach-Colton suggested the {\sf CountSketch} data structure, which finds all such $j$ using $Ξ(\log^2 n)$ bits of space (for constant $Ξ΅> 0$). The only known lower bound is $Ξ©(\log n)$ bits of space, which comes from the need to specify the identities of the items found. In this paper we show it is possible to achieve $O(\log n \log \log n)$ bits of space for this problem. Our techniques, based on Gaussian processes, lead to a number of other new results for data streams, including (1) The first algorithm for estimating $F_2$ simultaneously at all points in a stream using only $O(\log n\log\log n)$ bits of space, improving a natural union bound and the algorithm of Huang, Tai, and Yi (2014). (2) A way to estimate the $\ell_{\infty}$ norm of a stream up to additive error $Ξ΅\sqrt{F_2}$ with $O(\log n\log\log n)$ bits of space, resolving Open Question 3 from the IITK 2006 list for insertion only streams.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted