Sublime: Sublinear Error & Space for Unbounded Skewed Streams

March 15, 2026 ยท Grace Period ยท ๐Ÿ› SIGMOD 2026

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors Navid Eslami, Ioana O. Bercea, Rasmus Pagh, Niv Dayan arXiv ID 2603.14190 Category cs.DS: Data Structures & Algorithms Cross-listed cs.DB, cs.IT Citations 0 Venue SIGMOD 2026
Abstract
Modern stream processing systems must often track the frequency of distinct keys in a data stream in real-time. Since monitoring the exact counts often entails a prohibitive memory footprint, many applications rely on compact, probabilistic data structures called frequency estimation sketches to approximate them. However, mainstream frequency estimation sketches fall short in two critical aspects: (1) They are memory-inefficient under data skew. This is because they use uniformly-sized counters to track the key counts and thus waste memory on storing the leading zeros of many small counter values. (2) Their estimation error deteriorates at least linearly with the stream's length, which may grow indefinitely over time. This is because they count the keys using a fixed number~of~counters. We present Sublime, a framework that generalizes frequency estimation sketches to address these problems by dynamically adapting to the stream's skew and length. To save memory under skew, Sublime uses short counters upfront and elongates them with extensions stored within the same cache line as they overflow. It leverages novel bit manipulation routines to quickly access a counter's extension. It also controls the scaling of its error rate by expanding its number of approximate counters as the stream grows. We apply Sublime to Count-Min Sketch and Count Sketch. We show, theoretically and empirically, that Sublime significantly improves accuracy and memory over the state of the art while maintaining competitive or superior performance.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Data Structures & Algorithms