Distinct Sampling on Streaming Data with Near-Duplicates

October 29, 2018 · Declared Dead · 🏛 ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jiecao Chen, Qin Zhang
arXiv ID: 1810.12388
Category: cs.DS (Data Structures & Algorithms)
Citations: 7
Venue: ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
Last checked: 3 months ago
Abstract
In this paper we study how to perform distinct sampling in the streaming model when the data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, where all near-duplicates are treated as the same element. We also extend the result to the sliding-window case, in which only the most recent items are of interest. We present algorithms with provable theoretical guarantees for datasets in Euclidean space, and verify their effectiveness through an extensive set of experiments.
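The abstract does not describe the algorithms themselves, so the following is only a naive reference baseline in Python that illustrates what "distinct sampling with near-duplicates" asks for. It is not the authors' method. It assumes, hypothetically, that points of the same group lie within Euclidean distance `alpha` of each other and that points from different groups are more than `alpha` apart, so a new point either matches an existing group representative or starts a new group. Reservoir sampling over groups then makes every group equally likely to be returned, however many near-duplicates it has. The names `NaiveDistinctSampler` and `naive_window_sample` are illustrative; the baseline stores one representative per group, which a space-efficient streaming algorithm would aim to avoid.

```python
import math
import random
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _close(p: Point, q: Point, alpha: float) -> bool:
    """Treat p and q as near-duplicates if their Euclidean distance is at most alpha."""
    return math.dist(p, q) <= alpha


class NaiveDistinctSampler:
    """Uniform sample over groups of near-duplicates in an insert-only stream.

    Hypothetical baseline (not the paper's algorithm): memory grows with the
    number of groups. Assumes same-group points are within alpha of each other
    and points from different groups are more than alpha apart.
    """

    def __init__(self, alpha: float, rng: Optional[random.Random] = None) -> None:
        self.alpha = alpha
        self.rng = rng or random.Random()
        self.representatives: List[Point] = []  # first point seen from each group
        self.sample: Optional[Point] = None

    def insert(self, p: Point) -> None:
        if any(_close(p, r, self.alpha) for r in self.representatives):
            return  # near-duplicate of a group already seen: it must not bias the sample
        self.representatives.append(p)
        k = len(self.representatives)
        # Reservoir sampling over groups: the k-th new group replaces the sample w.p. 1/k,
        # so each group ends up as the sample with probability 1/(number of groups).
        if self.rng.random() < 1.0 / k:
            self.sample = p


def naive_window_sample(window: Sequence[Point], alpha: float,
                        rng: random.Random) -> Optional[Point]:
    """Sliding-window variant by brute force: group the retained items, pick a group uniformly."""
    reps: List[Point] = []
    for p in window:
        if not any(_close(p, r, alpha) for r in reps):
            reps.append(p)
    return rng.choice(reps) if reps else None


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [(0.0, 0.0)] * 5 + [(0.05, 0.0), (10.0, 10.0), (20.0, 0.0)]
    sampler = NaiveDistinctSampler(alpha=0.1, rng=rng)
    window: Deque[Point] = deque(maxlen=3)  # keep only the 3 most recent items
    for point in stream:
        sampler.insert(point)
        window.append(point)
    print(len(sampler.representatives), sampler.sample)
    print(naive_window_sample(window, 0.1, rng))
```

The heavy group of `(0.0, 0.0)` copies has the same chance of being sampled as each singleton group; a plain uniform sample over stream items would instead pick it almost every time. The window function simply recomputes groups over the retained items at query time, trading efficiency for obvious correctness under the stated separation assumption.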

📜 Similar Papers

In the same crypt: Data Structures & Algorithms

Died the same way: 👻 Ghosted