Towards Better Bounds for Finding Quasi-Identifiers

November 25, 2022 ยท Declared Dead ยท ๐Ÿ› ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Ryan Hildebrant, Quoc-Tung Le, Duy-Hoang Ta, Hoa T. Vu arXiv ID 2211.13882 Category cs.DS: Data Structures & Algorithms Citations 2 Venue ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems Last Checked 3 months ago
Abstract
We revisit the problem of finding small $ฮต$-separation keys introduced by Motwani and Xu (2008). In this problem, the input is $m$-dimensional tuples $x_1,x_2,\ldots,x_n $. The goal is to find a small subset of coordinates that separates at least $(1-ฮต){n \choose 2}$ pairs of tuples. They provided a fast algorithm that runs on $ฮ˜(m/ฮต)$ tuples sampled uniformly at random. We show that the sample size can be improved to $ฮ˜(m/\sqrtฮต)$. Our algorithm also enjoys a faster running time. To obtain this result, we provide upper and lower bounds on the sample size to solve the following decision problem. Given a subset of coordinates $A$, reject if $A$ separates fewer than $(1-ฮต){n \choose 2}$ pairs, and accept if $A$ separates all pairs. The algorithm must be correct with probability at least $1-ฮด$ for all $A$. We show that for algorithms based on sampling: - $ฮ˜(m/\sqrtฮต)$ samples are sufficient and necessary so that $ฮด\leq e^{-m}$ and - $ฮฉ(\sqrt{\frac{\log m}ฮต})$ samples are necessary so that $ฮด$ is a constant. Our analysis is based on a constrained version of the balls-into-bins problem. We believe our analysis may be of independent interest. We also study a related problem that asks for the following sketching algorithm: with given parameters $ฮฑ,k$ and $ฮต$, the algorithm takes a subset of coordinates $A$ of size at most $k$ and returns an estimate of the number of unseparated pairs in $A$ up to a $(1\pmฮต)$ factor if it is at least $ฮฑ{n \choose 2}$. We show that even for constant $ฮฑ$ and success probability, such a sketching algorithm must use $ฮฉ(mk \log ฮต^{-1})$ bits of space; on the other hand, uniform sampling yields a sketch of size $ฮ˜(\frac{mk \log m}{ฮฑฮต^2})$ for this purpose.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Data Structures & Algorithms

Died the same way โ€” ๐Ÿ‘ป Ghosted