ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force
August 18, 2023 Β· Declared Dead Β· π Workshop on Algorithm Engineering and Experimentation
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hans-Peter Lehmann, Peter Sanders, Stefan Walzer
arXiv ID
2308.09561
Category
cs.DS: Data Structures & Algorithms
Citations
12
Venue
Workshop on Algorithm Engineering and Experimentation
Last Checked
4 months ago
Abstract
A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n\log_2e-O(\log n)$ bits of space needed to represent an MPHF. A matching upper bound is obtained using the brute-force algorithm that tries random hash functions until stumbling on an MPHF and stores that function's seed. In expectation, $e^n\textrm{poly}(n)$ seeds need to be tested. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash - Small, heavily overloaded cuckoo hash tables. ShockHash uses two hash functions $h_0$ and $h_1$, hoping for the existence of a function $f : S \rightarrow \{0,1\}$ such that $x \mapsto h_{f(x)}(x)$ is an MPHF on $S$. In graph terminology, ShockHash generates $n$-edge random graphs until stumbling on a pseudoforest - a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store $f$ using $n + o(n)$ bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only $(e/2)^n\textrm{poly}(n)$ hash function seeds in expectation, reducing the space for storing the seed by roughly $n$ bits. This makes ShockHash almost a factor $2^n$ faster than brute-force, while maintaining the asymptotically optimal space consumption. An implementation within the RecSplit framework yields the currently most space efficient MPHFs, i.e., competing approaches need about two orders of magnitude more work to achieve the same space.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted