Practical and Effective Re-Pair Compression
April 27, 2017 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Philip Bille, Inge Li GΓΈrtz, Nicola Prezza
arXiv ID
1704.08558
Category
cs.DS: Data Structures & Algorithms
Citations
10
Venue
arXiv.org
Last Checked
4 months ago
Abstract
Re-Pair is an efficient grammar compressor that operates by recursively replacing high-frequency character pairs with new grammar symbols. The most space-efficient linear-time algorithm computing Re-Pair uses $(1+Ξ΅)n+\sqrt n$ words on top of the re-writable text (of length $n$ and stored in $n$ words), for any constant $Ξ΅>0$; in practice however, this solution uses complex sub-procedures preventing it from being practical. In this paper, we present an implementation of the above-mentioned result making use of more practical solutions; our tool further improves the working space to $(1.5+Ξ΅)n$ words (text included), for some small constant $Ξ΅$. As a second contribution, we focus on compact representations of the output grammar. The lower bound for storing a grammar with $d$ rules is $\log(d!)+2d\approx d\log d+0.557 d$ bits, and the most efficient encoding algorithm in the literature uses at most $d\log d + 2d$ bits and runs in $\mathcal O(d^{1.5})$ time. We describe a linear-time heuristic maximizing the compressibility of the output Re-Pair grammar. On real datasets, our grammar encoding uses---on average---only $2.8\%$ more bits than the information-theoretic minimum. In half of the tested cases, our compressor improves the output size of 7-Zip with maximum compression rate turned on.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted