Lower bounds for text indexing with mismatches and differences
December 21, 2018 · Declared Dead · ACM-SIAM Symposium on Discrete Algorithms
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Vincent Cohen-Addad, Laurent Feuilloley, Tatiana Starikovskaya
arXiv ID
1812.09120
Category
cs.DS: Data Structures & Algorithms
Citations
10
Venue
ACM-SIAM Symposium on Discrete Algorithms
Last Checked
4 months ago
Abstract
In this paper we study lower bounds for the fundamental problem of text indexing with mismatches and differences. In this problem we are given a long string of length $n$, the "text", and the task is to preprocess it into a data structure such that given a query string $Q$, one can quickly identify substrings that are within Hamming or edit distance at most $k$ from $Q$. This problem is at the core of various problems arising in biology and text processing. While exact text indexing allows linear-size data structures with linear query time, text indexing with $k$ mismatches (or $k$ differences) seems to be much harder: All known data structures have exponential dependency on $k$ either in the space, or in the time bound. We provide conditional and pointer-machine lower bounds that make a step toward explaining this phenomenon. We start by demonstrating lower bounds for $k = \Theta(\log n)$. We show that assuming the Strong Exponential Time Hypothesis, any data structure for text indexing that can be constructed in polynomial time cannot have $\mathcal{O}(n^{1-\delta})$ query time, for any $\delta > 0$. This bound also extends to the setting where we only ask for $(1+\varepsilon)$-approximate solutions for text indexing. However, in many applications the value of $k$ is rather small, and one might hope that for small $k$ we can develop more efficient solutions. We show that this would require a radically new approach as using the current methods one cannot avoid exponential dependency on $k$ either in the space, or in the time bound for all even $\frac{8}{\sqrt{3}} \sqrt{\log n} \le k = o(\log n)$. Our lower bounds also apply to the dictionary look-up problem, where instead of a text one is given a set of strings.
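To make the query semantics in the abstract concrete, here is a minimal sketch of the $k$-mismatch query answered by a naive linear scan with no preprocessing. It is not taken from the paper: the function names `hamming_within` and `k_mismatch_occurrences` are hypothetical, and the sketch covers only Hamming distance (mismatches), not edit distance (differences). It shows the baseline behaviour that an index is meant to speed up.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the budget k is exceeded
    return True


def k_mismatch_occurrences(text: str, query: str, k: int) -> list[int]:
    """Naive scan (illustrative baseline, not an index).

    Returns every start position i such that text[i:i+len(query)]
    is within Hamming distance k of query.
    """
    m = len(query)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_within(text[i:i + m], query, k)
    ]


if __name__ == "__main__":
    print(k_mismatch_occurrences("abracadabra", "acra", 1))
    print(k_mismatch_occurrences("abracadabra", "acra", 2))
```

The scan costs $O(nm)$ time per query for a text of length $n$ and a query of length $m$. A text index instead pays a preprocessing cost up front so that queries run faster, and the lower bounds in the abstract concern how much faster that can be.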
Similar Papers
In the same crypt: Data Structures & Algorithms
The Cartographer
R.I.P.
Ghosted
Route Planning in Transportation Networks
R.I.P.
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
Ghosted
Graph Isomorphism in Quasipolynomial Time
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted