Approximating Text-to-Pattern Hamming Distances
January 01, 2020 ยท Declared Dead ยท ๐ Symposium on the Theory of Computing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat
arXiv ID
2001.00211
Category
cs.DS: Data Structures & Algorithms
Citations
23
Venue
Symposium on the Theory of Computing
Last Checked
3 months ago
Abstract
We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $ฯ$, compute the Hamming distance between the pattern and the text at every location. Several $(1+ฮต)$-approximation algorithms have been proposed in the literature, with running time of the form $O(ฮต^{-O(1)}n\log n\log m)$, all using fast Fourier transform (FFT). We describe a simple $(1+ฮต)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results: - We obtain the first linear-time approximation algorithm; the running time is $O(ฮต^{-2}n)$. - We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $k\le\sqrt m$. - We obtain approximation algorithms with better $ฮต$-dependence using rectangular matrix multiplication. The time-bound is $ร(n)$ when the pattern is sufficiently long: $m\ge ฮต^{-28}$. Previous algorithms require $ร(ฮต^{-1}n)$ time. - When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{ฮฉ(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $ฮดm$ at every location, running in $ร(ฮด^{-1/3}n^{2/3}+ฮด^{-1}n/m)$ time. - We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $ร(ฮต^{-2}\sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with ร(k) space or for the approximate problem with $ร(ฮต^{-O(1)}\sqrt m)$ space.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Data Structures & Algorithms
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Relief-Based Feature Selection: Introduction and Review
R.I.P.
๐ป
Ghosted
Route Planning in Transportation Networks
R.I.P.
๐ป
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
๐ป
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
๐ป
Ghosted
Graph Isomorphism in Quasipolynomial Time
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted