Document Listing on Repetitive Collections with Guaranteed Performance
July 20, 2017 · Declared Dead · Annual Symposium on Combinatorial Pattern Matching
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Gonzalo Navarro
arXiv ID
1707.06374
Category
cs.DS: Data Structures & Algorithms
Citations
24
Venue
Annual Symposium on Combinatorial Pattern Matching
Last Checked
3 months ago
Abstract
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies of a string of size $n$, and $s$ edits are applied on ranges of copies. We introduce the first document listing index with size $\tilde{O}(n+s)$, precisely $O((n\log\sigma+s\log^2 N)\log D)$ bits, and with useful worst-case time guarantees: given a pattern of length $m$, the index reports the $\mathit{ndoc}>0$ strings where it appears in time $O(m\log^{1+\epsilon} N \cdot \mathit{ndoc})$, for any constant $\epsilon>0$ (and tells in time $O(m\log N)$ if $\mathit{ndoc}=0$). Our technique is to augment a range data structure that is commonly used on grammar-based indexes, so that instead of retrieving all the pattern occurrences, it computes useful summaries on them. We show that the idea has independent interest: we introduce the first grammar-based index that, on a text $T[1,N]$ with a grammar of size $r$, uses $O(r\log N)$ bits and counts the number of occurrences of a pattern $P[1,m]$ in time $O(m^2 + m\log^{2+\epsilon} r)$, for any constant $\epsilon>0$. We also give the first index using $O(z\log(N/z)\log N)$ bits, where $T$ is parsed by Lempel-Ziv into $z$ phrases, counting occurrences in time $O(m\log^{2+\epsilon} N)$.
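To make the two queries in the abstract concrete, here is a brute-force sketch of document listing and occurrence counting on a toy repetitive collection. It is not the paper's index: it scans every string, uses space linear in the collection, and only pins down what the queries return. The function names `list_documents` and `count_occurrences` and the sample strings are assumptions made for illustration.

```python
def list_documents(docs, pattern):
    """Document listing: indices of the strings in `docs` that contain `pattern`."""
    return [i for i, doc in enumerate(docs) if pattern in doc]


def count_occurrences(text, pattern):
    """Counting: number of (possibly overlapping) occurrences of `pattern` in `text`."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping occurrences are also counted.
        start = text.find(pattern, start + 1)
    return count


# Toy repetitive collection: D = 4 near-copies of one base string, with a few edits.
base = "abracadabra"
docs = [base, base, base.replace("cad", "cod"), base[:-1] + "e"]

ndoc_cad = list_documents(docs, "cad")    # strings that still contain "cad"
ndoc_xyz = list_documents(docs, "xyz")    # empty list: the ndoc = 0 case
occ_abra = count_occurrences(base, "abra")
```

In this sketch the query cost grows with the total collection size $N$, which is exactly what the index described in the abstract avoids by working in space that depends on $n$ and $s$ instead.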
Similar Papers
In the same crypt: Data Structures & Algorithms
The Cartographer
R.I.P.
👻
Ghosted
Route Planning in Transportation Networks
R.I.P.
👻
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
👻
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
👻
Ghosted
Graph Isomorphism in Quasipolynomial Time
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻
Ghosted