Indexing Weighted Sequences: Neat and Efficient
April 25, 2017 Β· Declared Dead Β· π Information and Computation
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, Jakub Radoszewski
arXiv ID
1704.07625
Category
cs.DS: Data Structures & Algorithms
Citations
23
Venue
Information and Computation
Last Checked
3 months ago
Abstract
In a \emph{weighted sequence}, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or uncertain data, for example, in molecular biology where they are known under the name of Position-Weight Matrices. Given a probability threshold $\frac1z$, we say that a string $P$ of length $m$ occurs in a weighted sequence $X$ at position $i$ if the product of probabilities of the letters of $P$ at positions $i,\ldots,i+m-1$ in $X$ is at least $\frac1z$. In this article, we consider an \emph{indexing} variant of the problem, in which we are to preprocess a weighted sequence to answer multiple pattern matching queries. We present an $O(nz)$-time construction of an $O(nz)$-sized index for a weighted sequence of length $n$ over a constant-sized alphabet that answers pattern matching queries in optimal, $O(m+Occ)$ time, where $Occ$ is the number of occurrences reported. The cornerstone of our data structure is a novel construction of a family of $\lfloor z \rfloor$ special strings that carries the information about all the strings that occur in the weighted sequence with a sufficient probability. We obtain a weighted index with the same complexities as in the most efficient previously known index by Barton et al. (CPM 2016), but our construction is significantly simpler. The most complex algorithmic tool required in the basic form of our index is the suffix tree which we use to develop a new, more straightforward index for the so-called property matching problem. We provide an implementation of our data structure. Our construction allows us also to obtain a significant improvement over the complexities of the approximate variant of the weighted index presented by Biswas et al. (EDBT 2016) and an improvement of the space complexity of their general index.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted