FMtree: A fast locating algorithm of FM-indexes for genomic data
April 15, 2017 · Declared Dead · Bioinform.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Haoyu Cheng, Ming Wu, Yun Xu
arXiv ID
1704.04615
Category
cs.DS: Data Structures & Algorithms
Citations
9
Venue
Bioinform.
Last Checked
4 months ago
Abstract
Motivation: As a fundamental task in bioinformatics, searching for massive numbers of short patterns in a long text is widely accelerated by various compressed full-text indexes. These indexes provide searching functionality similar to that of classical indexes, e.g., suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, delivers unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of a pattern in a given text, are particularly slow, especially for patterns with many occurrences. Results: In this paper, we introduce a novel locating algorithm, FMtree, that rapidly retrieves all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual quadtree. As a result, multiple occurrence positions of the pattern can be retrieved simultaneously by traversing the quadtree. Compared with existing locating algorithms, our tree-based algorithm eliminates a large number of redundant operations and exhibits better data locality. Experimental results show that FMtree is usually one order of magnitude faster than state-of-the-art algorithms while remaining memory-efficient.
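To make the contrast in the abstract concrete, here is a toy Python sketch comparing the classic FM-index locate (walk LF-mapping from every matching row until a sampled suffix-array entry is reached) with a quadtree-style traversal that extends the pattern backward by each of A/C/G/T. It is not the authors' FMtree implementation. The class name `ToyFMIndex`, the sampling rule (keep SA values only at text positions divisible by `sample_rate`), and the depth-bounded expansion are illustrative assumptions chosen to match the abstract's description; the text is assumed to contain only A, C, G, T.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"  # assumption: DNA text over A/C/G/T only, terminated by '$'


class ToyFMIndex:
    """Hypothetical toy FM-index: BWT, C table, naive occ, sampled suffix array."""

    def __init__(self, text: str, sample_rate: int = 4) -> None:
        assert set(text) <= set(ALPHABET), "sketch assumes a pure A/C/G/T text"
        self.text = text + "$"
        self.s = sample_rate
        n = len(self.text)
        # Naive suffix array construction; acceptable only for toy inputs.
        sa = sorted(range(n), key=lambda i: self.text[i:])
        self.bwt = "".join(self.text[i - 1] for i in sa)  # text[-1] is '$'
        # Keep SA values only at rows whose text position is a multiple of s.
        self.samples: Dict[int, int] = {r: p for r, p in enumerate(sa) if p % self.s == 0}
        # C[c] = number of text characters smaller than c ('$' < A < C < G < T).
        self.C: Dict[str, int] = {}
        total = 0
        for c in "$" + ALPHABET:
            self.C[c] = total
            total += self.text.count(c)

    def occ(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[0:i] (naive; real indexes use rank structures)."""
        return self.bwt.count(c, 0, i)

    def extend(self, c: str, sp: int, ep: int) -> Tuple[int, int]:
        """Backward-extend the half-open SA range [sp, ep) of P to that of cP."""
        return self.C[c] + self.occ(c, sp), self.C[c] + self.occ(c, ep)

    def count_range(self, pattern: str) -> Tuple[int, int]:
        """Standard backward search; returns the SA range of the pattern."""
        sp, ep = 0, len(self.bwt)
        for c in reversed(pattern):
            sp, ep = self.extend(c, sp, ep)
            if sp >= ep:
                break
        return sp, ep

    def locate_lf(self, pattern: str) -> List[int]:
        """Classic locate: one LF walk per occurrence until a sampled row."""
        sp, ep = self.count_range(pattern)
        out = []
        for row in range(sp, ep):
            steps = 0
            while row not in self.samples:  # SA value 0 is always sampled
                c = self.bwt[row]
                row = self.C[c] + self.occ(c, row)  # LF: row of suffix SA[row] - 1
                steps += 1
            out.append(self.samples[row] + steps)
        return sorted(out)

    def locate_tree(self, pattern: str) -> List[int]:
        """Quadtree-style locate (simplified illustration, not the paper's code).

        A node at depth d is the SA range of X + pattern for some X over ACGT
        with |X| = d < s; its four children are the ranges of c + X + pattern.
        A sampled row r in a node at depth d yields occurrence samples[r] + d,
        so each occurrence p is reported exactly once, at depth d = p mod s.
        """
        out: List[int] = []
        stack = [(*self.count_range(pattern), 0)]
        while stack:
            sp, ep, depth = stack.pop()
            if sp >= ep:
                continue  # empty range: prune this subtree
            # Linear scan for sampled rows; a real index would use a rank bitvector.
            for row in range(sp, ep):
                if row in self.samples:
                    out.append(self.samples[row] + depth)
            if depth + 1 < self.s:
                for c in ALPHABET:
                    stack.append((*self.extend(c, sp, ep), depth + 1))
        return sorted(out)


idx = ToyFMIndex("ACGTACGTTACGAACGT", sample_rate=4)
print(idx.locate_lf("ACG"))
print(idx.locate_tree("ACG"))
```

Both calls should report the same positions. In `locate_lf`, every occurrence pays for its own LF walk of up to `s - 1` steps; in `locate_tree`, each `extend` call is shared by all occurrences with the same preceding context, which is one way to read the abstract's claim that a tree-based traversal avoids redundant operations.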
Similar Papers
In the same crypt: Data Structures & Algorithms
Route Planning in Transportation Networks
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
Hierarchical Clustering: Objective Functions and Algorithms
Graph Isomorphism in Quasipolynomial Time
Simulation optimization: A review of algorithms and applications
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
In-Datacenter Performance Analysis of a Tensor Processing Unit
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning