RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison
November 12, 2015 · Declared Dead · PLoS Comput. Biol.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Lars Hahn, Chris-André Leimeister, Rachid Ounit, Stefano Lonardi, Burkhard Morgenstern
arXiv ID
1511.04001
Category
q-bio.GN
Cross-listed
cs.DS, q-bio.PE
Citations
39
Venue
PLoS Comput. Biol.
Last Checked
1 month ago
Abstract
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/
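The filtering idea in the abstract, where only the match positions of a binary pattern are compared, can be illustrated with a minimal sketch. This is not the rasbhari implementation: the function names (`spaced_word`, `spaced_words`, `count_matches`) and the convention of writing a pattern as a string of `1` (match) and `0` (don't-care) characters are assumptions made for this example.

```python
from collections import Counter


def spaced_word(seq, start, pattern):
    """Return the characters of seq under the '1' positions of pattern,
    with the pattern placed at offset start (hypothetical helper)."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def spaced_words(seq, pattern):
    """List the spaced words of seq for every placement of the pattern."""
    return [spaced_word(seq, s, pattern)
            for s in range(len(seq) - len(pattern) + 1)]


def count_matches(seq_a, seq_b, pattern):
    """Count position pairs (i, j) whose spaced words are identical."""
    counts = Counter(spaced_words(seq_a, pattern))
    return sum(counts[w] for w in spaced_words(seq_b, pattern))


if __name__ == "__main__":
    a, b = "ACGT", "ATGT"  # the two sequences differ only at index 1
    print(count_matches(a, b, "101"))  # spaced pattern, middle is don't-care
    print(count_matches(a, b, "111"))  # contiguous word of the same length
```

In this toy input the spaced pattern `101` still finds a match across the substitution, while the contiguous pattern `111` finds none. How many such matches a pattern set yields, and how much that number varies, is the kind of quantity the abstract says rasbhari optimizes.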
Similar Papers
In the same crypt: q-bio.GN
R.I.P. 👻 Ghosted
R.I.P. 👻 Ghosted
Accurate Genomic Prediction Of Human Height
R.I.P. 👻 Ghosted
Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models
Old Age
GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping
R.I.P. 👻 Ghosted
Tasks, Techniques, and Tools for Genomic Data Visualization
Old Age
Spaced seeds improve k-mer-based metagenomic classification
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. 👻 Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P. 👻 Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P. 👻 Ghosted