Spaced seeds improve k-mer-based metagenomic classification
February 22, 2015 · Entered Twilight · Bioinform.
"Last commit was 10.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitmodules, README.rst, scripts, section.3.1.2, section.3.2, section.3.3, section.3.4, seed-kraken
Authors
Karel Brinda, Maciej Sykulski, Gregory Kucherov
arXiv ID
1502.06256
Category
q-bio.GN
Cross-listed
cs.CE, cs.LG
Citations
93
Venue
Bioinform.
Repository
https://github.com/gregorykucherov/spaced-seeds-for-metagenomics
⭐ 13
Last Checked
1 month ago
Abstract
Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by NGS technologies. To cope with the massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show that spaced seeds provide a significant improvement in classification accuracy over traditional contiguous k-mers. We support this thesis through a series of different computational experiments, including simulations of large-scale metagenomic projects. Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
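To make the difference between contiguous k-mers and spaced seeds concrete, here is a minimal Python sketch, assuming a spaced seed is written as a binary mask in which '1' marks a position that must match and '0' a don't-care position. The masks, toy sequences, and the function names `spaced_kmers`, `shared_hits`, and `classify` are hypothetical illustrations, not the seeds or code from the repository, and the max-hit scoring in `classify` is a deliberately simplified stand-in, not the scoring used by the tools cited in the abstract.

```python
from __future__ import annotations


def spaced_kmers(seq: str, mask: str) -> list[str]:
    """Extract the spaced k-mer of every window of length len(mask) in seq.

    '1' in the mask marks a kept (must-match) position, '0' a don't-care
    position. An all-'1' mask yields ordinary contiguous k-mers.
    """
    care = [i for i, c in enumerate(mask) if c == "1"]  # kept offsets
    span = len(mask)  # window length covered by the seed
    return [
        "".join(seq[start + i] for i in care)
        for start in range(len(seq) - span + 1)
    ]


def shared_hits(read: str, reference: str, mask: str) -> int:
    """Count read windows whose spaced k-mer also occurs in the reference."""
    ref_index = set(spaced_kmers(reference, mask))
    return sum(1 for kmer in spaced_kmers(read, mask) if kmer in ref_index)


def classify(read: str, references: dict[str, str], mask: str) -> str | None:
    """Hypothetical toy classifier: pick the reference with the most hits.

    Returns None when no reference shares any spaced k-mer with the read.
    """
    scores = {name: shared_hits(read, genome, mask)
              for name, genome in references.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None
    return best


# Toy data (hypothetical): the read carries one substitution (G -> C at index 2).
refs = {"A": "ACGTACGT", "B": "TTTTGGGG"}
print(spaced_kmers("ACGTACGT", "1101"))  # ['ACT', 'CGA', 'GTC', 'TAG', 'ACT']
print(shared_hits("ACCTACGT", refs["A"], "1101"))  # 3
print(classify("ACCTACGT", refs, "1101"))  # A
```

With an all-ones mask such as "111" the same function returns ordinary contiguous 3-mers, so one index structure covers both cases. In the toy read, the substitution at index 2 falls on a don't-care position of the window starting at 0, so that window still matches the reference.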
Similar Papers
In the same crypt: q-bio.GN
Accurate Genomic Prediction Of Human Height
Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models
GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping
Tasks, Techniques, and Tools for Genomic Data Visualization