Spaced seeds improve k-mer-based metagenomic classification

February 22, 2015 · Entered Twilight · 🏛 Bioinform.

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 10.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitmodules, README.rst, scripts, section.3.1.2, section.3.2, section.3.3, section.3.4, seed-kraken

Authors: Karel Brinda, Maciej Sykulski, Gregory Kucherov
arXiv ID: 1502.06256
Category: q-bio.GN
Cross-listed: cs.CE, cs.LG
Citations: 93
Venue: Bioinform.
Repository: https://github.com/gregorykucherov/spaced-seeds-for-metagenomics (⭐ 13)
Last Checked: 1 month ago
Abstract
Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by NGS technologies. To cope with the massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement in classification accuracy over traditional contiguous k-mers. We support this thesis through a series of different computational experiments, including simulations of large-scale metagenomic projects. Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
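The difference between contiguous k-mers and spaced seeds mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a seed is written as a string of `1` (position must match) and `0` (position is ignored). The function names `seed_keys` and `shared_hits` are hypothetical and are not taken from the paper's repository or from seed-kraken; the toy sequences are made up for illustration and do not reproduce any result from the paper.

```python
def seed_keys(seq, mask):
    """Return the set of keys obtained by sliding `mask` over `seq`.

    `mask` is a string of '1' (kept position) and '0' (ignored position).
    A mask made only of '1's yields ordinary contiguous k-mers.
    """
    keep = [i for i, c in enumerate(mask) if c == "1"]
    span = len(mask)
    return {
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    }


def shared_hits(read, reference, mask):
    """Count windows of `read` whose masked key also occurs in `reference`."""
    ref_keys = seed_keys(reference, mask)
    keep = [i for i, c in enumerate(mask) if c == "1"]
    span = len(mask)
    hits = 0
    for start in range(len(read) - span + 1):
        key = "".join(read[start + i] for i in keep)
        if key in ref_keys:
            hits += 1
    return hits


if __name__ == "__main__":
    reference = "ACGTTGCA"   # toy reference (assumption, not real data)
    read = "ACGATGCA"        # same sequence with one substitution
    # Both masks have weight 4 (four '1' positions); only the span differs.
    print("contiguous 1111:", shared_hits(read, reference, "1111"))
    print("spaced   110011:", shared_hits(read, reference, "110011"))
```

Both masks compare the same number of positions per window. The spaced mask simply spreads them over a longer span, so a window can still match when a substitution falls on one of its ignored positions.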

📜 Similar Papers

In the same crypt: q-bio.GN