Kmerlight: fast and accurate k-mer abundance estimation
September 19, 2016 · Entered Twilight · arXiv.org
"Last commit was 9.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: LICENSE, README, libs, makefile, sample.fq, src
Authors
Naveen Sivadasan, Rajgopal Srinivasan, Kshama Goyal
arXiv ID
1609.05626
Category
cs.DS: Data Structures & Algorithms
Citations
12
Venue
arXiv.org
Repository
https://github.com/nsivad/kmerlight
⭐ 6
Last Checked
1 month ago
Abstract
k-mers (nucleotide strings of length k) form the basis of several algorithms in computational genomics. In particular, k-mer abundance information in sequence data is useful in read error correction, parameter estimation for genome assembly, digital normalization, etc. We give a streaming algorithm, Kmerlight, for computing the k-mer abundance histogram from sequence data. Our algorithm is fast and has a very small memory footprint. We provide analytical bounds on the error guarantees of our algorithm. Kmerlight can efficiently process genome-scale and metagenome-scale data using standard desktop machines. A few applications of abundance histograms computed by Kmerlight are also shown. We use the abundance histogram for de novo estimation of repetitiveness in the genome based on a simple probabilistic model that we propose. We also show estimation of the k-mer error rate in the sampling using the abundance histogram. Our algorithm can also be used for abundance estimation in a general streaming setting. The Kmerlight tool is written in C++ and is available for download and use from https://github.com/nsivad/kmerlight.
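To make the quantity in the abstract concrete, the following minimal sketch computes a k-mer abundance histogram exactly: entry i is the number of distinct k-mers that occur exactly i times in the input. This is not Kmerlight's algorithm. It is an exact, memory-hungry baseline that keeps every k-mer in a dictionary, which is the cost a streaming estimator is designed to avoid. The function name `kmer_abundance_histogram` and the choice to skip k-mers containing non-ACGT characters are assumptions made for illustration, and reverse-complement (canonical) counting is deliberately left out.

```python
from collections import Counter


def kmer_abundance_histogram(reads, k):
    """Return an exact histogram mapping abundance i to the number of
    distinct k-mers that occur exactly i times across all reads.

    Illustrative baseline only: it stores every distinct k-mer in memory.
    """
    valid = set("ACGT")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of length k over the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: skip windows with ambiguous bases such as N.
            if set(kmer) <= valid:
                counts[kmer] += 1
    # Count how many distinct k-mers share each abundance value.
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACG"]
    hist = kmer_abundance_histogram(reads, 3)
    for abundance in sorted(hist):
        print(abundance, hist[abundance])
```

For the two toy reads above with k = 3, the k-mers ACG and CGT each occur three times and GTA and TAC each occur twice, so the histogram has two distinct k-mers at abundance 2 and two at abundance 3.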
Similar Papers
In the same crypt: Data Structures & Algorithms
R.I.P. 👻 Ghosted
R.I.P. 👻 Ghosted
Relief-Based Feature Selection: Introduction and Review
R.I.P. 👻 Ghosted
Route Planning in Transportation Networks
R.I.P. 👻 Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P. 👻 Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P. 👻 Ghosted