copMEM: Finding maximal exact matches via sampling both genomes

May 22, 2018 · Entered Twilight · 🏛 Bioinform.

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 6.0 years ago (≥5-year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: CopMEM.src, MemoryFill.src, copmem.sln, demo.cmd, demo.sh, license, makefile, readme.txt

Authors: Szymon Grabowski, Wojciech Bieniecki
arXiv ID: 1805.08816
Category: cs.DS: Data Structures & Algorithms
Cross-listed: q-bio.GN
Citations: 15
Venue: Bioinform.
Repository: https://github.com/wbieniec/copmem (⭐ 1)
Last Checked: 1 month ago
Abstract
Genome-to-genome comparisons require designating anchor points, which are given by Maximal Exact Matches (MEMs) between their sequences. For large genomes this is a challenging problem, and the performance of existing solutions, even in parallel regimes, is not quite satisfactory. We present a new algorithm, copMEM, that allows one to sparsely sample both input genomes, with sampling steps being coprime. Despite being a single-threaded implementation, copMEM computes all MEMs of minimum length 100 between the human and mouse genomes in less than 2 minutes, using less than 10 GB of RAM.
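The abstract states the sampling idea in a single sentence. Below is a minimal sketch of why coprime steps lose nothing, assuming the reference is sampled with step k1 and the query with step k2; the names L, K, k1, k2 and both functions are hypothetical, and this is not the code from the copMEM repository. Because gcd(k1, k2) = 1, the Chinese remainder theorem gives, for any MEM starting at reference position r and query position q, an offset d in [0, k1·k2) with r + d ≡ 0 (mod k1) and q + d ≡ 0 (mod k2). If k1·k2 + K − 1 ≤ L, the K-mer at that offset lies inside every MEM of length at least L, so indexing and probing only the sampled K-mers still finds every such MEM. The sketch uses a plain Python dict as the index and a set for deduplication; the real tool's hashing and duplicate suppression are not reproduced.

```python
from collections import defaultdict
from math import gcd


def mems_bruteforce(ref: str, qry: str, L: int) -> set:
    """Every MEM of length >= L as (ref_pos, qry_pos, length); O(n*m) reference check."""
    out = set()
    for i in range(len(ref)):
        for j in range(len(qry)):
            # Skip pairs that could be extended one symbol to the left (not left-maximal).
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            n = 0
            # Extending as far as possible to the right makes the match right-maximal.
            while i + n < len(ref) and j + n < len(qry) and ref[i + n] == qry[j + n]:
                n += 1
            if n >= L:
                out.add((i, j, n))
    return out


def mems_coprime_sampling(ref: str, qry: str, L: int, K: int, k1: int, k2: int) -> set:
    """Toy MEM finder sampling ref with step k1 and qry with step k2 (hypothetical API)."""
    if gcd(k1, k2) != 1:
        raise ValueError("sampling steps must be coprime")
    if k1 * k2 + K - 1 > L:
        raise ValueError("need k1*k2 + K - 1 <= L for the no-miss guarantee")
    # Index only the K-mers of ref that start at multiples of k1.
    index = defaultdict(list)
    for i in range(0, len(ref) - K + 1, k1):
        index[ref[i:i + K]].append(i)
    out = set()
    # Probe only the K-mers of qry that start at multiples of k2.
    for j in range(0, len(qry) - K + 1, k2):
        for i in index.get(qry[j:j + K], ()):
            a, b = i, j
            # Extend the seed to the left until a mismatch or a sequence start.
            while a > 0 and b > 0 and ref[a - 1] == qry[b - 1]:
                a -= 1
                b -= 1
            # The seed symbols already match; continue extending to the right.
            n = (i - a) + K
            while a + n < len(ref) and b + n < len(qry) and ref[a + n] == qry[b + n]:
                n += 1
            if n >= L:
                out.add((a, b, n))  # the set drops MEMs reached from several seeds
    return out


ref = "AAAAGATTACAGCCCC"
qry = "TTTGATTACAGTTT"
print(mems_coprime_sampling(ref, qry, L=8, K=3, k1=2, k2=3))  # {(4, 3, 8)}
```

The brute-force function exists only to check the sampled one on small inputs. With steps k1 and k2, roughly 1/k1 of the reference K-mers are indexed and roughly 1/k2 of the query K-mers are probed, which is where the memory and time savings of sampling both sequences come from.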