copMEM: Finding maximal exact matches via sampling both genomes

May 22, 2018 · Entered Twilight · 🏛 Bioinform.

"Last commit was 6.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: CopMEM.src, MemoryFill.src, copmem.sln, demo.cmd, demo.sh, license, makefile, readme.txt

Authors: Szymon Grabowski, Wojciech Bieniecki
arXiv ID: 1805.08816
Category: cs.DS (Data Structures & Algorithms)
Cross-listed: q-bio.GN
Citations: 15
Venue: Bioinform.
Repository: https://github.com/wbieniec/copmem (⭐ 1)
Last Checked: 1 month ago

Abstract

Genome-to-genome comparisons require designating anchor points, which are given by Maximal Exact Matches (MEMs) between their sequences. For large genomes this is a challenging problem, and the performance of existing solutions, even in parallel regimes, is not quite satisfactory. We present a new algorithm, copMEM, that sparsely samples both input genomes, with the two sampling steps being coprime. Despite being a single-threaded implementation, copMEM computes all MEMs of minimum length 100 between the human and mouse genomes in less than 2 minutes, using less than 10 GB of RAM.
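
The abstract does not spell out why the two sampling steps should be coprime. One plausible reading, offered here as an illustrative sketch and not as copMEM's actual implementation, follows from the Chinese remainder theorem: if one genome is sampled every k1 positions and the other every k2 positions with gcd(k1, k2) = 1, then any run of k1 * k2 consecutive offsets inside a match contains exactly one offset at which both genomes are sampled simultaneously. The names `aligned_offset`, `every_start_pair_covered`, `k1`, `k2`, and `window` are hypothetical; `window` stands for the number of admissible seed start offsets inside a match.

```python
def aligned_offset(i, j, k1, k2, window):
    """Smallest offset t in [0, window) at which both genomes are sampled.

    A match starts at position i in the first genome and j in the second.
    The first genome is sampled at multiples of k1, the second at multiples
    of k2. Returns None if no offset in the window hits both samplings.
    """
    for t in range(window):
        if (i + t) % k1 == 0 and (j + t) % k2 == 0:
            return t
    return None


def every_start_pair_covered(k1, k2, window):
    """Check by brute force that every pair of match start positions
    yields a jointly sampled offset within the window.

    Start positions only matter modulo the sampling steps, so it is
    enough to try i in [0, k1) and j in [0, k2).
    """
    return all(
        aligned_offset(i, j, k1, k2, window) is not None
        for i in range(k1)
        for j in range(k2)
    )


if __name__ == "__main__":
    # Coprime steps: a window of k1 * k2 offsets always suffices.
    print(every_start_pair_covered(3, 4, 12))
    # One offset fewer and some start pair is missed.
    print(every_start_pair_covered(3, 4, 11))
    # Non-coprime steps can miss matches even with a window of k1 * k2.
    print(every_start_pair_covered(4, 6, 24))
```

Under this reading, sampling with steps k1 and k2 reduces the number of indexed and queried positions by a combined factor of about k1 * k2, while a sufficiently long match still cannot slip between the samples.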