SOPanG 2: online searching over a pan-genome without false positives

April 06, 2020 · Entered Twilight · arXiv.org

TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitattributes, .gitignore, LICENSE.txt, README.md, bitset.hpp, end_to_end_tests, helpers.hpp, main.cpp, makefile, params.hpp, parsing.cpp, parsing.hpp, performance_tests, pics, prototypes, sample, scripts, sopang.cpp, sopang.hpp, unit_tests, zstd_helper.cpp, zstd_helper.hpp

Authors: Aleksander Cisłak, Szymon Grabowski
arXiv ID: 2004.03033
Category: cs.DS: Data Structures & Algorithms
Citations: 3
Venue: arXiv.org
Repository: https://github.com/MrAlexSee/sopang (⭐ 10)
Last Checked: 2 months ago

Abstract
Motivation: The pan-genome can be stored as an elastic-degenerate (ED) string, a recently introduced compact representation of multiple overlapping sequences. However, a search over the ED string does not indicate which individuals (if any) match the entire query.

Results: We augment the ED string with sources (individuals' indexes) and propose an extension of the SOPanG (Shift-Or for Pan-Genome) tool to report only true positive matches, omitting those not occurring in any of the haplotypes. The additional stage for checking the matches yields a penalty of less than 3.5% relative speed in practice, which means that SOPanG 2 is able to report pattern matches in a pan-genome, mapping them onto individuals, at the single-thread throughput of above 430 MB/s on real data.

Availability and implementation: SOPanG 2 can be downloaded here: github.com/MrAlexSee/sopang
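
To make the idea of sources concrete, here is a minimal C++ sketch of how a candidate match over an ED string might be checked against the individuals that carry each variant. It is an illustration under stated assumptions, not the SOPanG 2 implementation: the types `Variant`, `Segment`, `EDString`, `MatchPath`, the helpers `makeSources`, `matchingSources`, `isTruePositive`, `buildExample`, and the limit `kMaxSources` are all hypothetical names. As a simplification, every variant stores its source set explicitly. A candidate is treated as a true positive only if at least one individual carries every variant it passes through.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical upper bound on the number of individuals (haplotypes).
constexpr std::size_t kMaxSources = 64;
using SourceSet = std::bitset<kMaxSources>;

// One variant of a segment plus the individuals that carry it (assumed layout).
struct Variant {
    std::string text;
    SourceSet sources;
};

using Segment = std::vector<Variant>;   // alternative variants at one position
using EDString = std::vector<Segment>;  // consecutive segments

// A candidate match: the variant chosen in each consecutive segment it spans.
struct MatchPath {
    std::size_t firstSegment;
    std::vector<std::size_t> variantIndexes;
};

// Builds a source set from a list of individual indexes.
SourceSet makeSources(std::initializer_list<std::size_t> ids) {
    SourceSet set;
    for (std::size_t id : ids) {
        set.set(id);
    }
    return set;
}

// Intersects the source sets of all variants on the path.
SourceSet matchingSources(const EDString& ed, const MatchPath& path) {
    assert(!path.variantIndexes.empty());
    assert(path.firstSegment + path.variantIndexes.size() <= ed.size());

    SourceSet common;
    common.set();  // start from "everyone", then narrow down
    for (std::size_t i = 0; i < path.variantIndexes.size(); ++i) {
        const Segment& segment = ed[path.firstSegment + i];
        common &= segment.at(path.variantIndexes[i]).sources;
    }
    return common;
}

// A match is kept only if at least one individual carries the whole path.
bool isTruePositive(const EDString& ed, const MatchPath& path) {
    return matchingSources(ed, path).any();
}

// Toy ED string ACG{T,C}GA{A,G}TT over three individuals 0, 1, 2.
EDString buildExample() {
    const SourceSet all = makeSources({0, 1, 2});
    return EDString{
        Segment{Variant{"ACG", all}},
        Segment{Variant{"T", makeSources({0, 1})}, Variant{"C", makeSources({2})}},
        Segment{Variant{"GA", all}},
        Segment{Variant{"A", makeSources({0})}, Variant{"G", makeSources({1, 2})}},
        Segment{Variant{"TT", all}},
    };
}
```

In this toy example the pattern TGAA (path `{1, {0, 0, 0}}`) is carried by individual 0 and is kept, whereas CGAA (path `{1, {1, 0, 0}}`) combines a variant found only in individual 2 with one found only in individual 0, so it would be discarded as a false positive.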
