Variant tolerant read mapping using min-hashing

February 06, 2017 · Declared Dead · 🏛 arXiv.org

Authors Jens Quedenfeld, Sven Rahmann · arXiv ID 1702.01703 · Category q-bio.GN · Cross-listed cs.DS · Citations 4 · Venue arXiv.org · Repository https://bitbucket.org/Quedenfeld/vatram-src/ · Last Checked 1 month ago

Abstract

DNA read mapping is a ubiquitous task in bioinformatics, and many tools have been developed to solve the read mapping problem. However, two trends are changing the landscape of read mapping: First, new sequencing technologies provide very long reads with high error rates (up to 15%). Second, many genetic variants in the population are known, so the reference genome is no longer considered a single string over ACGT, but a complex object containing these variants. Most existing read mappers do not handle these new circumstances appropriately. We introduce a new read mapper prototype called VATRAM that considers variants. It is based on Min-Hashing of q-gram sets of reference genome windows. Min-Hashing is one form of locality-sensitive hashing. The variants are inserted directly into VATRAM's index, which leads to a fast mapping process. Our results show that VATRAM achieves better precision and recall than state-of-the-art read mappers like BWA under certain circumstances. VATRAM is open source and can be accessed at https://bitbucket.org/Quedenfeld/vatram-src/.
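
The idea of Min-Hashing q-gram sets of reference windows can be illustrated with a small sketch. This is not VATRAM's code: the function names, the window/step/q parameters, the BLAKE2b-based hash family, and the way SNPs are folded into a window's q-gram set are all assumptions made for illustration only. Each reference window is reduced to a short signature (one minimum hash value per seeded hash function), the signatures go into a lookup table, and a read votes for every window that shares a signature entry with it.

```python
import hashlib
from collections import Counter

def qgrams(seq, q):
    """Return the set of all length-q substrings (q-grams) of seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}

def seeded_hash(qgram, seed):
    """64-bit hash of a q-gram; the seed selects one member of the hash family.
    BLAKE2b is an illustrative choice, not necessarily what a real mapper uses."""
    digest = hashlib.blake2b(
        qgram.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(qgram_set, num_hashes):
    """For each hash function, keep the minimum hash value over the set."""
    return tuple(
        min(seeded_hash(g, seed) for g in qgram_set) for seed in range(num_hashes)
    )

def build_index(reference, window, step, q, num_hashes, variants=()):
    """Map (hash function number, min-hash value) -> list of window start positions.
    `variants` is a hypothetical list of SNPs given as (position, alt_base); the
    q-grams of the variant-carrying window are added to that window's set."""
    index = {}
    for start in range(0, max(len(reference) - window, 0) + 1, step):
        win = reference[start:start + window]
        grams = qgrams(win, q)
        for pos, alt in variants:
            if start <= pos < start + len(win):
                rel = pos - start
                grams |= qgrams(win[:rel] + alt + win[rel + 1:], q)
        for i, value in enumerate(minhash_signature(grams, num_hashes)):
            index.setdefault((i, value), []).append(start)
    return index

def candidate_windows(read, index, q, num_hashes):
    """Count, per window start, how many signature entries the read shares with it."""
    votes = Counter()
    for i, value in enumerate(minhash_signature(qgrams(read, q), num_hashes)):
        for start in index.get((i, value), ()):
            votes[start] += 1
    return votes.most_common()
```

The fraction of matching signature entries estimates the Jaccard similarity between the q-gram set of the read and that of a window, so windows with many votes are plausible mapping locations that a real mapper would then verify by alignment. A read shorter than q has no q-grams, and this sketch does not handle that case.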

