Protein identification with deep learning: from abc to xyz

October 08, 2017 · Entered Twilight · 🏛 arXiv.org

🌅 TWILIGHT: Old Age
Predates the code-sharing era, a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: LICENSE, README.md, deepnovo_config.py, deepnovo_cython_modules.pyx, deepnovo_cython_setup.py, deepnovo_main.py, deepnovo_main_modules.py, deepnovo_misc.py, deepnovo_model.py, deepnovo_model_training.py, deepnovo_worker_db.py, deepnovo_worker_denovo.py, deepnovo_worker_io.py, deepnovo_worker_test.py

Authors: Ngoc Hieu Tran, Zachariah Levine, Lei Xin, Baozhen Shan, Ming Li
arXiv ID: 1710.02765
Category: cs.CE (Computational Engineering)
Cross-listed: cs.LG, q-bio.BM
Citations: 6
Venue: arXiv.org
Repository: https://github.com/nh2tran/DeepNovo (⭐ 102)
Last Checked: 2 months ago

Abstract
Proteins are the main workhorses of biological functions in a cell, a tissue, or an organism. Identification and quantification of proteins in a given sample, e.g. a cell type under normal/disease conditions, are fundamental tasks for the understanding of human health and disease. In this paper, we present DeepNovo, a deep learning-based tool to address the problem of protein identification from tandem mass spectrometry data. The idea was first proposed in the context of de novo peptide sequencing [1], in which convolutional neural networks and recurrent neural networks were applied to predict the amino acid sequence of a peptide from its spectrum, a task similar to generating a caption from an image. We further develop DeepNovo to perform sequence database search, the main technique for peptide identification, which greatly benefits from numerous existing protein databases. We combine two modules, de novo sequencing and database search, into a single deep learning framework for peptide identification, and integrate the de Bruijn graph assembly technique to offer a complete solution for reconstructing protein sequences from tandem mass spectrometry data. This paper describes a comprehensive protocol of DeepNovo for protein identification, including training neural network models, dynamic programming search, database querying, estimation of false discovery rate, and de Bruijn graph assembly. Training and testing data, model implementations, and comprehensive tutorials in the form of IPython notebooks are available in our GitHub repository (https://github.com/nh2tran/DeepNovo).
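
To make the de Bruijn graph assembly step mentioned in the abstract more concrete, here is a minimal Python sketch of the general technique: overlapping peptides are cut into k-mers, and a longer sequence is read off by following unambiguous edges. It is not taken from the DeepNovo repository; the function names `build_de_bruijn` and `assemble_unambiguous`, the choice of `k`, and the toy peptides are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(peptides, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some peptide."""
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_unambiguous(graph):
    """Walk from each node with no predecessor while exactly one successor exists."""
    successors = {s for targets in graph.values() for s in targets}
    starts = sorted(n for n in graph if n not in successors)
    contigs = []
    for node in starts:
        contig = node
        seen = {node}
        # Stop at a dead end, a branch, or a cycle.
        while len(graph.get(node, ())) == 1:
            (nxt,) = graph[node]
            if nxt in seen:
                break
            contig += nxt[-1]
            seen.add(nxt)
            node = nxt
        contigs.append(contig)
    return contigs


# Hypothetical overlapping peptides (made-up toy data, not from the paper).
peptides = ["MKTAYIAK", "AYIAKQRQ", "KQRQISFV"]
print(assemble_unambiguous(build_de_bruijn(peptides, k=4)))
# prints ['MKTAYIAKQRQISFV']
```

The sketch stops at the first branch or cycle, so it only recovers sequences when the overlaps are unambiguous; a real assembler would also need to handle sequencing errors and repeated k-mers.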