Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning
December 25, 2020 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Amy X. Lu, Alex X. Lu, Alan Moses
arXiv ID
2012.13475
Category
q-bio.BM
Cross-listed
cs.LG, cs.NE
Citations
17
Venue
arXiv.org
Last Checked
1 month ago
Abstract
Self-supervised learning of biological sequence embeddings alleviates computational resource constraints on downstream tasks while circumventing expensive experimental label acquisition. However, existing methods are mostly borrowed directly from large language models designed for NLP, rather than being designed with bioinformatics principles in mind. Recently, contrastive mutual-information-maximization methods have achieved state-of-the-art representations on ImageNet. In this perspective piece, we discuss how viewing evolution as natural sequence augmentation, and maximizing information across phylogenetic "noisy channels", is a biologically and theoretically desirable objective for pretraining encoders. We first review the current contrastive learning literature, then provide an illustrative example showing that contrastive learning with evolutionary augmentation can serve as a representation learning objective that maximizes the mutual information between biological sequences and their conserved function, and finally outline the rationale for this approach.
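The idea the abstract sketches, treating evolutionarily related sequences as "augmented" views of one another and pulling their embeddings together with a contrastive (InfoNCE-style) objective, can be illustrated in a few lines. The sketch below is not the paper's implementation: the `mutate` function stands in for real homolog sampling (e.g. from a multiple sequence alignment), and one-hot encoding stands in for a learned encoder; both are assumptions for illustration only.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

def one_hot(seq, alphabet=ALPHABET):
    """Encode a protein sequence as a flattened one-hot vector
    (stand-in for a learned sequence encoder)."""
    idx = {a: i for i, a in enumerate(alphabet)}
    m = np.zeros((len(seq), len(alphabet)))
    for i, a in enumerate(seq):
        m[i, idx[a]] = 1.0
    return m.reshape(-1)

def mutate(seq, rate, rng, alphabet=ALPHABET):
    """Toy 'evolutionary augmentation': random residue substitutions.
    A real pipeline would instead sample homologs along a phylogeny."""
    out = list(seq)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = alphabet[rng.integers(len(alphabet))]
    return "".join(out)

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss: row i of z1 should match row i of z2 (its homolog)
    against all other rows in the batch (negatives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

rng = np.random.default_rng(0)
seqs = ["ACDEFG", "KLMNPQ", "RSTVWY", "GHIKLM"]
z1 = np.stack([one_hot(s) for s in seqs])
z2 = np.stack([one_hot(mutate(s, 0.1, rng)) for s in seqs])
loss = info_nce(z1, z2)  # lower when homolog pairs stay close in embedding space
```

Minimizing this loss is a lower bound on the mutual information between a sequence and its evolutionary "views", which is the sense in which the objective targets conserved function rather than surface-level sequence identity.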
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt: q-bio.BM
R.I.P.
π»
Ghosted
Protein secondary structure prediction using deep convolutional neural fields
R.I.P.
👻
Ghosted
Protein structure generation via folding diffusion
R.I.P.
404 Not Found
LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search
R.I.P.
👻
Ghosted
What is a meaningful representation of protein sequences?
R.I.P.
👻
Ghosted
Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted