R.I.P.
👻
Ghosted
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
May 21, 2024 · Entered Twilight · Annual Meeting of the Association for Computational Linguistics
Repo contents: .DS_Store, README.md, all_checkpoints, convert.py, data, data_provider, llm_tuning.py, model, proteinchat_tuning.py, read_results.py, stage1.py, stage2.py, train_protclap.py
Authors
Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
arXiv ID
2405.12564
Category
q-bio.QM
Cross-listed
cs.CL, cs.MM
Citations
33
Venue
Annual Meeting of the Association for Computational Linguistics
Repository
https://github.com/acharkq/ProtT3
⭐ 51
Last Checked
1 month ago
Abstract
Language Models (LMs) excel in understanding textual descriptions of proteins, as evident in biomedical question-answering tasks. However, their capability falters with raw protein data, such as amino acid sequences, due to a deficit in pretraining on such data. Conversely, Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to process texts. To address their limitations, we introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding. ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation. This collaboration between PLM and LM is facilitated by a cross-modal projector (i.e., Q-Former) that bridges the modality gap between the PLM's representation space and the LM's input space. Unlike previous studies focusing on protein property prediction and protein-text retrieval, we delve into the largely unexplored field of protein-to-text generation. To facilitate comprehensive benchmarks and promote future research, we establish quantitative evaluations for protein-text modeling tasks, including protein captioning, protein question-answering, and protein-text retrieval. Our experiments show that ProtT3 substantially surpasses current baselines, with ablation studies further highlighting the efficacy of its core components. Our code is available at https://github.com/acharkq/ProtT3.
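The abstract describes a cross-modal projector that maps PLM representations into the LM's input space. The idea can be illustrated with a minimal sketch, assuming a single cross-attention layer in place of the full Q-Former and random weights in place of trained parameters. All names and sizes here (`project_protein`, `NUM_QUERIES`, `D_PLM`, `D_LM`, and so on) are hypothetical and are not taken from the ProtT3 repository.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project_protein(protein_feats, queries, w_q, w_k, w_v, w_out):
    """Cross-attend a fixed set of query vectors to per-residue features.

    protein_feats: (L, D_PLM) stand-in for PLM output, one row per residue.
    queries:       (K, D_ATTN) query tokens.
    Returns a (K, D_LM) "soft prompt" and the (K, L) attention weights.
    """
    q = matmul(queries, w_q)            # (K, D_ATTN)
    k = matmul(protein_feats, w_k)      # (L, D_ATTN)
    v = matmul(protein_feats, w_v)      # (L, D_ATTN)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]            # (D_ATTN, L)
    scores = matmul(q, k_t)                         # (K, L)
    attn = [softmax([s / scale for s in row]) for row in scores]
    pooled = matmul(attn, v)                        # (K, D_ATTN)
    return matmul(pooled, w_out), attn              # (K, D_LM)


# Illustrative sizes only; real models use far larger dimensions.
D_PLM, D_ATTN, D_LM, NUM_QUERIES = 8, 4, 6, 3
rng = random.Random(0)
queries = random_matrix(NUM_QUERIES, D_ATTN, rng)
w_q = random_matrix(D_ATTN, D_ATTN, rng)
w_k = random_matrix(D_PLM, D_ATTN, rng)
w_v = random_matrix(D_PLM, D_ATTN, rng)
w_out = random_matrix(D_ATTN, D_LM, rng)

for length in (5, 12):
    feats = random_matrix(length, D_PLM, rng)  # fake per-residue embeddings
    soft_prompt, _ = project_protein(feats, queries, w_q, w_k, w_v, w_out)
    print(length, len(soft_prompt), len(soft_prompt[0]))
# prints "5 3 6" then "12 3 6"
```

Whatever the protein length, the projector emits the same number of vectors in the LM's embedding size, so they can be prepended to the text token embeddings as a fixed-length prefix. The Q-Former named in the abstract is a multi-layer transformer; this sketch keeps only the cross-attention step to show the shape of the interface.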