BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale
December 23, 2019 · Entered Twilight · PLoS Comput. Biol.
"Last commit was 5.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, bioconcept_tutorial.ipynb, datasets
Authors
Qingyu Chen, Kyubum Lee, Shankai Yan, Sun Kim, Chih-Hsuan Wei, Zhiyong Lu
arXiv ID
1912.10846
Category
cs.CL: Computation & Language
Cross-listed
cs.IR, cs.LG
Citations
53
Venue
PLoS Comput. Biol.
Repository
https://github.com/ncbi-nlp/BioConceptVec
⭐ 43
Last Checked
1 month ago
Abstract
Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology, such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (a.k.a. embeddings) of over 400,000 biological concepts mentioned across the entire collection of PubMed abstracts. Our learned embeddings, namely BioConceptVec, can capture related concepts based on their surrounding contextual information in the literature, going beyond exact term matching and co-occurrence-based methods. BioConceptVec has been thoroughly evaluated in multiple bioinformatics tasks consisting of over 25 million instances from nine different biological datasets. The evaluation results demonstrate that BioConceptVec outperforms existing methods in all tasks. Finally, BioConceptVec is made freely available to the research community and general public via https://github.com/ncbi-nlp/BioConceptVec.
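The released embeddings can be queried like ordinary word vectors. Below is a minimal sketch using gensim, assuming a word2vec-binary release file and Entrez-style concept IDs (e.g., Gene_7157 for TP53); the exact file names and ID conventions live in the repo's bioconcept_tutorial.ipynb, so treat both as assumptions here.

```python
# Minimal sketch: load BioConceptVec embeddings with gensim and query
# nearest-neighbour concepts. The file name and concept-ID format below
# are assumptions; see bioconcept_tutorial.ipynb for the repo's conventions.
from gensim.models import KeyedVectors

# Load one of the released embedding files (word2vec binary format assumed).
vectors = KeyedVectors.load_word2vec_format(
    "bioconceptvec_word2vec_skipgram.bin", binary=True
)

# Hypothetical concept ID: Entrez gene 7157 (TP53).
concept = "Gene_7157"

# Concepts whose PubMed contexts are most similar, ranked by cosine similarity.
for neighbor, score in vectors.most_similar(concept, topn=5):
    print(f"{neighbor}\t{score:.3f}")

# Direct similarity between two concepts (second ID also hypothetical: MYC).
print(vectors.similarity(concept, "Gene_4609"))
```

Because the concepts are keyed by database identifiers rather than surface strings, lookups are unambiguous even when a gene has many aliases in the literature.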
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. · 👻 Ghosted