emrQA: A Large Corpus for Question Answering on Electronic Medical Records

September 03, 2018 · Declared Dead · 🏛 Conference on Empirical Methods in Natural Language Processing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Anusri Pampari, Preethi Raghavan, Jennifer Liang, Jian Peng arXiv ID 1809.00732 Category cs.CL: Computation & Language Citations 236 Venue Conference on Empirical Methods in Natural Language Processing Last Checked 3 months ago

Abstract

We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping.

📄 View on arXiv 🌐 View on ar5iv 📑 PDF 🎉 Report Code Found

Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Computation & Language

🌅 🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL 🏛 NeurIPS 📚 166.0K cites 8 years ago

🌅 🌅 Old Age

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, ... (+2 more)

cs.CL 🏛 NAACL 📚 110.2K cites 7 years ago

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, ... (+8 more)

cs.CL 🏛 arXiv 📚 28.4K cites 6 years ago

R.I.P. 👻 Ghosted

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Mike Lewis, Yinhan Liu, ... (+6 more)

cs.CL 🏛 ACL 📚 12.3K cites 6 years ago

R.I.P. 👻 Ghosted

Deep contextualized word representations

Matthew E. Peters, Mark Neumann, ... (+5 more)

cs.CL 🏛 NAACL 📚 12.0K cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago

R.I.P. 👻 Ghosted

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, ... (+2 more)

cs.CV 🏛 CVPR 📚 43.4K cites 10 years ago