ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion
June 22, 2020 ยท Entered Twilight ยท ๐ AAAI Conference on Artificial Intelligence
"Last commit was 5.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: BiDAF, InHouseBert, README.md, data, model.py, prepare_data.py, requirements.txt, test.py, train.py, utils.py
Authors
BingningWang, Ting Yao, Qi Zhang, Jingfang Xu, Xiaochuan Wang
arXiv ID
2006.12146
Category
cs.CL: Computation & Language
Citations
25
Venue
AAAI Conference on Artificial Intelligence
Repository
https://github.com/benywon/ReCO
โญ 37
Last Checked
1 month ago
Abstract
This paper presents the ReCO, a human-curated ChineseReading Comprehension dataset on Opinion. The questions in ReCO are opinion based queries issued to the commercial search engine. The passages are provided by the crowdworkers who extract the support snippet from the retrieved documents. Finally, an abstractive yes/no/uncertain answer was given by the crowdworkers. The release of ReCO consists of 300k questions that to our knowledge is the largest in Chinese reading comprehension. A prominent characteristic of ReCO is that in addition to the original context paragraph, we also provided the support evidence that could be directly used to answer the question. Quality analysis demonstrates the challenge of ReCO that requires various types of reasoning skills, such as causal inference, logical reasoning, etc. Current QA models that perform very well on many question answering problems, such as BERT, only achieve 77% accuracy on this dataset, a large margin behind humans nearly 92% performance, indicating ReCO presents a good challenge for machine reading comprehension. The codes, datasets are freely available at https://github.com/benywon/ReCO.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
๐ป
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
๐ป
Ghosted