Intrinsic Knowledge Evaluation on Chinese Language Models

November 29, 2020 · Entered Twilight · 🏛 arXiv.org

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, chneval, data, eval_commonsense.py, eval_fact.py, eval_semantic_cos.py, eval_syntax.py, predictor.py, utils.py

Authors: Zhiruo Wang, Renfen Hu
arXiv ID: 2011.14277
Category: cs.CL (Computation & Language)
Citations: 1
Venue: arXiv.org
Repository: https://github.com/ZhiruoWang/ChnEval · ⭐ 7
Last Checked: 2 months ago

Abstract

Recent NLP tasks have benefited greatly from pre-trained language models (LMs) because they can encode knowledge of various kinds. However, current LM evaluations focus on downstream performance and therefore do not comprehensively inspect which aspects of knowledge the models have encoded, or to what extent. This paper addresses both questions by proposing four tasks on syntactic, semantic, commonsense, and factual knowledge, totaling $39,308$ questions that cover both linguistic and world knowledge in Chinese. Across our experiments, the probes and knowledge data prove to be a reliable benchmark for evaluating pre-trained Chinese LMs. Our work is publicly available at https://github.com/ZhiruoWang/ChnEval.