CA-EHN: Commonsense Analogy from E-HowNet

August 20, 2019 · Entered Twilight · 🏛 arXiv.org

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: LICENSE, README.md, data

Authors: Peng-Hsuan Li, Tsan-Yu Yang, Wei-Yun Ma
arXiv ID: 1908.07218
Category: cs.CL (Computation & Language)
Citations: 2
Venue: arXiv.org
Repository: https://github.com/ckiplab/CA-EHN ⭐ 5
Last Checked: 2 months ago

Abstract

Embedding commonsense knowledge is crucial for end-to-end models to generalize inference beyond training corpora. However, existing word analogy datasets have tended to be handcrafted, involving permutations of hundreds of words with only dozens of pre-defined relations, mostly morphological relations and named entities. In this work, we model commonsense knowledge down to word-level analogical reasoning by leveraging E-HowNet, an ontology that annotates 88K Chinese words with their structured sense definitions and English translations. We present CA-EHN, the first commonsense word analogy dataset containing 90,505 analogies covering 5,656 words and 763 relations. Experiments show that CA-EHN stands out as a great indicator of how well word representations embed commonsense knowledge. The dataset is publicly available at https://github.com/ckiplab/CA-EHN.
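As an illustration of how a word analogy dataset can be used to score word representations, the sketch below applies the widely used vector-offset method (often called 3CosAdd): for an analogy a : b :: c : d, it predicts the word whose vector is closest in cosine similarity to b - a + c. The abstract does not state which scoring method the authors use, so this is an assumption, not their procedure. The names `solve_analogy`, `analogy_accuracy`, `toy_emb`, and `toy_analogies` are hypothetical, and the toy vectors and the single analogy are made up for the example; they are not taken from CA-EHN.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Predict d in a : b :: c : d with the vector-offset (3CosAdd) rule.

    The three query words are excluded from the candidates, as is usual
    for this method.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, analogies):
    """Fraction of (a, b, c, d) quadruples answered correctly.

    Quadruples containing out-of-vocabulary words are skipped.
    """
    covered = [q for q in analogies if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


# Hypothetical toy embeddings, invented for this example only.
toy_emb = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [3.0, 0.0, 1.0],
    "queen": [3.0, 1.0, 1.0],
    "apple": [0.0, 0.1, 3.0],
}
toy_analogies = [("man", "woman", "king", "queen")]

accuracy = analogy_accuracy(toy_emb, toy_analogies)
print(f"accuracy on toy analogies: {accuracy:.2f}")
```

To try this on real data, `toy_emb` would be replaced by pretrained word vectors and `toy_analogies` by quadruples read from the files in the repository's data directory, whose exact format is not described in this listing.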