Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage

April 30, 2019 · Entered Twilight · 🏛 Conference on Automated Knowledge Base Construction

"Last commit was 6.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitmodules, AKBC_2019_HAnDS_supplementary_material.pdf, Dockerfile, LICENSE, README.md, datasets, results, src, stats, utils

Authors: Abhishek Abhishek, Sanya Bathla Taneja, Garima Malik, Ashish Anand, Amit Awekar
arXiv ID: 1904.13178
Category: cs.CL (Computation & Language)
Citations: 3
Venue: Conference on Automated Knowledge Base Construction
Repository: https://github.com/abhipec/HAnDS (⭐ 9)
Last Checked: 1 month ago

Abstract

Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions into a large set of types spanning diverse domains such as biomedicine, finance, and sports. We observe that when the type set spans several domains, detecting entity mentions becomes a limitation for supervised learning models. The primary reason is the lack of a dataset in which entity boundaries are properly annotated while covering a large spectrum of entity types. Our work directly addresses this issue. We propose the Heuristics Allied with Distant Supervision (HAnDS) framework to automatically construct a quality dataset suitable for the FgER task. The HAnDS framework exploits the dense interlinking between Wikipedia and Freebase in a pipelined manner, reducing the annotation errors introduced by naively applying distant supervision. Using the HAnDS framework, we create two datasets: one suitable for building FgER systems that recognize up to 118 entity types based on the FIGER type hierarchy, and another for up to 1115 entity types based on the TypeNet hierarchy. Our extensive empirical experimentation supports the quality of the generated datasets. We also provide a manually annotated dataset for benchmarking FgER systems.
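
To make the false-negative problem concrete, the following is a minimal sketch of naive distant supervision, the baseline the abstract contrasts with, and not the HAnDS pipeline itself. It assumes that hyperlinked spans are the only source of entity labels and that each linked entity maps to a list of types in a knowledge base. The toy knowledge base, the entity ids, the type labels, and the function name `naive_distant_labels` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: entity id -> fine-grained types.
# The ids and type labels are illustrative, not taken from Freebase or FIGER.
TOY_KB: Dict[str, List[str]] = {
    "E_barack_obama": ["/person", "/person/politician"],
    "E_hawaii": ["/location", "/location/province"],
}


def naive_distant_labels(
    tokens: List[str],
    anchors: List[Tuple[int, int, str]],
) -> List[List[str]]:
    """Assign types to tokens using only hyperlinked spans.

    Each anchor is (start, end_exclusive, entity_id). Tokens outside
    every anchor receive an empty type list, i.e. "not an entity".
    """
    labels: List[List[str]] = [[] for _ in tokens]
    for start, end, entity_id in anchors:
        types = TOY_KB.get(entity_id, [])
        for i in range(start, end):
            labels[i] = list(types)
    return labels


tokens = ["Barack", "Obama", "was", "born", "in", "Hawaii", ".",
          "Obama", "later", "moved", "."]
# Assumption: only the first mention of each entity is hyperlinked.
anchors = [(0, 2, "E_barack_obama"), (5, 6, "E_hawaii")]
labels = naive_distant_labels(tokens, anchors)

# The second "Obama" (index 7) has no hyperlink, so it is labeled as a
# non-entity even though it is one: a false negative in the training data.
for token, types in zip(tokens, labels):
    print(token, types)
```

In this sketch the unlinked second mention ends up indistinguishable from ordinary words, which is one way a naively constructed dataset can teach a model to miss entity mentions.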