OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages

December 12, 2024 · Declared Dead · 🏛 Conference on Empirical Methods in Natural Language Processing

📜 CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: README.md

Authors: Chester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, Constantine Lignos
arXiv ID: 2412.09587
Category: cs.CL (Computation & Language)
Citations: 1
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/bltlab/open-ner ⭐ 3
Last Checked: 1 month ago

Abstract
We present OpenNER 1.0, a standardized collection of openly available named entity recognition (NER) datasets. OpenNER contains 36 NER corpora that span 52 languages, human-annotated in varying named entity ontologies. We correct annotation format issues, standardize the original datasets into a uniform representation with consistent entity type names across corpora, and provide the collection in a structure that enables research in multilingual and multi-ontology NER. We provide baseline results using three pretrained multilingual language models and two large language models to compare the performance of recent models and facilitate future research in NER. We find that no single model is best in all languages and that significant work remains to obtain high performance from LLMs on the NER task. OpenNER is released at https://github.com/bltlab/open-ner.
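The abstract mentions correcting annotation format issues and mapping entity types to consistent names across corpora. The listed repository contains only a README, so the following is not OpenNER's code. It is a minimal Python sketch, assuming token-level BIO tags, of what that kind of standardization can look like. The `TYPE_MAP` table and the `standardize_bio` function are hypothetical. An `I-` tag that does not continue an entity of the same type is rewritten as `B-`, which is one common repair for malformed BIO sequences.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from corpus-specific entity type names to a shared label set.
# Illustrative only: this is NOT OpenNER's actual type mapping.
TYPE_MAP: Dict[str, str] = {
    "PER": "PER",
    "PERSON": "PER",
    "ORG": "ORG",
    "ORGANIZATION": "ORG",
    "LOC": "LOC",
    "LOCATION": "LOC",
    "GPE": "GPE",
}


def standardize_bio(tags: List[str], type_map: Dict[str, str]) -> List[str]:
    """Rename entity types and repair invalid BIO transitions.

    Assumes every tag is either "O" or "<B|I>-<TYPE>". An I- tag that does not
    continue an entity of the same (mapped) type is converted to B-.
    """
    out: List[str] = []
    prev_type: Optional[str] = None  # mapped type of the previous token, None if outside
    for tag in tags:
        if tag == "O":
            out.append("O")
            prev_type = None
            continue
        prefix, _, raw_type = tag.partition("-")
        ent_type = type_map.get(raw_type, raw_type)  # unknown types pass through unchanged
        if prefix == "I" and prev_type != ent_type:
            prefix = "B"  # an orphaned I- tag starts a new entity
        out.append(f"{prefix}-{ent_type}")
        prev_type = ent_type
    return out


if __name__ == "__main__":
    example = ["I-PERSON", "I-PERSON", "O", "B-ORGANIZATION", "I-LOC"]
    print(standardize_bio(example, TYPE_MAP))
```

Running it prints `['B-PER', 'I-PER', 'O', 'B-ORG', 'B-LOC']`. Types are mapped before the BIO check so that, for example, `B-PERSON` followed by `I-PER` is treated as one continuous entity.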
Community shame:
Not yet rated

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL πŸ› NeurIPS πŸ“š 166.0K cites 8 years ago

Died the same way: 📜 Death by README