Playing with Words at the National Library of Sweden -- Making a Swedish BERT

July 03, 2020 · Declared Dead · 🏛 arXiv.org

📜 CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: README.md, requirements.txt

Authors: Martin Malmsten, Love Börjeson, Chris Haffenden
arXiv ID: 2007.01658
Category: cs.CL: Computation & Language
Citations: 136
Venue: arXiv.org
Repository: https://github.com/Kungbib/swedish-bert-models ⭐ 142
Last Checked: 1 month ago
Abstract
This paper introduces the Swedish BERT ("KB-BERT") developed by the KBLab for data-driven research at the National Library of Sweden (KB). Building on recent efforts to create transformer-based BERT models for languages other than English, we explain how we used KB's collections to create and train a new language-specific BERT model for Swedish. We also present the results of our model in comparison with existing models, chiefly that produced by the Swedish Public Employment Service, Arbetsförmedlingen, and Google's multilingual M-BERT, where we demonstrate that KB-BERT outperforms these in a range of NLP tasks from named entity recognition (NER) to part-of-speech tagging (POS). Our discussion highlights the difficulties that continue to exist given the lack of training data and testbeds for smaller languages like Swedish. We release our model for further exploration and research here: https://github.com/Kungbib/swedish-bert-models.
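The linked repository publishes the models on the Hugging Face Hub under the `KB/` namespace (e.g. `KB/bert-base-swedish-cased-ner` for the NER task mentioned in the abstract). A minimal sketch of post-processing such a model's output, assuming the pipeline-style token format used by the `transformers` library; the model id and the wordpiece-merging heuristic are assumptions based on the repo and may have changed:

```python
# Sketch: post-processing KB-BERT NER output. The model id referenced below
# is taken from the linked repository and is an assumption, not guaranteed.

def merge_wordpieces(tokens):
    """Glue BERT wordpiece tokens (e.g. "Stock", "##holm") back into words.

    `tokens` mirrors the output shape of transformers' token-classification
    pipeline: a list of dicts with at least a "word" key.
    """
    merged = []
    for tok in tokens:
        if tok["word"].startswith("##") and merged:
            # Continuation piece: append it to the previous word.
            merged[-1] = {**merged[-1], "word": merged[-1]["word"] + tok["word"][2:]}
        else:
            merged.append(dict(tok))
    return merged


# Offline demo with hand-written pipeline-style output:
tokens = [
    {"word": "Stock", "entity": "LOC"},
    {"word": "##holm", "entity": "LOC"},
    {"word": "ligger", "entity": "O"},
]
print(merge_wordpieces(tokens))

# The actual model would be driven like this (requires network access):
#   from transformers import pipeline
#   ner = pipeline("ner", model="KB/bert-base-swedish-cased-ner")
#   merge_wordpieces(ner("Kungliga biblioteket ligger i Stockholm."))
```

Newer `transformers` versions can do similar merging natively via the pipeline's `aggregation_strategy` parameter; the helper above just makes the idea explicit.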
Community shame:
Not yet rated

📜 Similar Papers

In the same crypt — Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL πŸ› NeurIPS πŸ“š 166.0K cites 8 years ago

Died the same way — 📜 Death by README