A Comprehensive Dictionary and Term Variation Analysis for COVID-19 and SARS-CoV-2

October 27, 2020 · Entered Twilight · 🏛 NLP4COVID@EMNLP

🌅 TWILIGHT: Old Age
Predates the code-sharing era; a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: README.md, extract_terms.py, requirements.txt, strings.py, terms, update_terms.sh

Authors: Robert Leaman, Zhiyong Lu
arXiv ID: 2010.14588
Category: cs.DL: Digital Libraries
Cross-listed: cs.CL
Citations: 8
Venue: NLP4COVID@EMNLP
Repository: https://github.com/ncbi-nlp/CovidTermVar (⭐ 2)
Last Checked: 1 month ago

Abstract
The number of unique terms in the scientific literature used to refer to either SARS-CoV-2 or COVID-19 is remarkably large and has continued to increase rapidly despite well-established standardized terms. This high degree of term variation makes high-recall identification of these important entities difficult. In this manuscript we present an extensive dictionary of terms used in the literature to refer to SARS-CoV-2 and COVID-19. We use a rule-based approach to iteratively generate new term variants, then locate these variants in a large text corpus. We compare our dictionary to an extensive collection of terminological resources, demonstrating that our resource provides a substantial number of additional terms. We use our dictionary to analyze the usage of SARS-CoV-2 and COVID-19 terms over time and show that the number of unique terms continues to grow rapidly. Our dictionary is freely available at https://github.com/ncbi-nlp/CovidTermVar.
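
The generate-then-locate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the seed terms, the rewrite rules in `rules`, the sample `CORPUS`, and the function names are all assumptions made up for this example, and the real dictionary was built with its own rules over a much larger corpus.

```python
import re

# Hypothetical seed terms and sample text, chosen only for illustration.
SEEDS = {"covid-19", "sars-cov-2"}
CORPUS = (
    "Patients with COVID 19 were tested for SARS-CoV-2. "
    "COVID2019 cases rose; covid-2019 was also used."
)


def rules(term):
    """Yield simple surface variants of a term (illustrative rules only)."""
    yield term.replace("-", " ")  # hyphen -> space
    yield term.replace("-", "")   # drop hyphens
    yield term.replace(" ", "-")  # space -> hyphen
    # Expand a trailing "19" to "2019" unless it is already part of a number.
    yield re.sub(r"(?<!\d)19$", "2019", term)


def find_variants(seeds, corpus, max_rounds=5):
    """Expand seeds round by round and keep the variants seen in the corpus."""
    text = corpus.lower()

    def attested(term):
        # Match the term only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        return re.search(pattern, text) is not None

    found = {s for s in seeds if attested(s)}
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        # Apply every rule to every term generated in the previous round.
        candidates = {v for t in frontier for v in rules(t)} - seen
        if not candidates:
            break  # fixed point: no rule produces anything new
        seen |= candidates
        found |= {c for c in candidates if attested(c)}
        # Unattested candidates stay in the frontier, since a later
        # rewrite of them may still occur in the corpus.
        frontier = candidates
    return found


if __name__ == "__main__":
    for term in sorted(find_variants(SEEDS, CORPUS)):
        print(term)
```

In this sketch, candidates that do not occur in the corpus are still expanded in the next round, so a variant such as "covid2019" can be reached through the unattested intermediate "covid19". Expanding only attested terms would be cheaper but could miss such variants; `max_rounds` bounds the cost either way.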