Parallel sequence tagging for concept recognition

March 16, 2020 · Entered Twilight · 🏛 BMC Bioinformatics

🌅 TWILIGHT: Old Age
Predates the code-sharing era — a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, .gitmodules, LICENSE, README.md, ab3p2json.py, abbrevs.json, abbrevs.py, bilstm, biobert, bionlp2conll.sh, notebooks, official-eval.sh, self-eval.sh, splits.json, splits.subm.json, standoff2conll, summarise-results.sh, test-eval.sh, top-1000-ids.tar.gz

Authors: Lenz Furrer, Joseph Cornelius, Fabio Rinaldi
arXiv ID: 2003.07424
Category: cs.CL (Computation & Language)
Cross-listed: cs.IR, cs.LG
Citations: 10
Venue: BMC Bioinformatics
Repository: https://github.com/OntoGene/craft-st (⭐ 2)
Last Checked: 1 month ago
Abstract
Background: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modelled as a sequence-labelling task, operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence.

Results: We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task 2019.

Conclusions: Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This allows achieving a good trade-off between established knowledge (training set) and novel information (unseen concepts).

Availability and Implementation: Source code freely available for download at https://github.com/OntoGene/craft-st. Supplementary data are available at arXiv online.
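The core idea in the abstract — running NER and NEN as two independent token-level taggers over the same text and then merging their label sequences — can be sketched in a few lines. The strategy names, label formats, and concept IDs below are illustrative assumptions for exposition, not the paper's actual harmonisation strategies (those are defined in the paper and its repository).

```python
# Hypothetical sketch of merging parallel NER and NEN token predictions.
# Label formats ("B"/"I"/"O" span tags, ontology-style concept IDs) and the
# strategy names are assumptions, not taken from the paper's implementation.

def harmonise(ner_tags, nen_ids, strategy="ner-priority"):
    """Merge per-token NER span tags and NEN concept-ID predictions
    into a single sequence of (span_tag, concept_id) pairs."""
    assert len(ner_tags) == len(nen_ids), "classifiers tag the same tokens"
    merged = []
    for tag, cid in zip(ner_tags, nen_ids):
        if strategy == "ner-priority":
            # Trust NER for span boundaries: keep a concept ID only
            # where the NER tagger also predicts an entity token.
            merged.append((tag, cid if tag != "O" else "O"))
        elif strategy == "union":
            # Accept a mention if either classifier fires: promote
            # NEN-only hits to entity spans.
            if tag == "O" and cid != "O":
                tag = "B"
            merged.append((tag, cid))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return merged

# Example: NER finds a span whose ID the NEN tagger confirms, while the
# NEN tagger alone fires on the last token.
ner = ["B", "I", "O", "O"]
nen = ["CL:0000540", "CL:0000540", "O", "GO:0008150"]
harmonise(ner, nen, "ner-priority")
harmonise(ner, nen, "union")
```

The point of the "individual calibration" mentioned in the conclusions is that the best merging rule differs per annotation set, so a choice like `ner-priority` vs. `union` would be picked on a development set rather than fixed globally.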
Community shame:
Not yet rated

📜 Similar Papers

In the same crypt — Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago