Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging

August 13, 2018 · Entered Twilight · 🏛 North American Chapter of the Association for Computational Linguistics

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: README.md, __pycache__, create_noisy.py, data, data_loader.py, evaluate.py, evaluate_marmot_joint.py, evaluate_marmot_relaxed.py, model_grc.py, model_srnn.py, models, predict.py, train.py

Authors: Apostolos Kemos, Heike Adel, Hinrich Schütze
arXiv ID: 1808.04208
Category: cs.CL (Computation & Language)
Citations: 10
Venue: North American Chapter of the Association for Computational Linguistics
Repository: https://github.com/cisnlp/semi-markov-crf (⭐ 16)
Last Checked: 6 days ago
Abstract
Character-level models of tokens have been shown to be effective at dealing with within-token noise and out-of-vocabulary words. But these models still rely on correct token boundaries. In this paper, we propose a novel end-to-end character-level model and demonstrate its effectiveness in multilingual settings and when token boundaries are noisy. Our model is a semi-Markov conditional random field with neural networks for character and segment representation. It requires no tokenizer. The model matches state-of-the-art baselines for various languages and significantly outperforms them on a noisy English version of a part-of-speech tagging benchmark dataset. Our code and the noisy dataset are publicly available at http://cistern.cis.lmu.de/semiCRF.
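To make the semi-Markov idea in the abstract concrete, here is a minimal decoding sketch. It is not the authors' implementation: the names `semi_markov_viterbi`, `seg_score`, `trans_score`, and `max_len`, as well as the toy lexicon, are assumptions made for illustration. In the paper the segment scores come from neural networks; here a hand-written scoring function stands in for them. The sketch shows how segment boundaries and labels can be chosen jointly over raw characters, which is why no separate tokenizer is needed.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    seg_score: Callable[[int, int, str], float],
    trans_score: Callable[[str, str], float],
) -> List[Segment]:
    """Return the highest-scoring labeled segmentation of n characters.

    best[j][y] holds the best score of any segmentation of chars[0:j]
    whose last segment carries label y; segments span at most max_len chars.
    """
    neg_inf = float("-inf")
    best: List[Dict[str, float]] = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {y: None for y in labels} for _ in range(n + 1)
    ]

    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for y in labels:
                s = seg_score(i, j, y)
                if i == 0:
                    # First segment: no previous label, so no transition term.
                    if s > best[j][y]:
                        best[j][y], back[j][y] = s, (0, None)
                    continue
                for y_prev in labels:
                    if best[i][y_prev] == neg_inf:
                        continue
                    cand = best[i][y_prev] + trans_score(y_prev, y) + s
                    if cand > best[j][y]:
                        best[j][y], back[j][y] = cand, (i, y_prev)

    # Follow back-pointers from the end of the string to recover the segments.
    segments: List[Segment] = []
    if n == 0:
        return segments
    j, y = n, max(labels, key=lambda lab: best[n][lab])
    while j > 0:
        i, y_prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, y_prev
    segments.reverse()
    return segments


# Toy usage with a hypothetical hand-written scorer (stand-in for a neural model).
text = "I run"
lexicon = {("I", "PRON"): 2.0, ("run", "VERB"): 2.0, (" ", "SPACE"): 1.0}


def toy_seg_score(i: int, j: int, label: str) -> float:
    # Known (substring, label) pairs get a bonus; anything else is penalized by length.
    return lexicon.get((text[i:j], label), -float(j - i))


def toy_trans_score(prev_label: str, label: str) -> float:
    return 0.0  # transitions are ignored in this toy example


result = semi_markov_viterbi(len(text), ["PRON", "VERB", "SPACE"], 3, toy_seg_score, toy_trans_score)
print(result)  # [(0, 1, 'PRON'), (1, 2, 'SPACE'), (2, 5, 'VERB')]
```

The `max_len` bound keeps the search at roughly `n * max_len * len(labels)**2` score evaluations. Treating whitespace as its own labeled segment is one simple way to let the decoder, rather than a preprocessing step, decide where tokens begin and end.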

📜 Similar Papers

In the same crypt: Computation & Language

🌅 🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL 🏛 NeurIPS 📚 166.0K cites 8 years ago