Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction
November 21, 2018 ยท Entered Twilight ยท ๐ arXiv.org
"Last commit was 6.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, config, datasets, graph_outputs, main.py, output, requirements.txt, resources, src
Authors
Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh
arXiv ID
1811.08619
Category
cs.CL: Computation & Language
Citations
4
Venue
arXiv.org
Repository
https://github.com/Saurav0074/morph_analyzer
โญ 7
Last Checked
2 months ago
Abstract
The ambiguities introduced by the recombination of morphemes constructing several possible inflections for a word makes the prediction of syntactic traits in Morphologically Rich Languages (MRLs) a notoriously complicated task. We propose the Multi Task Deep Morphological analyzer (MT-DMA), a character-level neural morphological analyzer based on multitask learning of word-level tag markers for Hindi and Urdu. MT-DMA predicts a set of six morphological tags for words of Indo-Aryan languages: Parts-of-speech (POS), Gender (G), Number (N), Person (P), Case (C), Tense-Aspect-Modality (TAM) marker as well as the Lemma (L) by jointly learning all these in one trainable framework. We show the effectiveness of training of such deep neural networks by the simultaneous optimization of multiple loss functions and sharing of initial parameters for context-aware morphological analysis. Exploiting character-level features in phonological space optimized for each tag using multi-objective genetic algorithm, our model establishes a new state-of-the-art accuracy score upon all seven of the tasks for both the languages. MT-DMA is publicly accessible: code, models and data are available at https://github.com/Saurav0074/morph_analyzer.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
๐ป
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
๐ป
Ghosted