Translator2Vec: Understanding and Representing Human Post-Editors
July 24, 2019 · Declared Dead · Machine Translation Summit
Repo contents: readme.md
Authors
António Góis, André F. T. Martins
arXiv ID
1907.10362
Category
cs.CL: Computation & Language
Citations
4
Venue
Machine Translation Summit
Repository
https://github.com/Unbabel/translator2vec
★ 4
Last Checked
1 month ago
Abstract
The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch. To take full advantage of this combination, we need a fine-grained understanding of how human translators work, and which post-editing styles are more effective than others. In this paper, we release and analyze a new dataset with document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times. Our dataset comprises 66,268 full document sessions post-edited by 332 humans, the largest of the kind released to date. We show that action sequences are informative enough to identify post-editors accurately, compared to baselines that only look at the initial and final text. We build on this to learn and visualize continuous representations of post-editors, and we show that these representations improve the downstream task of predicting post-editing time.