Translator2Vec: Understanding and Representing Human Post-Editors

July 24, 2019 · Declared Dead · 🏛 Machine Translation Summit

🦴 CAUSE OF DEATH: Skeleton Repo
Boilerplate only, no real code

Repo contents: readme.md

Authors: António Góis, André F. T. Martins
arXiv ID: 1907.10362
Category: cs.CL (Computation & Language)
Citations: 4
Venue: Machine Translation Summit
Repository: https://github.com/Unbabel/translator2vec (⭐ 4)
Last Checked: 1 month ago
Abstract
The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch. To take full advantage of this combination, we need a fine-grained understanding of how human translators work, and which post-editing styles are more effective than others. In this paper, we release and analyze a new dataset with document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times. Our dataset comprises 66,268 full document sessions post-edited by 332 humans, the largest of its kind released to date. We show that action sequences are informative enough to identify post-editors accurately, compared to baselines that only look at the initial and final text. We build on this to learn and visualize continuous representations of post-editors, and we show that these representations improve the downstream task of predicting post-editing time.
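As a rough illustration of what identifying post-editors from action sequences can look like, here is a minimal Python sketch, assuming each session has already been reduced to a list of action tokens. The action names, the bag-of-actions features, and the nearest-centroid classifier are all hypothetical stand-ins chosen for illustration. This is not the Translator2Vec model, and the linked repository contains only a readme.

```python
from collections import Counter
import math

# Hypothetical action vocabulary (assumption): the real dataset's action types may differ.
ACTIONS = ["insert", "delete", "replace", "mouse_click", "wait_short", "wait_long"]


def session_vector(actions):
    """Relative frequency of each known action type in one post-editing session."""
    counts = Counter(actions)
    total = sum(counts[a] for a in ACTIONS) or 1  # avoid division by zero on empty sessions
    return [counts[a] / total for a in ACTIONS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def editor_profiles(sessions_by_editor):
    """Average the session vectors of each editor into one fixed-length profile."""
    profiles = {}
    for editor, sessions in sessions_by_editor.items():
        vecs = [session_vector(s) for s in sessions]
        profiles[editor] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return profiles


def identify_editor(profiles, actions):
    """Return the editor whose profile is most cosine-similar to a new session."""
    v = session_vector(actions)
    return max(profiles, key=lambda e: cosine(profiles[e], v))


if __name__ == "__main__":
    # Toy, made-up sessions for two hypothetical editors.
    train = {
        "editor_a": [["insert", "insert", "wait_short", "insert"],
                     ["insert", "delete", "insert", "insert"]],
        "editor_b": [["mouse_click", "replace", "wait_long", "replace"],
                     ["replace", "mouse_click", "wait_long"]],
    }
    profiles = editor_profiles(train)
    print(identify_editor(profiles, ["insert", "insert", "wait_short"]))  # editor_a
```

Averaging session vectors per editor yields one fixed-length vector per person, a crude analogue of the continuous post-editor representations the abstract describes; a learned sequence model would replace the hand-counted features and could capture action order, which a bag of actions ignores.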

πŸ“œ Similar Papers

In the same crypt: Computation & Language

🌅 🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL πŸ› NeurIPS πŸ“š 166.0K cites 8 years ago

Died the same way: 🦴 Skeleton Repo