🌅
Old Age
Mapping Natural Language Instructions to Mobile UI Action Sequences
May 07, 2020 · 🏛 Annual Meeting of the Association for Computational Linguistics
"No code URL or promise found in abstract"
"HuggingFace models found (backfill)"
Evidence collected by the PWNC Scanner
Authors
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge
arXiv ID
2005.03776
Category
cs.CL: Computation & Language
Cross-listed
cs.LG
Citations
245
Venue
Annual Meeting of the Association for Computational Linguistics
Repository
https://huggingface.co/datasets/OS-Copilot/OS-Atlas-data
Last Checked
9 days ago
Abstract
We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PIXELHELP, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in HowTo instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PIXELHELP.
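The 70.59% figure is accuracy over complete action sequences, so a prediction presumably counts only when every step matches the ground truth. A minimal sketch of such a metric follows, assuming each action is a tuple of operation, target object index, and optional text argument. The `Action` type and the function name are hypothetical and not taken from the paper's code.

```python
from typing import List, Optional, Tuple

# Hypothetical action representation (an assumption for illustration):
# (operation name, index of the target UI object, optional text argument).
Action = Tuple[str, int, Optional[str]]


def complete_match_accuracy(
    predicted: List[List[Action]],
    reference: List[List[Action]],
) -> float:
    """Fraction of examples whose predicted sequence equals the reference exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of examples")
    if not reference:
        return 0.0
    # A sequence counts only if every action matches, in order, with equal length.
    matches = sum(1 for pred, ref in zip(predicted, reference) if pred == ref)
    return matches / len(reference)


if __name__ == "__main__":
    refs = [[("CLICK", 3, None), ("CLICK", 7, None)], [("CLICK", 1, None)]]
    preds = [[("CLICK", 3, None), ("CLICK", 7, None)], [("CLICK", 2, None)]]
    print(complete_match_accuracy(preds, refs))
```

Under this reading, a sequence with a single wrong step scores zero for that example, which makes the metric stricter than per-action accuracy.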
📜 Similar Papers
In the same crypt — Computation & Language
🌅
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
🌅
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
🔮
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
🌅
Old Age
A large annotated corpus for learning natural language inference
🌅
Old Age