Learning Autocompletion from Real-World Datasets

November 09, 2020 · Declared Dead · 🏛 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Gareth Ari Aye, Seohyun Kim, Hongyu Li arXiv ID 2011.04542 Category cs.SE: Software Engineering Citations 37 Venue 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) Last Checked 3 months ago

Abstract

Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study When Code Completion Fails: a Case Study on Real-World Completions demonstrates that these results may not translate to improvements in real-world performance. To combat this effect, we train models on real-world code completion examples and find that these models outperform models trained on committed source code and working version snapshots by 12.8% and 13.8% accuracy respectively. We observe this improvement across modeling technologies and show through A/B testing that it corresponds to a 6.2% increase in programmers' actual autocompletion usage. Furthermore, our study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models.

📄 View on arXiv 🌐 View on ar5iv 📑 PDF 🎉 Report Code Found

Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Software Engineering

R.I.P. 👻 Ghosted

ImageJ2: ImageJ for the next generation of scientific image data

Curtis T. Rueden, Johannes Schindelin, ... (+5 more)

cs.SE 🏛 BMC Bioinformatics 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

GraphCodeBERT: Pre-training Code Representations with Data Flow

Daya Guo, Shuo Ren, ... (+16 more)

cs.SE 🏛 ICLR 📚 1.5K cites 5 years ago

R.I.P. 👻 Ghosted

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

Yuchi Tian, Kexin Pei, ... (+2 more)

cs.SE 🏛 ICSE 📚 1.4K cites 8 years ago

R.I.P. 👻 Ghosted

Microservices: yesterday, today, and tomorrow

Nicola Dragoni, Saverio Giallorenzo, ... (+5 more)

cs.SE 🏛 Present and Ulterior Software Engineering 📚 1.1K cites 9 years ago

R.I.P. 👻 Ghosted

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

Yaqin Zhou, Shangqing Liu, ... (+3 more)

cs.SE 🏛 NeurIPS 📚 1.0K cites 6 years ago

R.I.P. 👻 Ghosted

A Survey of Machine Learning for Big Code and Naturalness

Miltiadis Allamanis, Earl T. Barr, ... (+2 more)

cs.SE 🏛 ACM CSUR 📚 962 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago