Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions
March 05, 2023 ยท Declared Dead ยท ๐ International Conference on Automated Software Engineering
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Louis G. Michael, James Donohue, James C. Davis, Dongyoon Lee, Francisco Servant
arXiv ID
2303.02555
Category
cs.SE: Software Engineering
Citations
63
Venue
International Conference on Automated Software Engineering
Last Checked
3 months ago
Abstract
Regular expressions (regexes) are a powerful mechanism for solving string-matching problems. They are supported by all modern programming languages, and have been estimated to appear in more than a third of Python and JavaScript projects. Yet existing studies have focused mostly on one aspect of regex programming: readability. We know little about how developers perceive and program regexes, nor the difficulties that they face. In this paper, we provide the first study of the regex development cycle, with a focus on (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are about serious risks involved in programming regexes. We took a mixed-methods approach, surveying 279 professional developers from a diversity of backgrounds (including top tech firms) for a high-level perspective, and interviewing 17 developers to learn the details about the difficulties that they face and the solutions that they prefer. In brief, regexes are hard. Not only are they hard to read, our participants said that they are hard to search for, hard to validate, and hard to document. They are also hard to master: the majority of our studied developers were unaware of critical security risks that can occur when using regexes, and those who knew of the risks did not deal with them in effective manners. Our findings provide multiple implications for future work, including semantic regex search engines for regex reuse and improved input generators for regex validation.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Software Engineering
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
GraphCodeBERT: Pre-training Code Representations with Data Flow
R.I.P.
๐ป
Ghosted
DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars
R.I.P.
๐ป
Ghosted
Microservices: yesterday, today, and tomorrow
R.I.P.
๐ป
Ghosted
Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks
R.I.P.
๐ป
Ghosted
A Survey of Machine Learning for Big Code and Naturalness
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted