When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention

July 29, 2024 ยท Entered Twilight ยท ๐Ÿ› International Symposium on Software Testing and Analysis

๐Ÿ’ค TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitignore, Appendix.pdf, CodeFast.yml, README.md, human_eval, lm_eval, main.py, models, mxeval, setup.sh, transformers

Authors Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng arXiv ID 2407.20042 Category cs.SE: Software Engineering Citations 38 Venue International Symposium on Software Testing and Analysis Repository https://github.com/DeepSoftwareAnalytics/CodeFast โญ 9 Last Checked 1 month ago
Abstract
Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a signification limitation in practice use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on code generation tasks and identify a significant efficiency issue, i.e., continual generation of excess tokens. It harms the developer productivity and leads to huge computational wastes. To address it, we introduce CodeFast, an inference acceleration approach for Code LLMs on code generation. The key idea of CodeFast is to terminate the inference process in time when unnecessary excess tokens are detected. First, we propose an automatic data construction framework to obtain training data. Then, we train a unified lightweight model GenGuard applicable to multiple programming languages to predict whether to terminate inference at the current step. Finally, we enhance Code LLM with GenGuard to accelerate its inference in code generation tasks. We conduct extensive experiments with CodeFast on five representative Code LLMs across four widely used code generation datasets. Experimental results show that (1) CodeFast can significantly improve the inference speed of various Code LLMs in code generation, ranging form 34% to 452%, without compromising the quality of generated code. (2) CodeFast is stable across different parameter settings and can generalize to untrained datasets. Our code and data are available at https://github.com/DeepSoftwareAnalytics/CodeFast
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Software Engineering