HITS: High-coverage LLM-based Unit Test Generation via Method Slicing

August 21, 2024 · Declared Dead · 🏛 International Conference on Automated Software Engineering

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Zejun Wang, Kaibo Liu, Ge Li, Zhi Jin
arXiv ID: 2408.11324
Category: cs.SE (Software Engineering)
Citations: 74
Venue: International Conference on Automated Software Engineering
Last Checked: 3 months ago

Abstract

Large language models (LLMs) perform well at generating unit tests for Java projects. However, they cover the complex focal methods in those projects poorly. Complex methods contain many conditions and loops, so the test cases must be diverse enough to cover all lines and branches. Existing LLM-based test generation methods hand the whole method under test to the LLM without helping it analyze the inputs. The LLM then has difficulty inferring test inputs that cover all conditions, which leaves lines and branches uncovered. To tackle this problem, we propose decomposing focal methods into slices and asking the LLM to generate test cases slice by slice. Our method narrows the scope of analysis, making it easier for the LLM to cover more lines and branches in each slice. We build a dataset of complex focal methods collected from the projects used by existing state-of-the-art approaches. Our experimental results show that our method significantly outperforms current LLM-based test case generation methods and the typical SBST method EvoSuite in both line and branch coverage.
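
The slice-by-slice idea in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how slices are computed, so the brace-depth heuristic, the focal method `classify`, the helpers `slice_method` and `build_prompt`, and the prompt wording are all assumptions made for this example. No LLM is called; the sketch only builds one prompt per slice.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical Java focal method, used only for illustration.
FOCAL_METHOD = """\
public static String classify(int[] values) {
    if (values == null || values.length == 0) {
        return "empty";
    }
    int sum = 0;
    for (int v : values) {
        if (v < 0) {
            continue;
        }
        sum += v;
    }
    if (sum > 100) {
        return "large";
    }
    return "small";
}"""


@dataclass
class Slice:
    index: int  # 1-based position of the slice in the method body
    code: str   # source text of the slice


def slice_method(source: str) -> List[Slice]:
    """Split a method body into top-level statements by brace depth.

    Assumption: a naive line-based heuristic standing in for a real slicer.
    """
    body = source.splitlines()[1:-1]  # drop signature line and closing brace
    slices: List[Slice] = []
    current: List[str] = []
    depth = 0
    for line in body:
        current.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:  # a top-level statement has just ended
            slices.append(Slice(len(slices) + 1, "\n".join(current)))
            current = []
    return slices


def build_prompt(source: str, piece: Slice) -> str:
    """Build one prompt that shows the whole method but targets one slice."""
    return (
        "Focal method:\n" + source + "\n\n"
        + "Slice " + str(piece.index) + ":\n" + piece.code + "\n\n"
        + "Write JUnit test cases whose inputs reach this slice and cover "
        + "every line and branch in it."
    )


# One prompt per slice; each would be sent to the LLM separately.
prompts = [build_prompt(FOCAL_METHOD, s) for s in slice_method(FOCAL_METHOD)]
```

Each prompt still includes the full method for context but asks for tests targeting a single slice, which is the narrowing of analysis scope the abstract describes. A real slicer would more likely work on a parsed syntax tree than on raw lines.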


📜 Similar Papers

In the same crypt — Software Engineering

R.I.P. 👻 Ghosted

ImageJ2: ImageJ for the next generation of scientific image data

Curtis T. Rueden, Johannes Schindelin, ... (+5 more)

cs.SE 🏛 BMC Bioinformatics 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

GraphCodeBERT: Pre-training Code Representations with Data Flow

Daya Guo, Shuo Ren, ... (+16 more)

cs.SE 🏛 ICLR 📚 1.5K cites 5 years ago

R.I.P. 👻 Ghosted

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

Yuchi Tian, Kexin Pei, ... (+2 more)

cs.SE 🏛 ICSE 📚 1.4K cites 8 years ago

R.I.P. 👻 Ghosted

Microservices: yesterday, today, and tomorrow

Nicola Dragoni, Saverio Giallorenzo, ... (+5 more)

cs.SE 🏛 Present and Ulterior Software Engineering 📚 1.1K cites 9 years ago

R.I.P. 👻 Ghosted

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

Yaqin Zhou, Shangqing Liu, ... (+3 more)

cs.SE 🏛 NeurIPS 📚 1.0K cites 6 years ago

R.I.P. 👻 Ghosted

A Survey of Machine Learning for Big Code and Naturalness

Miltiadis Allamanis, Earl T. Barr, ... (+2 more)

cs.SE 🏛 ACM CSUR 📚 962 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago