GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent
December 24, 2024 · Declared Dead · arXiv.org
Repo contents: README.md
Authors
Kangjia Zhao, Jiahui Song, Leigang Sha, Haozhan Shen, Zhi Chen, Tiancheng Zhao, Xiubo Liang, Jianwei Yin
arXiv ID
2412.18426
Category
cs.AI: Artificial Intelligence
Citations
11
Venue
arXiv.org
Repository
https://github.com/ZJU-ACES-ISE/ChatUITest
★ 13
Last Checked
1 month ago
Abstract
Nowadays, research on GUI agents is a hot topic in the AI community. However, current research focuses on GUI task automation, limiting the scope of applications in various GUI scenarios. In this paper, we propose a formalized and comprehensive environment to evaluate the entire process of automated GUI Testing (GTArena), offering a fair, standardized environment for consistent operation of diverse multimodal large language models. We divide the testing process into three key subtasks: test intention generation, test task execution, and GUI defect detection, and construct a benchmark dataset based on these to conduct a comprehensive evaluation. The benchmark evaluates the performance of different models using three data types: real mobile applications, mobile applications with artificially injected defects, and synthetic data, thoroughly assessing their capabilities in this task. Additionally, we propose a method that helps researchers explore the correlation between the performance of multimodal large language models in specific scenarios and their general capabilities in standard benchmark tests. Experimental results indicate that even the most advanced models struggle to perform well across all subtasks of automated GUI Testing, highlighting a significant gap between the current capabilities of Autonomous GUI Testing and its practical, real-world applicability. This gap provides guidance for the future direction of GUI Agent development. Our code is available at https://github.com/ZJU-ACES-ISE/ChatUITest.
Similar Papers
In the same crypt: Artificial Intelligence
Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI (R.I.P., Ghosted)
Addressing Function Approximation Error in Actor-Critic Methods (R.I.P., Ghosted)
Explanation in Artificial Intelligence: Insights from the Social Sciences (R.I.P., Ghosted)
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (R.I.P., Ghosted)
Complex Embeddings for Simple Link Prediction (R.I.P., Ghosted)
Died the same way: Death by README
Momentum Contrast for Unsupervised Visual Representation Learning (R.I.P., Death by README)
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model (R.I.P., Death by README)
Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach (R.I.P., Death by README)