MobileViews: A Million-scale and Diverse Mobile GUI Dataset

September 22, 2024 · Declared Dead

💀 CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors: Longxi Gao, Li Zhang, Shihe Wang, Pengzhi Gao, Wei Liu, Jian Luan, Shangguang Wang, Yuanchun Li, Mengwei Xu
arXiv ID: 2409.14337
Category: cs.HC (Human-Computer Interaction)
Citations: 0
Repository: https://huggingface.co/datasets/mllmTeam/MobileViews
Last Checked: 2 months ago
Abstract
Visual language models (VLMs) empower mobile GUI agents to interpret complex mobile screens and respond to user requests. Training such capable agents requires large-scale, high-quality mobile GUI data. However, existing mobile GUI datasets are limited in scale, data comprehensiveness, and fidelity. To overcome this, we utilize two mobile SoC clusters to provide over 200 native, high-fidelity mobile environments, along with a VLM-enhanced automatic application traversal framework for highly parallel, automated dataset collection with minimal human intervention. With this system, we propose MobileViews, a million-scale mobile GUI dataset comprising over 1.2 million unique screenshot-view hierarchy pairs from more than 30K modern Android applications. We assess the effectiveness of MobileViews by training four VLMs using the reinforcement learning-based GUI grounding task and evaluating them on two representative GUI grounding benchmarks. Results show that MobileViews significantly enhances grounding accuracy by up to 6.1%. Further analysis of data scale and quality underscores the critical role of large, high-quality datasets as reliable sources for training mobile GUI agents. The MobileViews dataset is publicly available at https://huggingface.co/datasets/mllmTeam/MobileViews.
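For anyone who wants to check whether the dataset link still resolves, here is a minimal sketch of loading MobileViews with the Hugging Face datasets library. It assumes the repository is (or becomes) reachable again and exposes a standard datasets-compatible layout; the split name and the idea of inspecting column names at runtime are assumptions on our part, not details confirmed by the paper.

```python
# Minimal sketch: stream MobileViews from the Hugging Face Hub.
# Assumes the repo at mllmTeam/MobileViews is reachable and loadable
# via the standard datasets API; split/column names are guesses.
from datasets import load_dataset

ds = load_dataset("mllmTeam/MobileViews", split="train", streaming=True)

# Inspect the first screenshot/view-hierarchy record and discover
# the actual column names rather than assuming them.
first = next(iter(ds))
print(first.keys())
```

Streaming mode avoids downloading the full million-scale corpus before you can inspect a single record, which also makes it a cheap way to test whether the link is truly dead.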
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt: Human-Computer Interaction

Died the same way: 💀 404 Not Found