<div align="center"> <h1>AUITestAgent: Natural Language-Driven GUI Functional Bug Tester</h1> </div> <div align="center"> <a href='https://arxiv.org/abs/2407.09018'><img src='https://img.shields.io/badge/arxiv-2407.09018-b31b1b.svg'></a> </div> <div align="center"> <a href="https://github.com/Gootter12">Yongxiang Hu<sup>1</sup></a>, <a href="https://github.com/TSKGHS17">Xuan Wang<sup>1</sup></a>, <a href="https://github.com/xieeryihe">Yingchuan Wang<sup>1</sup></a>, <a href="https://github.com/RainPot">Yu Zhang<sup>2</sup></a>, <a href="https://github.com/whiteguo233">Shiyu Guo<sup>2</sup></a>, <a href="https://github.com/chenchaoyi">Chaoyi Chen<sup>2</sup></a>, <a href="https://cs.fudan.edu.cn/3f/7e/c25906a278398/page.htm">Xin Wang<sup>1,3</sup></a> and <a href="https://cs.fudan.edu.cn/3f/a9/c25909a278441/page.htm">Yangfan Zhou<sup>1,3</sup></a> <br><sup>1</sup>School of Computer Science, Fudan University
<sup>2</sup>Meituan, China
<sup>3</sup>Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
## 🌟 Introduction
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. It takes test requirements written in natural language as input, generates and conducts UI interactions, and verifies whether the UI response aligns with the expectations outlined in the requirements.
To improve the performance of LLM-based agents on the specialized task of UI testing, AUITestAgent decouples GUI interaction and function verification into two separate modules, performing verification after the interaction is completed.
In terms of implementation, AUITestAgent extracts GUI interactions from test requirements using dynamically organized agents to tackle the diversity of requirement expressions. Then, a multi-dimensional data extraction strategy is employed to retrieve data relevant to the test requirements from the interaction trace and perform verification.
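To make the two-stage design concrete, the sketch below outlines a pipeline in the same spirit: an interaction stage that turns the natural-language requirement into GUI actions on a device, followed by a verification stage that checks the recorded interaction trace against the expected behavior. It is illustrative only; the names `call_llm`, `InteractionTrace`, `run_interaction`, `run_verification`, and the `device` object are hypothetical, not AUITestAgent's actual API.

```python
# Minimal, illustrative sketch of a two-stage "interact, then verify" pipeline.
# NOT the actual AUITestAgent implementation; all names here are hypothetical.
from dataclasses import dataclass, field


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., GPT-4o); returns the model's reply."""
    raise NotImplementedError


@dataclass
class InteractionTrace:
    steps: list = field(default_factory=list)  # recorded (action, ui_state) pairs


def run_interaction(requirement: str, device) -> InteractionTrace:
    """Stage 1: translate the natural-language requirement into GUI actions."""
    trace = InteractionTrace()
    while True:
        ui_state = device.dump_ui()  # current screen hierarchy / screenshot
        action = call_llm(
            f"Test requirement: {requirement}\n"
            f"Current UI: {ui_state}\n"
            "Return the next GUI action, or DONE if the interaction is finished."
        )
        if action.strip() == "DONE":
            return trace
        device.execute(action)  # tap / scroll / type, etc.
        trace.steps.append((action, ui_state))


def run_verification(requirement: str, trace: InteractionTrace) -> bool:
    """Stage 2: check whether the recorded UI responses match the expectation."""
    # The real system would first extract only the trace data relevant to the
    # requirement (the "multi-dimensional data extraction" mentioned above).
    relevant_steps = trace.steps
    verdict = call_llm(
        f"Expected behavior: {requirement}\n"
        f"Observed interaction trace: {relevant_steps}\n"
        "Answer PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")
```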
## 📺 Demo
### Using AUITestAgent in Meituan
Task: View the rating of the first scenic spot in the scenic view, and check whether its rating is consistent
https://github.com/user-attachments/assets/48341d06-bc05-4b71-accd-c8a1c7215834
### Using AUITestAgent in Facebook
Task: Send a post with the content 'Hello everyone' and like it; check whether it is displayed correctly and whether the like button turns blue
https://github.com/user-attachments/assets/8c0a33ab-11ab-4f95-b767-678472e8d902
## 📝 Evaluation
We evaluate AUITestAgent’s performance with two customized benchmarks, an interaction benchmark and a verification benchmark, covering 8 widely used commercial apps (i.e., Meituan, Little Red Book, Douban, Facebook, Gmail, LinkedIn, Google Play, and YouTube Music). To provide a comprehensive assessment, we categorized the difficulty of interaction tasks into three levels: easy (L1), moderate (L2), and difficult (L3). For each level, we constructed ten interaction tasks, with descriptions evenly split between English and Chinese.
Our experiments reveal that AUITestAgent accurately completes 100% of Level 1 tasks, 80% of Level 2 tasks, and 50% of Level 3 tasks. Additionally, 94% of the interactions generated by AUITestAgent align with the ground truth obtained through manual interaction. These metrics demonstrate that AUITestAgent significantly outperforms existing methods in translating natural-language commands into GUI interactions. Moreover, AUITestAgent achieves a recall of 90% for injected GUI functional bugs while maintaining a low false positive rate of just 4.5%. Furthermore, its success in detecting previously unseen bugs in Meituan underscores the practical value of AUITestAgent for GUI testing in complex commercial apps.
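For readers unfamiliar with the two bug-detection metrics, the snippet below shows how recall and false positive rate are defined. The counts are made-up placeholders for illustration, not the raw numbers from our experiments.

```python
# Illustrative only: hypothetical counts to show how the two metrics are defined.
injected_bugs = 20    # functional bugs deliberately injected into test cases
detected_bugs = 18    # injected bugs the tool correctly reported
clean_cases = 22      # bug-free test cases
false_alarms = 1      # clean cases wrongly reported as buggy

recall = detected_bugs / injected_bugs              # 0.90 -> 90%
false_positive_rate = false_alarms / clean_cases    # ~0.045 -> ~4.5%
print(f"recall={recall:.1%}, false positive rate={false_positive_rate:.1%}")
```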
For detailed information, please refer to our paper and evaluation results.
### GUI Interaction
For detailed results, please refer to the interaction benchmark.
### Function Verification
For detailed results, please refer to the verification benchmark.
Since AUITestAgent is the first work to focus on natural-language-driven GUI function verification and there are no existing studies in this field, we constructed a verification method based on multi-turn dialogue with GPT-4o as a baseline.
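The sketch below shows what such a multi-turn-dialogue verification baseline could look like: the interaction trace is fed to GPT-4o screen by screen, and a final turn asks for a verdict. The prompt wording, the `screens` input format, and the `verify_with_dialogue` function are assumptions for illustration, not the exact baseline used in the paper.

```python
# Rough sketch of a multi-turn-dialogue verification baseline using GPT-4o.
# Prompts and input format are illustrative assumptions, not the paper's baseline.
from openai import OpenAI

client = OpenAI()


def verify_with_dialogue(requirement: str, screens: list[str]) -> str:
    """Feed the interaction trace screen by screen, then ask for a verdict."""
    messages = [
        {"role": "system",
         "content": "You are a GUI tester. Judge whether the app's behavior "
                    "matches the given test requirement."},
        {"role": "user", "content": f"Test requirement: {requirement}"},
    ]
    # One dialogue turn per recorded screen of the interaction trace.
    for i, screen in enumerate(screens, start=1):
        messages.append({"role": "user",
                         "content": f"Screen after step {i}:\n{screen}"})
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        messages.append({"role": "assistant",
                         "content": reply.choices[0].message.content})
    # Final turn: ask for an overall PASS/FAIL verdict.
    messages.append({"role": "user",
                     "content": "Based on all screens, answer PASS or FAIL "
                                "and briefly explain why."})
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    return final.choices[0].message.content
```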
## 📚 Citation
If you find this work helpful to your research, please consider citing our paper:
```bibtex
@misc{hu2024auitestagent,
      title={AUITestAgent: Automatic Requirements Oriented GUI Function Testing},
      author={Yongxiang Hu and Xuan Wang and Yingchuan Wang and Yu Zhang and Shiyu Guo and Chaoyi Chen and Xin Wang and Yangfan Zhou},
      year={2024},
      eprint={2407.09018},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```
## 🧑 Team Introduction
AUITestAgent is joint work by Prof. Zhou’s team at Fudan University and the Meituan In-Store R&D Platform. We have long been dedicated to AI for full-stack front-end technology. In addition to AUITestAgent, we have developed several other technological innovations, including vision-ui, Appaction, and AutoConsis.