GUI Odyssey

This repository is the official implementation of GUI Odyssey.

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao✉️⭐️, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo✉️
✉️ Wenqi Shao (shaowenqi@pjlab.org.cn) and Ping Luo (pluo@cs.hku.hk) are corresponding authors.
⭐️ Wenqi Shao is project leader.

💡 News

<!-- And check our [project page]()! -->

🔆 Introduction

GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. It consists of 7,735 episodes collected from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combinations.

🛠️ Data collection pipeline

GUI Odyssey comprises six categories of navigation tasks. For each category, we construct instruction templates and fill them with items and apps selected from a predefined pool, which yields a large set of unique instructions for annotating GUI episodes. Human annotators then demonstrate each instruction on an Android emulator, and the metadata of every episode is recorded in a comprehensive format. After rigorous quality checks, GUI Odyssey includes 7,735 validated cross-app GUI navigation episodes.

📝 Statistics

<center>

| Splits | # Episodes | # Unique Prompts | # Avg. Steps | Data location | Model |
| --- | --- | --- | --- | --- | --- |
| Total | 7,735 | 7,735 | 15.4 | GUI-Odyssey | OdysseyAgent |
| Train-Random & Test-Random | 5,802 / 1,933 | 5,802 / 1,933 | 15.4 / 15.2 | random_split.json | OdysseyAgent-Random |
| Train-Task & Test-Task | 6,719 / 1,016 | 6,719 / 1,016 | 15.0 / 17.6 | task_split.json | OdysseyAgent-Task |
| Train-Device & Test-Device | 6,473 / 1,262 | 6,473 / 1,262 | 15.4 / 15.0 | device_split.json | OdysseyAgent-Device |
| Train-App & Test-App | 6,596 / 1,139 | 6,596 / 1,139 | 15.4 / 15.3 | app_split.json | OdysseyAgent-App |

</center>
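
Each of the four splits divides the same 7,735 episodes into a train set and a test set. A minimal sketch that checks this from the numbers in the table and reports the held-out share per split; the counts are transcribed from the table, and the names `SPLITS` and `held_out_share` are illustrative only, not part of the repository.

```python
# Episode counts transcribed from the statistics table: (train, test).
TOTAL_EPISODES = 7735
SPLITS = {
    "random": (5802, 1933),
    "task": (6719, 1016),
    "device": (6473, 1262),
    "app": (6596, 1139),
}


def held_out_share(train: int, test: int) -> float:
    """Return the fraction of episodes held out for testing."""
    return test / (train + test)


if __name__ == "__main__":
    for name, (train, test) in SPLITS.items():
        # Every split should cover the full dataset exactly once.
        assert train + test == TOTAL_EPISODES, name
        print(f"{name}: {held_out_share(train, test):.1%} held out for testing")
```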

💫 Dataset Access

The full GUI Odyssey dataset is hosted on Hugging Face.

Clone the entire dataset from Hugging Face:

git clone https://huggingface.co/datasets/OpenGVLab/GUI-Odyssey

Then move the cloned dataset into the ./data directory. After that, the structure of ./data should look like this:

GUI-Odyssey
├── data
│   ├── annotations
│   │   └── *.json
│   ├── screenshots
│   │   └── data_*
│   │        └── *.png
│   ├── splits
│   │   ├── app_split.json
│   │   ├── device_split.json
│   │   ├── random_split.json
│   │   └── task_split.json
│   ├── format_converter.py
│   └── preprocessing.py
└── ...
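
Before running the preprocessing step, it can help to confirm that the cloned files ended up where the tree above expects them. A minimal sketch of such a check, assuming the data directory is `./data` as described in the text; `missing_entries` is a hypothetical helper written for illustration and is not shipped with the repository.

```python
from __future__ import annotations

from pathlib import Path

# Entries taken from the directory tree above.
EXPECTED_DIRS = ("annotations", "screenshots", "splits")
EXPECTED_SPLITS = (
    "app_split.json",
    "device_split.json",
    "random_split.json",
    "task_split.json",
)
EXPECTED_SCRIPTS = ("format_converter.py", "preprocessing.py")


def missing_entries(data_dir: Path) -> list[str]:
    """Return the expected paths (relative to data_dir) that do not exist."""
    expected = [Path(d) for d in EXPECTED_DIRS]
    expected += [Path("splits") / name for name in EXPECTED_SPLITS]
    expected += [Path(name) for name in EXPECTED_SCRIPTS]
    return [p.as_posix() for p in expected if not (data_dir / p).exists()]


if __name__ == "__main__":
    # "./data" follows the text above; adjust it if the clone lives elsewhere.
    missing = missing_entries(Path("./data"))
    print("layout OK" if not missing else f"missing: {missing}")
```

The check only looks at names and locations; it does not inspect the contents of the annotation or split files.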

Then organize the screenshots folder:

cd data
python preprocessing.py

Finally, the structure of ./data should look like this:

GUI-Odyssey
├── data
│   ├── annotations
│   │   └── *.json
│   ├── screenshots
│   │   └── *.png
│   ├── splits
│   │   ├── app_split.json
│   │   ├── device_split.json
│   │   ├── random_split.json
│   │   └── task_split.json
│   ├── format_converter.py
│   └── preprocessing.py
└── ...
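
The contents of preprocessing.py are not reproduced here. Judging only from the two directory trees, its visible effect is to move the PNG files out of the screenshots/data_* subfolders into screenshots/ itself. A minimal sketch of that flattening step under this assumption; `flatten_screenshots` is a hypothetical helper, not the script's actual code, and it assumes screenshot file names are unique across subfolders.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def flatten_screenshots(screenshots_dir: Path) -> int:
    """Move data_*/*.png up into screenshots_dir; return the number of files moved."""
    moved = 0
    for sub in sorted(screenshots_dir.glob("data_*")):
        if not sub.is_dir():
            continue
        for png in sorted(sub.glob("*.png")):
            target = screenshots_dir / png.name
            if target.exists():
                # Assumption: names are unique; refuse to overwrite otherwise.
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.move(str(png), str(target))
            moved += 1
        # Remove the subfolder only if nothing is left in it.
        if not any(sub.iterdir()):
            sub.rmdir()
    return moved
```

For real use, prefer the provided `python preprocessing.py`; the sketch only illustrates the layout change between the two trees.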

⚙️ Detailed Data Information

Please refer to this.

🚀 Quick Start

Please refer to this to get started quickly.

📖 Release Process

🖊️ Citation

If you find GUI Odyssey useful in your project or research, please cite our paper using the following BibTeX entry. Thanks!

@misc{lu2024gui,
      title={GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices}, 
      author={Quanfeng Lu and Wenqi Shao and Zitao Liu and Fanqing Meng and Boxuan Li and Botong Chen and Siyuan Huang and Kaipeng Zhang and Yu Qiao and Ping Luo},
      year={2024},
      eprint={2406.08451},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
<!-- ## 📢 Disclaimer We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes. -->