# ShowUI
<p align="center"> <img src="assets/showui.jpg" alt="ShowUI" width="480"> </p>
<p align="center">
🤗 <a href="https://huggingface.co/showlab/ShowUI-2B">Hugging Face Models</a> &nbsp;|&nbsp; 📑 <a href="https://arxiv.org/abs/2411.17465">Paper</a> &nbsp;|&nbsp; 🤗 <a href="https://huggingface.co/spaces/showlab/ShowUI">Spaces Demo</a> &nbsp;|&nbsp; 🕹️ <a href="https://openbayes.com/console/public/tutorials/I8euxlahBAm">OpenBayes</a>
<br>
🤗 <a href="https://huggingface.co/datasets/showlab/ShowUI-desktop-8K">Datasets</a> &nbsp;|&nbsp; 💬 <a href="https://x.com/_akhaliq/status/1864387028856537400">X (Twitter)</a> &nbsp;|&nbsp; 🖥️ <a href="https://github.com/showlab/computer_use_ootb">Computer Use</a> &nbsp;|&nbsp; 📖 <a href="https://github.com/showlab/Awesome-GUI-Agent">GUI Paper List</a>
</p>

**ShowUI: One Vision-Language-Action Model for GUI Visual Agent**<br>
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou<br>
Show Lab @ National University of Singapore, Microsoft
## 🔥 Update
- [2024.12.23] Update `showui` with the UI-guided token selection implementation.
- [2024.12.15] ShowUI received the Outstanding Paper Award at the NeurIPS 2024 Open-World Agents workshop.
- [2024.12.9] Support int8 quantization.
- [2024.12.5] Major update: ShowUI is integrated into OOTB for local runs!
- [2024.12.1] We support iterative refinement to improve grounding accuracy. Try it in the HF Spaces demo.
- [2024.11.27] We released the arXiv paper, the HF Spaces demo, and `ShowUI-desktop-8K`.
- [2024.11.16] `showlab/ShowUI-2B` is available on Hugging Face.
## 🖥️ Computer Use
See Computer Use OOTB for using ShowUI to control your PC.
https://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee
## 🚀 Training
Our training codebase supports:
- DeepSpeed Zero1, Zero2, Zero3
- Full-tuning (FP32, FP16, BF16), LoRA, QLoRA
- SDPA, Flash Attention 2
- Multiple datasets mixed training
- Interleaved data streaming
See Train for the training setup.
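The mixed-dataset, interleaved streaming items above can be pictured with a small stdlib-only sketch (the function name, stream names, and mixture weights are ours for illustration, not the repo's): several dataset streams are interleaved into one training stream, with the next sample drawn from a stream chosen by mixture weight.

```python
import random
from itertools import cycle

def interleave(streams, weights, num_samples, seed=0):
    """Yield (stream_name, sample) pairs, picking the next stream
    at each step according to the given mixture weights."""
    rng = random.Random(seed)
    names = list(streams)
    # cycle() stands in for an endless data-loading stream.
    iters = {name: cycle(streams[name]) for name in names}
    for _ in range(num_samples):
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        yield name, next(iters[name])

# Toy "datasets": grounding and navigation samples, mixed 70/30.
streams = {"grounding": ["g0", "g1"], "navigation": ["n0", "n1", "n2"]}
mixed = list(interleave(streams, {"grounding": 0.7, "navigation": 0.3}, 10))
```

Because each stream is consumed lazily, no dataset has to be materialized or shuffled up front, which is the point of streaming interleaved data.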
## 🕹️ UI-Guided Token Selection
Try `test.ipynb`, which seamlessly supports Qwen2VL models.
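As a rough, stdlib-only illustration of the idea behind UI-guided token selection (a toy sketch, not the repository's implementation): treat the screenshot as a grid of patches, link adjacent patches with identical values into connected components via union-find, and keep one representative token per component, so a large uniform region like a background costs a single token.

```python
def select_tokens(grid):
    """Toy token selection: union-find over a patch grid, merging
    adjacent equal-valued patches; keep one patch index per component."""
    h, w = len(grid), len(grid[0])
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(h):
        for c in range(w):
            if c + 1 < w and grid[r][c] == grid[r][c + 1]:
                union(r * w + c, r * w + c + 1)
            if r + 1 < h and grid[r][c] == grid[r + 1][c]:
                union(r * w + c, (r + 1) * w + c)

    # One representative patch index per connected component.
    return sorted({find(i) for i in range(h * w)})

# A 3x4 "screenshot": uniform background (0) with a small button (1).
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
tokens = select_tokens(grid)  # 12 patches collapse to 2 kept tokens
```

In the real model the grouping is done over visual patch embeddings rather than exact value equality, but the effect is the same: redundant UI regions contribute far fewer tokens.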
## ⭐ Quick Start
See Quick Start for model usage.
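For grounding, the model predicts a click point as normalized `[x, y]` coordinates in `[0, 1]` relative to the screenshot; a minimal helper (our naming, not from the repo) to map such a point to screen pixels:

```python
def to_pixels(point, width, height):
    """Map a normalized [x, y] click point (values in [0, 1])
    to integer pixel coordinates on a width x height screenshot."""
    x, y = point
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    return round(x * width), round(y * height)

# A predicted point near the screen center on a 1920x1080 screenshot.
click = to_pixels([0.5, 0.42], 1920, 1080)  # → (960, 454)
```

The resulting pixel pair is what a downstream controller (e.g. Computer Use OOTB) would feed to its mouse-click action.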
## 🤗 Local Gradio
See Gradio for installation.
## BibTeX
If you find our work helpful, please consider citing our paper.
```bibtex
@misc{lin2024showui,
      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
      year={2024},
      eprint={2411.17465},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17465},
}
```