# ShowUI
<p align="center"> <img src="assets/showui.jpg" alt="ShowUI" width="480"> </p>
<p align="center">
🤗 <a href="https://huggingface.co/showlab/ShowUI-2B">Hugging Face Models</a> &nbsp;|&nbsp; 📑 <a href="https://arxiv.org/abs/2411.17465">Paper</a> &nbsp;|&nbsp; 🤗 <a href="https://huggingface.co/spaces/showlab/ShowUI">Spaces Demo</a> &nbsp;|&nbsp; 🕹️ <a href="https://openbayes.com/console/public/tutorials/I8euxlahBAm">OpenBayes</a>
<br>
🤗 <a href="https://huggingface.co/datasets/showlab/ShowUI-desktop-8K">Datasets</a> &nbsp;|&nbsp; 💬 <a href="https://x.com/_akhaliq/status/1864387028856537400">X (Twitter)</a> &nbsp;|&nbsp; 🖥️ <a href="https://github.com/showlab/computer_use_ootb">Computer Use</a> &nbsp;|&nbsp; 📖 <a href="https://github.com/showlab/Awesome-GUI-Agent">GUI Paper List</a>
</p>

**ShowUI: One Vision-Language-Action Model for GUI Visual Agent**<br>
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou<br>
Show Lab @ National University of Singapore, Microsoft
## 🔥 Update
- [2024.12.23] Update `showui` with the UI-guided token selection implementation.
- [2024.12.15] ShowUI received the Outstanding Paper Award at the NeurIPS 2024 Open-World Agents workshop.
- [2024.12.9] Support int8 quantization.
- [2024.12.5] Major update: ShowUI is integrated into OOTB for local runs!
- [2024.12.1] We support iterative refinement to improve grounding accuracy. Try it in the HF Spaces demo.
- [2024.11.27] We released the arXiv paper, the HF Spaces demo, and `ShowUI-desktop-8K`.
- [2024.11.16] `showlab/ShowUI-2B` is available on Hugging Face.
## 🖥️ Computer Use
See Computer Use OOTB for using ShowUI to control your PC.
https://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee
## 🚀 Training
Our training codebase supports:
- DeepSpeed Zero1, Zero2, Zero3
- Full-tuning (FP32, FP16, BF16), LoRA, QLoRA
- SDPA, Flash Attention 2
- Multiple datasets mixed training
- Interleaved data streaming
See Train for the training setup.
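The mixed-dataset, interleaved streaming items above can be pictured with a small stdlib-only sketch (the function name, stream names, and mixture weights are ours for illustration, not the repo's): several dataset streams are interleaved into one training stream, with the next sample drawn from a stream chosen by mixture weight.

```python
import random
from itertools import cycle

def interleave(streams, weights, num_samples, seed=0):
    """Yield (stream_name, sample) pairs, picking the next stream
    at each step according to the given mixture weights."""
    rng = random.Random(seed)
    names = list(streams)
    # cycle() stands in for an endless data-loading stream.
    iters = {name: cycle(streams[name]) for name in names}
    for _ in range(num_samples):
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        yield name, next(iters[name])

# Toy "datasets": grounding and navigation samples, mixed 70/30.
streams = {"grounding": ["g0", "g1"], "navigation": ["n0", "n1", "n2"]}
mixed = list(interleave(streams, {"grounding": 0.7, "navigation": 0.3}, 10))
```

Because each stream is consumed lazily, no dataset has to be materialized or shuffled up front, which is the point of streaming interleaved data.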
## 🕹️ UI-Guided Token Selection
Try `test.ipynb`, which seamlessly supports Qwen2VL models.
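As a rough, stdlib-only illustration of the idea behind UI-guided token selection (a toy sketch, not the repository's implementation): treat the screenshot as a grid of patches, link adjacent patches with identical values into connected components via union-find, and keep one representative token per component, so a large uniform region like a background costs a single token.

```python
def select_tokens(grid):
    """Toy token selection: union-find over a patch grid, merging
    adjacent equal-valued patches; keep one patch index per component."""
    h, w = len(grid), len(grid[0])
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(h):
        for c in range(w):
            if c + 1 < w and grid[r][c] == grid[r][c + 1]:
                union(r * w + c, r * w + c + 1)
            if r + 1 < h and grid[r][c] == grid[r + 1][c]:
                union(r * w + c, (r + 1) * w + c)

    # One representative patch index per connected component.
    return sorted({find(i) for i in range(h * w)})

# A 3x4 "screenshot": uniform background (0) with a small button (1).
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
tokens = select_tokens(grid)  # 12 patches collapse to 2 kept tokens
```

In the real model the grouping is done over visual patch embeddings rather than exact value equality, but the effect is the same: redundant UI regions contribute far fewer tokens.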
## ⭐ Quick Start
See Quick Start for model usage.
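For grounding, the model predicts a click point as normalized `[x, y]` coordinates in `[0, 1]` relative to the screenshot; a minimal helper (our naming, not from the repo) to map such a point to screen pixels:

```python
def to_pixels(point, width, height):
    """Map a normalized [x, y] click point (values in [0, 1])
    to integer pixel coordinates on a width x height screenshot."""
    x, y = point
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    return round(x * width), round(y * height)

# A predicted point near the screen center on a 1920x1080 screenshot.
click = to_pixels([0.5, 0.42], 1920, 1080)  # → (960, 454)
```

The resulting pixel pair is what a downstream controller (e.g. Computer Use OOTB) would feed to its mouse-click action.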
## 🤗 Local Gradio
See Gradio for installation.
## BibTeX
If you find our work helpful, please consider citing our paper.
```bibtex
@misc{lin2024showui,
      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
      year={2024},
      eprint={2411.17465},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17465},
}
```