<h2 align="center"> <a href="https://computer-use-ootb.github.io"> <img src="./assets/ootb_logo.png" alt="Logo" style="display: block; margin: 0 auto; filter: invert(1) brightness(2);"> </a> </h2> <h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest updates.</h5> <h5 align=center>

arXiv Project Page Hits

</h5>

<img src="./assets/ootb_icon.png" alt="Star" style="height:25px; vertical-align:middle; filter: invert(1) brightness(2);"> Overview

Computer Use <span style="color:rgb(106, 158, 210)">O</span><span style="color:rgb(111, 163, 82)">O</span><span style="color:rgb(209, 100, 94)">T</span><span style="color:rgb(238, 171, 106)">B</span><img src="./assets/ootb_icon.png" alt="Star" style="height:20px; vertical-align:middle; filter: invert(1) brightness(2);"> is an out-of-the-box (OOTB) solution for desktop GUI agents, supporting both API-based models (Claude 3.5 Computer Use) and locally running models (<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span>UI).

No Docker is required, and it supports both Windows and macOS. This project provides a user-friendly interface based on Gradio. 🎨

For more information, you can visit our study on Claude 3.5 Computer Use [project page]. 🌐

Update

Demo Video

https://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee

<div style="display: flex; justify-content: space-around;"> <a href="https://youtu.be/Ychd-t24HZw" target="_blank" style="margin-right: 10px;"> <img src="https://img.youtube.com/vi/Ychd-t24HZw/maxresdefault.jpg" alt="Watch the video" width="48%"> </a> <a href="https://youtu.be/cvgPBazxLFM" target="_blank"> <img src="https://img.youtube.com/vi/cvgPBazxLFM/maxresdefault.jpg" alt="Watch the video" width="48%"> </a> </div>

🚀 Getting Started

0. Prerequisites

1. Clone the Repository 📂

Open the Conda Terminal. (After Miniconda is installed, it appears in the Start menu.) Run the following commands in the Conda Terminal.

git clone https://github.com/showlab/computer_use_ootb.git
cd computer_use_ootb

2.1 Install Dependencies 🔧

pip install -r dev-requirements.txt

2.2 (Optional) Get Prepared for <span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span>UI Local-Run

  1. Download all files of the ShowUI-2B model via the following command. Ensure the ShowUI-2B folder is under the computer_use_ootb folder.

    python install_tools/install_showui.py
    
  2. Make sure to install the correct GPU version of PyTorch (CUDA, MPS, etc.) on your machine. See install guide and verification.

  3. Get API keys for GPT-4o or Qwen-VL. For users in mainland China, a Qwen API free trial covering the first 1 million tokens is available.
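Steps 1 and 2 above can be sanity-checked before launching the interface. The following is a minimal sketch and not part of the repository: `showui_folder_present`, `pick_device`, and `detect_device` are hypothetical helper names, and the folder name `ShowUI-2B` is taken from step 1. The PyTorch calls are only reached if PyTorch can be imported.

```python
from pathlib import Path


def showui_folder_present(repo_root: str = ".") -> bool:
    """Return True if a ShowUI-2B folder exists directly under repo_root."""
    return (Path(repo_root) / "ShowUI-2B").is_dir()


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which accelerator it can see; report 'cpu' if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        bool(mps is not None and mps.is_available()),
    )


if __name__ == "__main__":
    # Run from inside the computer_use_ootb folder.
    print("ShowUI-2B folder found:", showui_folder_present())
    print("Selected device:", detect_device())
```

If the script reports `cpu` on a machine with a GPU, the installed PyTorch build is probably the CPU-only one and should be reinstalled following the install guide mentioned in step 2.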

3. Start the Interface ▶️

Start the OOTB interface:

python app.py

If you successfully start the interface, you will see two URLs in the terminal:

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://xxxxxxxxxxxxxxxx.gradio.live (Do not share this link with others, or they will be able to control your computer.)
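To confirm from a script that the interface is listening on the local URL, a plain TCP check is usually enough. This is a minimal sketch using only the standard library; `interface_is_up` is a hypothetical helper, and the default port 7860 is taken from the local URL shown above. It only verifies that something accepts connections on that port, not that the responder is the OOTB interface.

```python
import socket


def interface_is_up(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("OOTB interface reachable:", interface_is_up())
```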

<u>For convenience</u>, we recommend running one or more of the following commands to store your API keys in environment variables before starting the interface. Then you don't need to enter the keys manually on each run. On Windows PowerShell (on cmd, use the set command instead):

$env:ANTHROPIC_API_KEY="sk-xxxxx"  # Replace with your own key
$env:QWEN_API_KEY="sk-xxxxx"
$env:OPENAI_API_KEY="sk-xxxxx"

On macOS/Linux, replace $env:ANTHROPIC_API_KEY with export ANTHROPIC_API_KEY in the above command.
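Before running `python app.py`, it can help to confirm which of these variables are visible to the current shell. The sketch below is an illustration only: `configured_keys` is a hypothetical helper, not a function from the repository, and it reports whether each variable is set without ever printing its value.

```python
import os
from typing import Mapping, Optional

# Variable names as used in the commands above.
API_KEY_VARS = ("ANTHROPIC_API_KEY", "QWEN_API_KEY", "OPENAI_API_KEY")


def configured_keys(env: Optional[Mapping[str, str]] = None) -> dict:
    """Map each API key variable to True if it is set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name, "").strip()) for name in API_KEY_VARS}


if __name__ == "__main__":
    for name, is_set in configured_keys().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
```

Environment variables set this way last only for the current terminal session, so run the check in the same terminal you will use to start the interface.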

4. Control Your Computer from Any Device That Can Access the Internet

Open the website at http://localhost:7860/ (if you're controlling the computer itself) or https://xxxxxxxxxxxxxxxxx.gradio.live in your mobile browser for remote control.

Enter the Anthropic API key (you can obtain it through this website), then give commands to let the AI perform your tasks.

ShowUI Advanced Settings

We provide a 4-bit quantized ShowUI-2B model for cost-efficient inference (currently supported only on CUDA devices). To download the 4-bit quantized ShowUI-2B model:

python install_tools/install_showui-awq-4bit.py

Then, enable the quantized setting in the 'ShowUI Advanced Settings' dropdown menu.

We also provide a slider to quickly adjust the max_pixel parameter of the ShowUI model. This parameter controls the size of the model's visual input and strongly affects memory usage and inference speed.
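To get a feel for why max_pixel matters, the sketch below estimates how many visual tokens an image produces at a given cap. It rests on an assumption not stated in this README: that ShowUI-2B follows the Qwen2-VL convention in which each 28×28-pixel block of the resized image becomes one visual token. `approx_visual_tokens` and `PIXELS_PER_TOKEN` are hypothetical names introduced for illustration.

```python
PIXELS_PER_TOKEN = 28 * 28  # assumed: one visual token per 28x28-pixel block


def approx_visual_tokens(max_pixel: int, pixels_per_token: int = PIXELS_PER_TOKEN) -> int:
    """Upper bound on visual tokens for an image capped at max_pixel pixels."""
    if max_pixel <= 0 or pixels_per_token <= 0:
        raise ValueError("max_pixel and pixels_per_token must be positive")
    return max_pixel // pixels_per_token


if __name__ == "__main__":
    # Illustrative caps only; these are not the slider's actual range.
    for blocks in (256, 512, 1024):
        cap = blocks * PIXELS_PER_TOKEN
        print(f"max_pixel={cap:,d} -> about {approx_visual_tokens(cap):,d} visual tokens")
```

Under this assumption the token count grows linearly with max_pixel, so halving the slider value roughly halves the visual input the model has to process, at the cost of a lower-resolution view of the screen.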

🖥️ Supported Systems

👓 OOTB Interface

<div style="display: flex; align-items: center; gap: 10px;"> <figure style="text-align: center;"> <img src="./assets/gradio_interface.png" alt="Desktop Interface" style="width: auto; object-fit: contain;"> </figure> </div>

⚠️ Risks

📅 Roadmap

Join Discussion

You are welcome to join the discussion and help us continuously improve the user experience of Computer Use - OOTB. Reach us through this Discord Channel or the WeChat QR code below!

<div style="display: flex; flex-direction: row; justify-content: space-around;"> <!-- <img src="./assets/wechat_2.jpg" alt="gradio_interface" width="30%"> --> <img src="./assets/wechat_3.jpg" alt="gradio_interface" width="30%"> </div> <div style="height: 30px;"></div> <hr> <a href="https://computer-use-ootb.github.io"> <img src="./assets/ootb_logo.png" alt="Logo" width="30%" style="display: block; margin: 0 auto; filter: invert(1) brightness(2);"> </a>