
[中文文档 | Chinese Documentation]

This project is still under construction; we will continue to update it, and we welcome contributions and pull requests from the community.

<p align="center"><img src="./assets/gvlab_logo.png" width="600"></p> <a href="https://discord.gg/khWBFnCgAN"><img src="https://img.shields.io/discord/1099920215724277770?label=Discord&logo=discord"></a> | <a href="https://ichat.opengvlab.com"><img src="https://img.shields.io/badge/Demo-Open-green?logo=alibabacloud"></a> | <a href="https://twitter.com/opengvlab"><img src="https://img.shields.io/twitter/follow/opengvlab?style=social"></a>

🤖💬 InternGPT [Paper]

<!-- ## Description -->

InternGPT (iGPT for short) / InternChat (iChat for short) is a pointing-language-driven visual interactive system that allows you to interact with ChatGPT by clicking, dragging, and drawing with a pointing device. The name InternGPT stands for interaction, nonverbal, and ChatGPT. In contrast to existing interactive systems that rely on pure language, iGPT incorporates pointing instructions, which significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots on vision-centric tasks, especially in complicated visual scenarios. Additionally, iGPT uses an auxiliary control mechanism to improve the LLM's control capability, and a large vision-language model termed Husky is fine-tuned for high-quality multi-modal dialogue (impressing ChatGPT-3.5-turbo with 93.89% GPT-4 quality).

🤖💬 Online Demo

InternGPT is online (see https://igpt.opengvlab.com). Let's try it!

[NOTE] You may have to wait in a lengthy queue. You can also clone our repo and run it on your own GPU.

<a id="draggan_demo">Video Demo with DragGAN: </a>

https://github.com/OpenGVLab/InternGPT/assets/13723743/529abde4-5dce-48de-bb38-0a0c199bb980

<a id="imagebind_demo">Video Demo with ImageBind: </a>

https://github.com/OpenGVLab/InternGPT/assets/13723743/bacf3e58-6c24-4c0f-8cf7-e0c4b8b3d2af

<a id="igpt_demo">iGPT Video Demo: </a>

https://github.com/OpenGVLab/InternGPT/assets/13723743/8fd9112f-57d9-4871-a369-4e1929aa2593

🥳 🚀 What's New

🧭 User Manual

Update:

(2023.05.24) We now support DragGAN. You can try it as shown in the [video demo](#draggan_demo) above.

(2023.05.18) We now support ImageBind. If you want to generate a new image conditioned on audio, upload an audio file in advance (see the [video demo](#imagebind_demo) above).


Main features:

After uploading an image, you can have a multi-modal dialogue by sending messages such as "What is in the image?" or "What is the background color of the image?".
You can also interactively operate on, edit, or generate the image, as shown in the Major Features section below.

🗓️ Schedule

🏠 System Overview

<p align="center"><img width="800" src="./assets/arch1.png" alt="arch"></p>

🎁 Major Features

<details> <summary>Remove the masked object</summary> <p align="center"><img src="./assets/demo2.gif" width="500"></p> </details>
<details> <summary>Interactive image editing</summary> <p align="center"><img src="./assets/demo3.gif" width="500"></p> </details>
<details> <summary>Image generation</summary> <p align="center"><img src="./assets/demo4.gif" width="500"></p> </details>
<details> <summary>Interactive visual question answering</summary> <p align="center"><img src="./assets/demo5.gif" width="500"></p> </details>
<details> <summary>Interactive image generation</summary> <p align="center"><img src="https://github.com/OpenGVLab/InternGPT/assets/8529570/2b0da08e-af86-453d-99e5-1327f93aa917" width="500"></p> </details>
<details> <summary>Video highlight interpretation</summary> <p align="center"><img src="./assets/demo6.jpg" width="500"></p> </details>

🛠️ Installation

See INSTALL.md
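
For reference, obtaining the code typically looks like the following (a minimal sketch; INSTALL.md remains the authoritative guide for environment setup and dependencies):

# clone the repository and enter it before following INSTALL.md
git clone https://github.com/OpenGVLab/InternGPT.git
cd InternGPT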

👨‍🏫 <a id="get_started">Get Started</a>

Running the following command starts a Gradio service with our basic features:

python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 -e
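
The --load argument takes a comma-separated list of ToolName_device pairs, so you can choose which tool runs on which device. As an illustrative sketch (assuming a machine with two GPUs; the particular split is not prescribed by the project), you could distribute the basic tools like this:

# hypothetical two-GPU split: HuskyVQA on GPU 0, the other tools on GPU 1
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:1,ImageOCRRecognition_cuda:1" --port 3456 -e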

If you want to enable the voice assistant, please use openssl to generate a self-signed certificate:

mkdir certificate
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes

and then run:

python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" \
--port 3456 --https -e
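
Once the service is up, one quick way to sanity-check the HTTPS endpoint from another shell is a plain curl request; the -k flag skips certificate verification, which is needed here because the certificate is self-signed:

# should return the Gradio page HTML if the service started correctly
curl -k https://localhost:3456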

To enable all features of iGPT, run:

python -u app.py \
--load "ImageOCRRecognition_cuda:0,Text2Image_cuda:0,SegmentAnything_cuda:0,ActionRecognition_cuda:0,VideoCaption_cuda:0,DenseCaption_cuda:0,ReplaceMaskedAnything_cuda:0,LDMInpainting_cuda:0,SegText2Image_cuda:0,ScribbleText2Image_cuda:0,Image2Scribble_cuda:0,Image2Canny_cuda:0,CannyText2Image_cuda:0,StyleGAN_cuda:0,Anything2Image_cuda:0,HuskyVQA_cuda:0" \
-p 3456 --https -e

Note that the -e flag can save a lot of memory.

Selectively Loading Features

If you only want to try DragGAN, you just need to load StyleGAN and open the "DragGAN" tab:

python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 3456 --https -e

In this case, only the DragGAN functions are available, which frees you from installing dependencies you are not interested in.
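
The same pattern applies to any subset of tools. For instance, a sketch that loads only the segmentation and inpainting tools (an illustrative combination drawn from the tool names in the full command above):

# load only SegmentAnything and LDMInpainting on GPU 0
python -u app.py --load "SegmentAnything_cuda:0,LDMInpainting_cuda:0" --port 3456 --https -e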

🎫 License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@article{2023interngpt,
  title={InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language},
  author={Liu, Zhaoyang and He, Yinan and Wang, Wenhai and Wang, Weiyun and Wang, Yi and Chen, Shoufa and Zhang, Qinglong and Lai, Zeqiang and Yang, Yang and Li, Qingyun and Yu, Jiashuo and others},
  journal={arXiv preprint arXiv:2305.05662},
  year={2023}
}

🤝 Acknowledgement

Thanks to the following open-source projects:

Hugging Face · LangChain · TaskMatrix · SAM · Stable Diffusion · ControlNet · InstructPix2Pix · BLIP · Latent Diffusion Models · EasyOCR · ImageBind · DragGAN

You are welcome to discuss with us and help continuously improve the user experience of InternGPT.

If you want to join our WeChat group, please scan the QR code below to add our assistant as a WeChat friend:

<p align="center"><img width="300" alt="image" src="https://github.com/OpenGVLab/DragGAN/assets/26198430/e3f0807f-956a-474e-8fd2-1f7c22d73997"></p>