<p align="center" width="100%"> <a href="https://github.com/Cheems-Seminar/grounded-segment-any-parts" target="_blank"><img src="assets/logo.png" alt="Cheems Seminar" style="width: 70%; min-width: 300px; display: block; margin: auto;"></a> </p>
Grounded Segment Anything: From Objects to Parts
In this repo, we extend the Segment Anything Model (SAM) to support text prompt input. The text prompt can be object-level :full_moon: (e.g., dog) or part-level :last_quarter_moon: (e.g., dog head). Furthermore, we build a Visual ChatGPT-based dialogue system :robot::speech_balloon: that flexibly calls various segmentation models when it receives natural-language instructions.
[Blog] [Chinese Blog]
News
- 2023/04/14: Edit anything at a more fine-grained part level.
- 2023/04/11: Initial code release.
:rocket:New:rocket: Edit on Part-Level
- Part Prompt: "dog body"; Edit Prompt: "zebra"
- Part Prompt: "cat head"; Edit Prompt: "tiger"
- Part Prompt: "chair seat"; Edit Prompt: "chocolate"
- Part Prompt: "person head"; Edit Prompt: "combover hairstyle"
:sparkles::sparkles: Highlights :sparkles::sparkles:
Beyond class-agnostic mask segmentation, this repo contains:
- Grounded segment anything at both object level and part level.
- Interacting with models in the form of natural language.
These abilities come from a series of models, including:
Model | Function |
---|---|
Segment Anything | Segment anything from prompt |
GLIP | Grounded language-image pre-training |
Visual ChatGPT | Connects ChatGPT and segmentation foundation models |
:star:VLPart:star: | Going denser with open-vocabulary part segmentation |
FAQ
Q: When will the VLPart paper be released?
A: The VLPart paper has been released. :rocket::rocket::rocket:
Q: What is the difference between Grounded SAM and this project?
A: Grounded SAM is Grounding DINO + SAM, whereas this project is GLIP/VLPart + SAM. We believe any open-vocabulary (text-prompted) object detection model can be combined with SAM.
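The detector-plus-SAM combination described in the answer above can be sketched as a small composition function. This is a minimal illustration under stated assumptions, not this repo's implementation: `Detection`, `grounded_segment`, the two callable interfaces, and the `score_threshold` default are all hypothetical names chosen for the example. Any detector that returns boxes for a text prompt and any segmenter that accepts a box prompt could be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class Detection:
    """One detector output: a box, the matched phrase, and a confidence."""
    box: Box
    label: str
    score: float


# Hypothetical interfaces (assumptions for this sketch, not the repo's classes):
#   detector:  (image, text_prompt) -> list of Detection
#   segmenter: (image, box) -> mask
Detector = Callable[[Any, str], List[Detection]]
BoxSegmenter = Callable[[Any, Box], Any]


def grounded_segment(
    image: Any,
    text_prompt: str,
    detect: Detector,
    segment_box: BoxSegmenter,
    score_threshold: float = 0.5,  # illustrative default, not a tuned value
) -> List[Tuple[Detection, Any]]:
    """Detect boxes for the text prompt, then prompt the segmenter with each box."""
    results = []
    for det in detect(image, text_prompt):
        if det.score < score_threshold:
            continue  # drop low-confidence detections before segmenting
        results.append((det, segment_box(image, det.box)))
    return results
```

Because the function only depends on the two callables, swapping GLIP for VLPart (or for another open-vocabulary detector) would change the `detect` argument and nothing else.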
Usage
Install
See installation instructions.
Edit
python demo_part_edit.py
:robot::speech_balloon: Integration with Visual ChatGPT
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
python chatbot.py --load "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
<img src="./assets/demo_chat_short.gif" width="600">
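The `--load` argument above pairs each tool name with a device, joined by an underscore and separated by commas. The following is a minimal sketch of how such a string could be split into a name-to-device mapping; `parse_load_spec` is a hypothetical helper written for illustration, and the parsing in `chatbot.py` may differ.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Turn 'ToolA_cuda:0, ToolB_cuda:1' into {'ToolA': 'cuda:0', 'ToolB': 'cuda:1'}."""
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        # Split on the first underscore: tool name on the left, device on the right.
        name, sep, device = entry.partition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected 'Name_device', got {entry!r}")
        mapping[name] = device
    return mapping


spec = (
    "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, "
    "PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
)
for tool, device in parse_load_spec(spec).items():
    print(f"{tool} -> {device}")
```

In the command above, the four tools are spread over two GPUs (`cuda:0` and `cuda:1`); on a single-GPU machine, every entry would presumably point at the same device.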
:last_quarter_moon: Prompt Segment Anything at Part Level
wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head"
Result:
<img src="./assets/vlpart_sam_output_twodogs.jpeg" width="600">
:full_moon: Prompt Segment Anything at Object Level
wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/glip_large.pth
python demo_glip_sam.py --input_image assets/demo2.jpeg --output_dir outputs_demo --text_prompt "frog"
Result:
<img src="./assets/glip_sam_output_demo2.jpeg" width="600">
:lollipop: Multi-Prompt
For multiple prompts, separate each prompt with a period (.), for example, --text_prompt "dog head. dog nose"
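To make the separator convention concrete, here is a small sketch of splitting a multi-prompt string into individual prompts. `split_prompts` is a hypothetical helper for illustration only; the demo scripts may handle the string differently internally.

```python
from typing import List


def split_prompts(text_prompt: str) -> List[str]:
    """Split a period-separated prompt string into trimmed, non-empty prompts."""
    return [part.strip() for part in text_prompt.split(".") if part.strip()]


print(split_prompts("dog head. dog nose"))  # ['dog head', 'dog nose']
```

A trailing period or extra whitespace does not produce an empty prompt in this sketch.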
Model Checkpoints
License
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
Acknowledgement
A large part of the code is borrowed from segment-anything, EditAnything, CLIP, GLIP, Grounded-Segment-Anything, and Visual ChatGPT. Many thanks for their wonderful work.
Citation
If you find this project helpful for your research, please consider citing the following BibTeX entry.
@misc{segrec2023,
title = {Grounded Segment Anything: From Objects to Parts},
author = {Sun, Peize and Chen, Shoufa and Luo, Ping},
howpublished = {\url{https://github.com/Cheems-Seminar/grounded-segment-any-parts}},
year = {2023}
}
@article{vlpart2023,
title = {Going Denser with Open-Vocabulary Part Segmentation},
author = {Sun, Peize and Chen, Shoufa and Zhu, Chenchen and Xiao, Fanyi and Luo, Ping and Xie, Saining and Yan, Zhicheng},
journal = {arXiv preprint arXiv:2305.11173},
year = {2023}
}