Label-Anything-Pipeline

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration
Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, and Yueting Zhuang

Zhejiang University

This project is under construction and we will have all the code ready soon.

GPT-4 can do anything, even in visual tasks: label anything and generate anything, all in one pipeline.

Make it easier for users to turn their ideas into accurate images. Generate whatever you can think of! (A mini version of DALL·E 3.)

News

We release our technical report. (🔥NEW)

image

We train ChatGPT at low cost so that it generates semantically rich prompts for AIGC models, which in turn create fantastic images. Even given a single short word (e.g., "room"), our pipeline imagines vivid scene descriptions and generates the best-matching fine-grained images.

| Concept / Idea Words | ChatGPT Prompt Template | AIGC Generated Image | VLM Generated Captions | VFM Automatic Annotations |
| --- | --- | --- | --- | --- |
| Nordic-style decoration room | I want to use artificial intelligence to synthesize the {Nordic-style decoration room}. Please describe the features of the {Nordic-style decoration room} briefly in English | image | a rendering of a living room with a couch, table, chairs, and a window. | image |
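As a minimal sketch of how the concept words could be filled into the prompt template shown in the table, the snippet below uses plain string formatting. The template text is copied from the table; the helper name `build_prompt` is hypothetical and is not taken from the project code.

```python
# Template text copied from the table above. The doubled braces keep the
# literal "{...}" around the concept, as in the table's example prompt.
PROMPT_TEMPLATE = (
    "I want to use artificial intelligence to synthesize the {{{concept}}}. "
    "Please describe the features of the {{{concept}}} briefly in English"
)


def build_prompt(concept: str) -> str:
    """Fill the concept / idea words into the ChatGPT prompt template.

    `build_prompt` is a hypothetical helper used for illustration only.
    """
    concept = concept.strip()
    if not concept:
        raise ValueError("concept must not be empty")
    return PROMPT_TEMPLATE.format(concept=concept)


if __name__ == "__main__":
    print(build_prompt("Nordic-style decoration room"))
```

The resulting string is what would be sent to ChatGPT; how the request is actually issued is left out here because the source does not specify it.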

Automatic Prompts for AIGC models:

We teach ChatGPT to act as an assistant that imagines various scenes with different backgrounds from a simple sentence such as 'A white dog sits on wooden bench.', and then, with the help of AIGC models, generates large amounts of data for downstream tasks. (🔥NEW)

| Scene Background | Object Label Words | High-quality Description | Generated Image with Complex Scenes |
| --- | --- | --- | --- |
| 'city street' | ['buildings', 'sidewalk', 'streetlights', 'cars', 'trash cans'] | 'A dog sits on a wooden bench on a bustling city street, surrounded by towering buildings and a busy sidewalk. Streetlights illuminate the scene as cars whiz by, and a few trash cans sit nearby. Despite the urban chaos, the dog seems content to watch the world go by.' | seed237_rich |
| 'park' | ['trees', 'grass', 'flowers', 'pond', 'picnic table'] | 'A friendly dog sits on a wooden bench in a peaceful park, surrounded by tall trees and lush green grass. Colorful flowers bloom nearby, and a tranquil pond glistens in the distance. A nearby picnic table invites visitors to relax and enjoy the serene surroundings.' | seed566_rich |
| 'beach' | ['ocean', 'sand', 'umbrella', 'seashells', 'waves'] | 'A dog sits on a wooden bench on a sunny beach, surrounded by soft sand and sparkling blue ocean. A colorful umbrella provides shade, and a few seashells are scattered nearby. The gentle sound of waves lapping at the shore creates a soothing soundtrack for the idyllic scene.' | seed92_rich |
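Each row above pairs a scene background with object label words and a generated description. One simple sanity check, sketched below, is to confirm that a description actually mentions every label word before it is passed to an AIGC model. This check is an assumption added for illustration; the source does not say the pipeline performs it, and the names `SceneRecord` and `missing_labels` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SceneRecord:
    """One row of the table above (hypothetical container, not project code)."""
    background: str
    object_labels: list[str] = field(default_factory=list)
    description: str = ""


def missing_labels(record: SceneRecord) -> list[str]:
    """Return the label words that do not appear in the description.

    Matching is a case-insensitive substring test, which is deliberately
    simple and will not catch synonyms or inflected forms.
    """
    text = record.description.lower()
    return [label for label in record.object_labels if label.lower() not in text]


if __name__ == "__main__":
    park = SceneRecord(
        background="park",
        object_labels=["trees", "grass", "flowers", "pond", "picnic table"],
        description=(
            "A friendly dog sits on a wooden bench in a peaceful park, "
            "surrounded by tall trees and lush green grass. Colorful flowers "
            "bloom nearby, and a tranquil pond glistens in the distance. A "
            "nearby picnic table invites visitors to relax and enjoy the "
            "serene surroundings."
        ),
    )
    print(missing_labels(park))
```

A record with a non-empty result could be regenerated or dropped, depending on how strict the downstream task needs the labels to be.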

Use Stable Diffusion to generate images and annotate bounding boxes and masks for object detection and segmentation, all in one pipeline!

The LLM acts as a data specialist built on top of AIGC models.

  1. ChatGPT acts as an educator that guides AIGC models to generate a variety of controllable images in various scenarios.
  2. Given a raw image from the web or from an AIGC model, SAM generates masked regions for the source image and GroundingDINO generates open-set detection results in a single step. We then filter out overlapping bounding boxes to obtain unambiguous annotations.
  3. Mixed text prompts are combined with the CLIP model to select a region by similarity score; the selected region is then used to generate the target edited image with the stable-diffusion-inpaint pipeline.
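Steps 2 and 3 can be illustrated with a small, dependency-free sketch. The source does not say how overlapping boxes are filtered, so the code below swaps in a greedy IoU-based filter (the same idea as non-maximum suppression) with an assumed threshold of 0.5. For region selection it uses cosine similarity over plain lists of floats that stand in for CLIP text and image embeddings. All function names are hypothetical.

```python
import math
from typing import Sequence

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_overlapping(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> list[int]:
    """Greedy filter: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is at most the threshold.
    Returns the indices of the kept boxes. The threshold is an assumption."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def select_region(
    text_embedding: Sequence[float], region_embeddings: Sequence[Sequence[float]]
) -> int:
    """Index of the region whose embedding best matches the text embedding.
    In the pipeline these embeddings would come from CLIP; here they are
    placeholder vectors."""
    if not region_embeddings:
        raise ValueError("no regions to select from")
    sims = [cosine_similarity(text_embedding, r) for r in region_embeddings]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(filter_overlapping(boxes, scores))
    print(select_region([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

In practice the boxes and scores would be GroundingDINO outputs and the selected index would pick the SAM mask handed to the inpainting step.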

Features

Run Demos

# Segment Anything
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# GroundingDINO
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
# blended model for foreground object editing
mkdir -p blended_latent_diffusion/models/ldm/text2img-large/
wget -O blended_latent_diffusion/models/ldm/text2img-large/blend_model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt

bash annotation.sh
bash conditional_edit.sh
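Before running the two scripts, it may help to confirm that the three checkpoints were downloaded. The sketch below checks the file names used in the wget commands above. It assumes the files sit in the directory where the commands were run; where `annotation.sh` and `conditional_edit.sh` actually look for them is not stated in the source, and `missing_checkpoints` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the wget commands above; their location relative to
# the working directory is an assumption.
EXPECTED_CHECKPOINTS = [
    "sam_vit_h_4b8939.pth",
    "groundingdino_swinb_cogcoor.pth",
    "blended_latent_diffusion/models/ldm/text2img-large/blend_model.ckpt",
]


def missing_checkpoints(root: str = ".") -> list[str]:
    """Return the expected checkpoint paths that are not regular files under root."""
    base = Path(root)
    return [name for name in EXPECTED_CHECKPOINTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All checkpoints found.")
```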

Generated Cases

Fantastic Control-Generation by ChatGPT

image

person, beach, surfboard

A couple enjoys a relaxing day at the beach with the man walking together with the woman, holding a big surfboard. The serene scene is complete with the sound of waves and the warm sun and there are many people lying on the beach.

image

:bookmark_tabs: Catalog

Reference

[1] https://chat.openai.com/

[2] https://github.com/huggingface/diffusers

[3] https://github.com/facebookresearch/segment-anything

[4] https://github.com/IDEA-Research/Grounded-Segment-Anything/

📜 Citation

If you find this work useful for your research, please cite our paper and star our git repo:

@misc{yu2023interactive,
      title={Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration}, 
      author={Qifan Yu and Juncheng Li and Wentao Ye and Siliang Tang and Yueting Zhuang},
      year={2023},
      eprint={2305.12799},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}