ObjectStitch-Image-Composition

This is an unofficial implementation of the paper "ObjectStitch: Object Compositing with Diffusion Model", CVPR 2023.

Following ObjectStitch, our implementation takes masked foregrounds as input and uses both class tokens and patch tokens as conditional embeddings. Since the ObjectStitch authors did not release their training dataset, we train our models on the large-scale public dataset Open-Images.

ObjectStitch is very robust and adept at adjusting the pose and viewpoint of the inserted foreground object to match the background. However, fine details may be lost or altered for complex or rare objects.

For better detail preservation and controllability, please refer to our ControlCom and MureObjectStitch, both of which have been integrated into our image composition toolbox libcom (https://github.com/bcmi/libcom). You are welcome to visit and try it \(^▽^)/

Note that the foreground object should fill the whole provided foreground image, i.e., the image should be tightly cropped around the object (see our example). Otherwise, performance will be severely degraded.
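If your foreground comes with a binary object mask, one way to satisfy this requirement is to crop both the image and the mask to the mask's tight bounding box. The sketch below is an illustration, not part of this repository: `tight_bbox` is a hypothetical helper that takes the mask as a list of rows (nonzero = object) and returns a box in the `(left, upper, right, lower)` convention, with `right` and `lower` exclusive, that PIL's `Image.crop` accepts.

```python
# Hypothetical helper: compute the tight bounding box of a binary foreground
# mask so the foreground image can be cropped until the object fills it.
# Assumption: `mask` is a list of equal-length rows; nonzero marks the object.


def tight_bbox(mask):
    """Return (left, upper, right, lower) of the nonzero region.

    `right` and `lower` are exclusive, matching the box convention of
    PIL's Image.crop.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    # At least one row has a nonzero value, so `cols` is non-empty too.
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(tight_bbox(mask))  # (1, 1, 4, 3)
```

With PIL, the returned box could then be applied as `fg.crop(box)` and `fg_mask.crop(box)` so that the image and its mask stay aligned.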

Introduction

Our implementation is based on Paint-by-Example. It takes masked foreground images as input and uses the class token together with all patch tokens of the foreground image as conditional embeddings. The model is trained with the same hyperparameters as Paint-by-Example. Foreground masks for training images are generated with Segment-Anything.
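The two ingredients of the conditioning can be illustrated with a small sketch: masking out the background of the foreground image, and counting how many conditioning tokens a ViT encoder yields when the class token and all patch tokens are kept. This is not this repository's code. It assumes a CLIP ViT-L/14 image encoder at 224×224 input, and `FILL_VALUE` is a hypothetical placeholder for whatever value replaces background pixels.

```python
# Illustrative sketch (not this repository's code).
# Assumptions: a CLIP ViT-L/14 encoder at 224x224 input (patch size 14),
# and a hypothetical FILL_VALUE for masked-out background pixels.

FILL_VALUE = 0  # hypothetical fill value for background pixels


def mask_foreground(pixels, mask, fill=FILL_VALUE):
    """Keep pixels where `mask` is nonzero and replace the rest with `fill`.

    `pixels` and `mask` are lists of rows with the same shape; a pixel can
    be a scalar or an (R, G, B) tuple.
    """
    return [
        [p if m else fill for p, m in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(pixels, mask)
    ]


def num_condition_tokens(image_size=224, patch_size=14, use_patch_tokens=True):
    """One class token, plus one token per non-overlapping patch if kept."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    num_patches = (image_size // patch_size) ** 2  # 16 * 16 = 256 for ViT-L/14
    return 1 + num_patches if use_patch_tokens else 1


if __name__ == "__main__":
    print(mask_foreground([[5, 6], [7, 8]], [[1, 0], [0, 1]]))  # [[5, 0], [0, 8]]
    print(num_condition_tokens())                        # 257: class + all patch tokens
    print(num_condition_tokens(use_patch_tokens=False))  # 1: class token only
```

Under these assumptions, keeping all patch tokens gives the denoiser 257 conditioning tokens instead of a single one, which is what lets spatial detail of the foreground reach the model.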

In total, our model is trained on approximately 1.8 million foreground-background image pairs drawn from both the train and validation sets of Open-Images. Training runs for 40 epochs on 16 A100 GPUs with a batch size of 16 per GPU.
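For a rough sense of the training schedule, the numbers above can be turned into step counts. This is a back-of-the-envelope estimate: the pair count is approximate, and the sketch assumes plain data parallelism without gradient accumulation (global batch = GPUs × per-GPU batch) and that the last partial batch of each epoch is kept rather than dropped.

```python
import math

# Back-of-the-envelope schedule from the figures in the README.
# Assumptions: data-parallel training without gradient accumulation, and the
# last partial batch of each epoch is kept (ceil) rather than dropped.

NUM_PAIRS = 1_800_000  # approximate number of training pairs
NUM_GPUS = 16
BATCH_PER_GPU = 16
EPOCHS = 40

global_batch = NUM_GPUS * BATCH_PER_GPU                # 256 samples per step
steps_per_epoch = math.ceil(NUM_PAIRS / global_batch)  # ceil(7031.25) = 7032
total_steps = steps_per_epoch * EPOCHS                 # 7032 * 40 = 281280

print(global_batch, steps_per_epoch, total_steps)  # 256 7032 281280
```

So the full run corresponds to roughly 280k optimizer steps at a global batch size of 256.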

Get Started

1. Dependencies

2. Download Models

3. Inference on examples

4. Create composite images for your dataset

5. Training Code

Note: Certain sensitive information has been removed because model training was conducted within a company. To start training, you will need to prepare your own training data and modify the code to suit your requirements.
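When preparing your own data, one common self-supervised scheme, used by Paint-by-Example, builds each example from a single annotated image: the object's bounding box is masked out of the image to form the background, and the same box is cropped to form the foreground. The sketch below only illustrates that idea and is not necessarily how this repository constructs its pairs. `TrainingPair`, `box_mask`, and `make_pair` are hypothetical names, and no image file is actually read.

```python
# Hypothetical data-preparation sketch (not this repository's pipeline):
# the object's bounding box becomes the region to inpaint in the background,
# and the same box would be cropped from the image as the foreground.

from dataclasses import dataclass


@dataclass
class TrainingPair:
    image_path: str      # path to the source image (never opened here)
    bbox: tuple          # (left, upper, right, lower); right/lower exclusive
    inpaint_mask: list   # rows of 0/1; 1 marks the region to generate


def box_mask(width, height, bbox):
    """Binary mask of size height x width with 1 inside `bbox`."""
    left, upper, right, lower = bbox
    if not (0 <= left < right <= width and 0 <= upper < lower <= height):
        raise ValueError(f"bbox {bbox} is outside a {width}x{height} image")
    return [
        [1 if left <= x < right and upper <= y < lower else 0 for x in range(width)]
        for y in range(height)
    ]


def make_pair(image_path, width, height, bbox):
    """Bundle the image path, its object box, and the inpainting mask."""
    return TrainingPair(image_path, tuple(bbox), box_mask(width, height, bbox))


if __name__ == "__main__":
    pair = make_pair("example.jpg", 4, 3, (1, 0, 3, 2))  # hypothetical path
    for row in pair.inpaint_mask:
        print(row)
    # [0, 1, 1, 0]
    # [0, 1, 1, 0]
    # [0, 0, 0, 0]
```

The stored `bbox` uses the same box convention as PIL's `Image.crop`, so the foreground crop and the inpainting region always refer to the same pixels.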

Visualization Results

We showcase several results generated by the released model on the FOSCom dataset. Each example shows the background image with the target bounding box (yellow), the foreground image, and five randomly sampled composite images.

<p align='center'> <img src='./results/FOSCom_results.jpg' width=95% /> </p>

We also provide the full results for all 640 foreground-background pairs in the FOSCom dataset, which can be downloaded from Baidu Cloud (extraction code: 9g1p). These results give a quick overview of the capabilities and limitations of ObjectStitch.

Other Resources