ControlCom-Image-Composition

This is the official repository for the following research paper:

ControlCom: Controllable Image Composition using Diffusion Model [arXiv]

Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu

Part of ControlCom has been integrated into our image composition toolbox libcom (https://github.com/bcmi/libcom). You are welcome to visit and try it \(^▽^)/

Demo

The online demo of image composition can be found here.

Task Definition

In our controllable image composition model, we unify four tasks in one model using a two-dimensional binary indicator vector, in which the first (resp., second) dimension indicates whether to adjust the foreground illumination (resp., pose) to be compatible with the background. A value of 1 means the attribute is adjusted, and 0 means it is kept unchanged. Therefore, (0,0) corresponds to image blending, (1,0) to image harmonization, (0,1) to view synthesis, and (1,1) to generative composition.

<p align='center'> <img src='./figures/task.png' width=70% /> </p>
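As a minimal sketch of the indicator scheme described above, the mapping from the two adjustment choices to a task can be written as a small lookup table. The names `TASKS`, `make_indicator` and `task_name` are hypothetical and are not part of the ControlCom codebase; the sketch only illustrates the encoding, not how the model consumes the indicator internally.

```python
from typing import Dict, Tuple

# Hypothetical illustration of the 2-dim binary indicator (not ControlCom's actual API).
# Index 0: adjust foreground illumination? Index 1: adjust foreground pose?
# 1 = adjust to be compatible with the background, 0 = keep unchanged.
TASKS: Dict[Tuple[int, int], str] = {
    (0, 0): "image blending",
    (1, 0): "image harmonization",
    (0, 1): "view synthesis",
    (1, 1): "generative composition",
}


def make_indicator(adjust_illumination: bool, adjust_pose: bool) -> Tuple[int, int]:
    """Encode the two adjustment choices as a binary indicator vector."""
    return (int(adjust_illumination), int(adjust_pose))


def task_name(indicator: Tuple[int, int]) -> str:
    """Return the task that a given indicator vector corresponds to."""
    if indicator not in TASKS:
        raise ValueError(f"indicator must be one of {sorted(TASKS)}, got {indicator}")
    return TASKS[indicator]


if __name__ == "__main__":
    # Enumerate all four combinations of the two binary choices.
    for illum in (False, True):
        for pose in (False, True):
            ind = make_indicator(illum, pose)
            print(ind, "->", task_name(ind))
```

Keeping the vector as a plain tuple of two bits makes it easy to see that the four tasks are simply the four combinations of "change illumination" and "change pose".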

Our method can selectively adjust a subset of foreground attributes. Previous methods may adjust the foreground color or pose unexpectedly, even unreasonably, even when the foreground illumination and pose are already compatible with the background. In the left example, the foreground pose is already compatible with the background, yet previous methods make unnecessary adjustments. In the right example, the foreground illumination is already compatible with the background, yet previous methods alter the foreground color in an undesirable manner.

<p align='center'> <img src='./figures/controllability_necessity.jpg' width=90% /> </p>

The (0,0) and (1,0) versions, which do not change the foreground pose, are very robust and generally well-behaved, although some fine details may be lost or altered. The (0,1) and (1,1) versions, which do change the foreground pose, are less robust and may produce results with distorted structures or noticeable artifacts. For foreground pose variation, we recommend the more robust ObjectStitch.

Network Architecture

Our method is built upon Stable Diffusion; the network architecture is shown below.

<p align='center'> <img src='./figures/architecture.png' width=90% /> </p>

FOSCom Dataset

Code and Model

1. Dependencies

2. Download Models

3. Inference on examples

The images in the examples folder are taken from the COCOEE dataset.

4. Inference on your data

5. Training code

Note: certain sensitive information has been removed because the model was trained within a company. To start training, you will need to prepare your own training data and modify the code according to your requirements.

Experiments

We show our results under all four indicator vectors.

<p align='center'> <img src='./figures/controllable_results.jpg' width=80% /> </p>

Evaluation

The quantitative results and evaluation code can be found here.

Acknowledgements

This code borrows heavily from Paint-By-Example. We also thank the authors of Stable Diffusion for their contributions.

Citation

If you find this work or code helpful in your research, please cite:

@article{zhang2023controlcom,
  title={Controlcom: Controllable image composition using diffusion model},
  author={Zhang, Bo and Duan, Yuxuan and Lan, Jun and Hong, Yan and Zhu, Huijia and Wang, Weiqiang and Niu, Li},
  journal={arXiv preprint arXiv:2308.10040},
  year={2023}
}

Other Resources