Awesome Generative Image Composition

A curated list of resources including papers, datasets, and relevant links pertaining to generative image composition (object insertion). Generative image composition aims to generate a plausible composite image from a background image (with an optional bounding box indicating where to insert the object) and one or a few foreground images of a specific object. For more complete resources on general image composition, please refer to Awesome-Image-Composition.

<p align='center'> <img src='./figures/task.jpg' width=90% /> </p>

Contributing

Contributions are welcome. If you wish to contribute, feel free to send a pull request. If you have suggestions for new sections to be included, please raise an issue and discuss before sending a pull request.

Table of Contents

Survey

A brief review on generative image composition is included in the following survey on image composition:

Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang: "Making Images Real Again: A Comprehensive Survey on Deep Image Composition." arXiv preprint arXiv:2106.14490 (2021). [arXiv] [slides]

Online Demo

Try this online demo for generative image composition and have fun!

Evaluation Metrics

Test Set

Leaderboard

The training set is open. The test set is the COCOEE benchmark. Part of the results below are copied from ControlCom. Honestly speaking, the evaluation metrics below are not very reliable; for a more comprehensive and interpretable evaluation, you can refer to this summary of evaluation metrics.

<table class="tg"> <tr> <th class="tg-0pky" rowspan="2" align="center">Method</th> <th class="tg-0pky" colspan="3" align="center">Foreground</th> <th class="tg-0pky" colspan="2" align="center">Background</th> <th class="tg-0pky" colspan="2" align="center">Overall</th> </tr> <tr> <th class="tg-0pky" align="center">CLIP&uarr;</th> <th class="tg-0pky" align="center">DINO&uarr;</th> <th class="tg-0pky" align="center">FID&darr;</th> <th class="tg-0pky" align="center">LSSIM&uarr;</th> <th class="tg-0pky" align="center">LPIPS&darr;</th> <th class="tg-0pky" align="center">FID&darr;</th> <th class="tg-0pky" align="center">QS&uarr;</th> </tr> <tr> <th class="tg-0pky" align="center">Inpaint&Paste</th> <th class="tg-0pky" align="center">-</th> <th class="tg-0pky" align="center">-</th> <th class="tg-0pky" align="center">8.0</th> <th class="tg-0pky" align="center">-</th> <th class="tg-0pky" align="center">-</th> <th class="tg-0pky" align="center">3.64</th> <th class="tg-0pky" align="center">72.07</th> </tr> <th class="tg-0pky" align="center"><a href="https://arxiv.org/pdf/2211.13227.pdf">PBE</a> </th> <th class="tg-0pky" align="center">84.84</th> <th class="tg-0pky" align="center">52.52</th> <th class="tg-0pky" align="center">6.24</th> <th class="tg-0pky" align="center">0.823</th> <th class="tg-0pky" align="center">0.116</th> <th class="tg-0pky" align="center">3.18</th> <th class="tg-0pky" align="center">77.80</th> </tr> <th class="tg-0pky" align="center"><a href="https://arxiv.org/pdf/2212.00932.pdf">ObjectStitch</a></th> <th class="tg-0pky" align="center">85.97</th> <th class="tg-0pky" align="center">61.12</th> <th class="tg-0pky" align="center">6.86</th> <th class="tg-0pky" align="center">0.825</th> <th class="tg-0pky" align="center">0.116</th> <th class="tg-0pky" align="center">3.35</th> <th class="tg-0pky" align="center">76.86</th> </tr> <th class="tg-0pky" align="center"><a href="https://arxiv.org/pdf/2307.09481.pdf">AnyDoor</a></th> <th class="tg-0pky" align="center">89.7</th> <th class="tg-0pky" align="center">70.16</th> <th class="tg-0pky" align="center">10.5</th> <th class="tg-0pky" align="center">0.870</th> <th class="tg-0pky" align="center">0.109</th> <th class="tg-0pky" align="center">3.60</th> <th class="tg-0pky" align="center">76.18</th> </tr> <th class="tg-0pky" align="center"><a href="https://arxiv.org/pdf/2308.10040.pdf">ControlCom</a></th> <th class="tg-0pky" align="center">88.31</th> <th class="tg-0pky" align="center">63.67</th> <th class="tg-0pky" align="center">6.28</th> <th class="tg-0pky" align="center">0.826</th> <th class="tg-0pky" align="center">0.114</th> <th class="tg-0pky" align="center">3.19</th> <th class="tg-0pky" align="center">77.84</th> </tr> </table>

Evaluating Your Results

  1. Install Dependencies:

  2. Clone Repository and Download Pretrained Models:

    • Clone this repository and create a checkpoints folder if it does not already exist.
    • Download the pretrained models listed in the folder structure below into the checkpoints folder (a download sketch is given after these steps).

    The resulting folder structure should resemble the following:

    checkpoints/
    ├── clip-vit-base-patch32
    ├── coco2017_gmm_k20
    ├── dino-vits16
    └── sam_vit_h_4b8939.pth
    
<!--
  3. **Download Cache File for FID Scores**:
    - Download the cache file from [Google Drive](https://drive.google.com/file/d/1m5EXLb2fX95uyl2dYtQUudjnFsGhN5dU/view?usp=sharing) used for computing FID scores.
    - Unzip the cache file to a `cache` folder as follows:
      ```shell
      cache/
      ├── coco2017_test.pth
      └── cocoee_gtfg.pth
      ```
      Alternatively, you can download the test set of [COCO2017](http://images.cocodataset.org/zips/test2017.zip) in advance and unzip it to a `data` folder.
-->
  3. Prepare COCOEE Benchmark and Your Results:
    • Prepare the COCOEE benchmark alongside your generated composite results. Make sure each composite image is named after the corresponding background image of the COCOEE dataset, as illustrated below (a consistency-check sketch is given after these steps):
      results/
      ......
      ├── 000002228519_GT.png
      ├── 000002231413_GT.png
      ├── 900100065455_GT.png
      └── 900100376112_GT.png
      
    • Modify the paths accordingly in the run.sh file. If you have downloaded the cache file for FID scores (see the commented-out optional step in the source), you can ignore cocodir.
    • Execute the following command:
      sh run.sh
      
    Then wait for all the metrics to be computed.
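
For step 2, the following is a minimal download sketch, assuming the huggingface_hub package and the official SAM checkpoint URL; the source of coco2017_gmm_k20 is specific to this repository, so it is not guessed here. You can equally download the files manually, as long as the checkpoints folder ends up with the structure shown above.

```python
# Illustrative download sketch for step 2 (not the repo's official script).
# Assumes: pip install huggingface_hub
import urllib.request
from pathlib import Path

from huggingface_hub import snapshot_download

ckpt_dir = Path("checkpoints")
ckpt_dir.mkdir(exist_ok=True)

# CLIP and DINO backbones from the HuggingFace Hub.
snapshot_download(repo_id="openai/clip-vit-base-patch32",
                  local_dir=str(ckpt_dir / "clip-vit-base-patch32"))
snapshot_download(repo_id="facebook/dino-vits16",
                  local_dir=str(ckpt_dir / "dino-vits16"))

# Official SAM ViT-H checkpoint.
urllib.request.urlretrieve(
    "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    str(ckpt_dir / "sam_vit_h_4b8939.pth"),
)

# coco2017_gmm_k20 is provided by this repository; place it under
# checkpoints/coco2017_gmm_k20 following the repo's own instructions.
```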
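
For step 3, here is a quick sanity check that every composite in results/ is named after a COCOEE background image; the COCOEE background path used below is a placeholder, so point it at wherever you keep the benchmark.

```python
# Quick consistency check for step 3 (placeholder paths, adjust as needed).
from pathlib import Path

results_dir = Path("results")               # your generated composite images
background_dir = Path("COCOEE/background")  # hypothetical COCOEE background folder

backgrounds = {p.stem for p in background_dir.glob("*.png")}
for composite in sorted(results_dir.glob("*.png")):
    if composite.stem not in backgrounds:
        print(f"no matching COCOEE background for {composite.name}")
```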

Papers

(Object+Text)-to-Object

Object-to-Object

Token-to-Object

Related Topics

Foreground: 3D; Background: image

Foreground: 3D; Background: 3D

Foreground: video; Background: image

Foreground: video; Background: video

Other Resources