Person Image Synthesis via Denoising Diffusion Model

<p align='center'> <b> <a href="https://arxiv.org/abs/2211.12500">ArXiv</a> | <a href="https://ankanbhunia.github.io/PIDM">Project</a> | <a href="https://colab.research.google.com/github/ankanbhunia/PIDM/blob/main/PIDM_demo.ipynb">Demo</a> | <a href="https://www.youtube.com/watch?v=cHdZTZurX8M">YouTube</a> </b> </p> <p align="center"> <img src=Figures/images.gif> </p>

Generated Results

<img src="https://raw.githubusercontent.com/ankanbhunia/PIDM/main/Figures/intro_fig.jpg">

You can directly download our test results from Google Drive: (1) PIDM.zip (2) PIDM_vs_Others.zip

The PIDM_vs_Others.zip file compares our method with several state-of-the-art methods: ADGAN [14], PISE [24], GFLA [20], DPTN [25], CASD [29], and NTED [19]. Each row contains, in order, target_pose, source_image, ground_truth, ADGAN, PISE, GFLA, DPTN, CASD, NTED, and PIDM (ours).

Dataset

<!-- ```bash cd scripts ./download_dataset.sh ``` Or you can download these files manually: -->

Custom Dataset

The folder structure of any custom dataset should be as follows:

Place all your images inside the img folder, either directly or organized into subfolders. Store the corresponding poses inside the pose folder (as txt files if you use OpenPose; in our project, we use 18-point keypoint estimation). train_pairs.txt and test_pairs.txt contain the paths of all possible pairs, separated by a comma: <src_path1>,<tgt_path1>.
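As an illustration of the pair-file format described above, here is a minimal sketch that writes every ordered source/target pair within a group of images. The helper name `write_pairs`, the idea of grouping images by person identity, the one-pair-per-line layout, and the absence of a header line are all assumptions made for this example; check them against the repository's data loader before relying on the output.

```python
from itertools import permutations
from pathlib import Path


def write_pairs(groups, out_path):
    """Write one '<src_path>,<tgt_path>' line per ordered pair in each group.

    `groups` maps a group name (assumed here to be a person identity) to a
    list of image paths. This is a hypothetical helper, not part of the repo.
    """
    lines = []
    for paths in groups.values():
        # permutations(..., 2) yields every ordered (source, target) pair
        # of distinct images within the same group.
        for src, tgt in permutations(sorted(paths), 2):
            lines.append(f"{src},{tgt}")
    # Assumption: one pair per line, no header row.
    Path(out_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, a group of three images yields six ordered pairs, and a group with a single image yields none.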

After that, run the following command to process the data:

python data/prepare_data.py \
--root ./dataset/<dataset_name> \
--out ./dataset/<dataset_name> \
--sizes "((256,256),)"

This will create an lmdb dataset at ./dataset/<dataset_name>/256-256/

Conda Installation

# 1. Create a conda virtual environment.
conda create -n PIDM python=3.7
conda activate PIDM
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# 2. Clone the repo and install dependencies.
git clone https://github.com/ankanbhunia/PIDM
cd PIDM
pip install -r requirements.txt

Method

<img src=Figures/main.png>

Training

This code supports multi-GPU training. Full training on the DeepFashion dataset takes 5 days with 8 A100 GPUs and a batch size of 8. The model is trained for 300 epochs; however, it generates high-quality, usable samples after 200 epochs. We also trained with V100 GPUs, and training took a similar amount of time.

python -m torch.distributed.launch --nproc_per_node=8 --master_port 48949 train.py \
--dataset_path "./dataset/deepfashion" --batch_size 8 --exp_name "pidm_deepfashion"
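The training figures quoted above imply a rough compute budget. The following is a minimal sketch of that arithmetic, assuming wall-clock time grows linearly with the number of epochs (an assumption for illustration, not a measured result):

```python
# Figures quoted in the training description.
days_total = 5        # full training time in days
num_gpus = 8          # GPUs used for the full run
epochs_total = 300    # epochs in the full run
epochs_usable = 200   # epochs after which samples are reported as usable

# Total compute, expressed in GPU-days.
gpu_days = days_total * num_gpus

# Assumption: every epoch takes the same wall-clock time.
hours_per_epoch = days_total * 24 / epochs_total
days_to_usable = days_total * epochs_usable / epochs_total

print(f"{gpu_days} GPU-days in total")
print(f"about {hours_per_epoch:.1f} hours per epoch")
print(f"about {days_to_usable:.1f} days to reach {epochs_usable} epochs")
```

Under that assumption, the full run costs 40 GPU-days, and stopping at 200 epochs would take about two thirds of the full training time.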

Inference

Download the pretrained model from here and place it in the checkpoints folder. For pose control, use obj.predict_pose as in the following code snippet.

from predict import Predictor
obj = Predictor()

obj.predict_pose(image=<PATH_OF_SOURCE_IMAGE>, sample_algorithm='ddim', num_poses=4, nsteps=50)

For appearance control, use obj.predict_appearance:

from predict import Predictor
obj = Predictor()

src = <PATH_OF_SOURCE_IMAGE>
ref_img = <PATH_OF_REF_IMAGE>
ref_mask = <PATH_OF_REF_MASK>
ref_pose = <PATH_OF_REF_POSE>

obj.predict_appearance(image=src, ref_img=ref_img, ref_mask=ref_mask, ref_pose=ref_pose, sample_algorithm='ddim', nsteps=50)

The output will be saved as output.png.
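Because each call saves its result under the same output.png name, processing several source images in a row would overwrite earlier results. Here is a minimal sketch of one way to keep them, assuming output.png is written to the current working directory. The helper `run_pose_batch` and its `out_dir` parameter are hypothetical and not part of the repository; only the `predict_pose` call and its arguments come from the snippet above.

```python
import shutil
from pathlib import Path


def run_pose_batch(predictor, image_paths, out_dir, num_poses=4, nsteps=50):
    """Run predict_pose on each image and move each result to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for image_path in image_paths:
        predictor.predict_pose(
            image=str(image_path),
            sample_algorithm="ddim",
            num_poses=num_poses,
            nsteps=nsteps,
        )
        # Assumption: the call above wrote output.png to the current
        # working directory; move it before the next call overwrites it.
        target = out_dir / f"{Path(image_path).stem}.png"
        shutil.move("output.png", str(target))
        saved.append(target)
    return saved
```

It would be called with an instance created as `Predictor()` from `predict`, a list of source image paths, and a results folder. Results are named after the source file stem, so two source images with the same file name in different folders would collide.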

Citation

If you use the results and code for your research, please cite our paper:

@article{bhunia2022pidm,
  title={Person Image Synthesis via Denoising Diffusion Model},
  author={Bhunia, Ankan Kumar and Khan, Salman and Cholakkal, Hisham and Anwer, Rao Muhammad and Laaksonen, Jorma and Shah, Mubarak and Khan, Fahad Shahbaz},
  journal={CVPR},
  year={2023}
}

Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Anwer, Jorma Laaksonen, Mubarak Shah & Fahad Khan