
DiffHarmony & DiffHarmony++

The official PyTorch implementation of DiffHarmony and DiffHarmony++.

The full conference poster of DiffHarmony is here.

Preparation

Environment

First, prepare a virtual environment with the versions listed below. You can use conda or any other tool you like.

Python 3.10
PyTorch 2.2.0
CUDA 12.1
xformers 0.0.24

Then, install the requirements.

pip install -r requirements.txt
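
To confirm that the installed packages match the versions listed above, a quick check along these lines may help. This is a minimal sketch: the `check_versions` helper and the `EXPECTED` table are illustrative names, not part of this repository, and it assumes the installed distributions are named `torch` and `xformers`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; the dict itself is illustrative.
EXPECTED = {"torch": "2.2.0", "xformers": "0.0.24"}


def check_versions(expected):
    """Return {package: (wanted, installed, ok)} for each expected package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Prefix match so local build tags such as "2.2.0+cu121" still pass.
        ok = installed is not None and installed.startswith(wanted)
        report[package] = (wanted, installed, ok)
    return report


if __name__ == "__main__":
    print("python", ".".join(map(str, sys.version_info[:3])))
    for package, (wanted, installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "mismatch"
        print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the versions listed here.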

Dataset

Download the iHarmony4 dataset from here.

Make sure the directory structure looks like this:

data/iHarmony4
|- HCOCO
    |- composite_images
    |- masks
    |- real_images
    |- ...
|- HAdobe5k
|- HFlickr
|- Hday2night
|- train.jsonl
|- test.jsonl
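
The layout above can be checked before training with a short script. The sketch below is not part of this repository; `missing_entries` is a hypothetical helper, and it assumes every sub-dataset contains the same three folders shown for HCOCO.

```python
from pathlib import Path

# Names taken from the tree above. Applying SUBDIRS to every subset is an
# assumption; the tree only spells them out for HCOCO.
SUBSETS = ["HCOCO", "HAdobe5k", "HFlickr", "Hday2night"]
SUBDIRS = ["composite_images", "masks", "real_images"]
METADATA = ["train.jsonl", "test.jsonl"]


def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    missing = []
    for subset in SUBSETS:
        for subdir in SUBDIRS:
            path = root / subset / subdir
            if not path.is_dir():
                missing.append(str(path))
    for name in METADATA:
        path = root / name
        if not path.is_file():
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    for path in missing_entries("data/iHarmony4"):
        print("missing:", path)
```

An empty result means every folder and metadata file named in the tree was found.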

Each line of train.jsonl follows this format:

{"file_name": "HAdobe5k/composite_images/a0001_1_1.jpg", "text": ""}
{"file_name": "HAdobe5k/composite_images/a0001_1_2.jpg", "text": ""}
{"file_name": "HAdobe5k/composite_images/a0001_1_3.jpg", "text": ""}
{"file_name": "HAdobe5k/composite_images/a0001_1_4.jpg", "text": ""}
...

All file_name values come from the original IHD_train.txt. test.jsonl is built the same way from IHD_test.txt.
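
The conversion from the list files to the .jsonl files can be sketched as follows. This is an illustration, not the script used by the authors: `list_to_jsonl` is a hypothetical helper, and it assumes each non-empty line of the list file is already a relative path such as `HAdobe5k/composite_images/a0001_1_1.jpg`.

```python
import json


def list_to_jsonl(list_path, jsonl_path):
    """Write one {"file_name": ..., "text": ""} record per non-empty input line.

    Returns the number of records written.
    """
    count = 0
    with open(list_path, encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            dst.write(json.dumps({"file_name": name, "text": ""}) + "\n")
            count += 1
    return count


# Example call; both paths are assumptions and should be adjusted locally:
# list_to_jsonl("data/iHarmony4/IHD_train.txt", "data/iHarmony4/train.jsonl")
```

The same call with IHD_test.txt and test.jsonl would produce the test metadata.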

Training

Train diffharmony model

sh scripts/train_diffharmony.sh

Train refinement model

sh scripts/train_refinement_stage.sh

Train condition vae (cvae)

sh scripts/train_cvae.sh

Train diffharmony-gen and cvae-gen

Add the following to your training arguments:

$script
    ...
    --mode "inverse"

In this mode, the model is conditioned on ground-truth images instead of composite images.
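
Conceptually, the switch swaps which image serves as the condition. The sketch below only illustrates that idea and is not the repository's training code: `pick_condition_and_target` and the `"forward"` mode name are hypothetical, and using the composite image as the target in inverse mode is an assumption based on the description above.

```python
def pick_condition_and_target(composite, real, mode="forward"):
    """Return (condition, target) for one training sample.

    "forward" is a placeholder name for the default harmonization setup,
    where the composite image is the condition. With mode="inverse" the
    ground-truth image becomes the condition; treating the composite image
    as the target in that case is an assumption.
    """
    if mode == "inverse":
        return real, composite
    return composite, real
```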

(Optional) Online training of condition vae

Refer to scripts/train/cvae_online.py.

(Optional) Train cvae with generated data

Refer to scripts/train/cvae_with_gen_data.py.

The goal is to further improve cvae performance on a specific domain, namely our generated dataset.

Inference

Inference on the iHarmony4 dataset

sh scripts/inference.sh

Use diffharmony-gen and cvae-gen to augment HFlickr and Hday2night

sh scripts/inference_generate_data.sh

The all_mask_metadata.jsonl file, as its name suggests, holds the mask metadata in the following format:

{"file_name": "masks/f800_1.png", "text": ""}
{"file_name": "masks/f801_1.png", "text": ""}
{"file_name": "masks/f803_1.png", "text": ""}
{"file_name": "masks/f804_1.png", "text": ""}
...
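
A file in this format could be produced from a folder of masks roughly as follows. This is a sketch under assumptions: `write_mask_metadata` is a hypothetical helper, the masks are assumed to be `.png` files inside a `masks` folder, and the output is assumed to live next to that folder so that the `masks/...` paths resolve.

```python
import json
from pathlib import Path


def write_mask_metadata(dataset_dir, out_name="all_mask_metadata.jsonl"):
    """Write one record per PNG in <dataset_dir>/masks and return the count."""
    dataset_dir = Path(dataset_dir)
    masks = sorted((dataset_dir / "masks").glob("*.png"))
    with open(dataset_dir / out_name, "w", encoding="utf-8") as dst:
        for mask in masks:
            # Paths are stored relative to dataset_dir, e.g. "masks/f800_1.png".
            record = {"file_name": f"masks/{mask.name}", "text": ""}
            dst.write(json.dumps(record) + "\n")
    return len(masks)


# Example call; the directory is an assumption and should be adjusted locally:
# write_mask_metadata("data/iHarmony4/HFlickr")
```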

Make HumanHarmony dataset

First, generate some candidate composite images.

Then, use the harmony classifier to select the most unharmonious images.

python scripts/misc/classify_cand_gen_data.py
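
The selection step amounts to ranking candidates by a harmony score and keeping the lowest ones. The sketch below illustrates only that ranking; it does not reproduce scripts/misc/classify_cand_gen_data.py. The `select_most_unharmonious` helper is hypothetical, and it assumes the classifier yields one score per candidate where a higher score means a more harmonious image.

```python
import heapq


def select_most_unharmonious(scores, k):
    """Return the k candidate names with the lowest harmony scores.

    `scores` maps a candidate file name to its harmony score
    (assumed convention: higher means more harmonious).
    """
    return heapq.nsmallest(k, scores, key=scores.get)


# Made-up scores for illustration only.
example = {"cand_a.jpg": 0.9, "cand_b.jpg": 0.1, "cand_c.jpg": 0.5}
print(select_most_unharmonious(example, 2))  # ['cand_b.jpg', 'cand_c.jpg']
```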

Evaluation

sh scripts/evaluate.sh

Pretrained Models

Baidu, code: aqqd

Google Drive

Citation

If you find this work useful, please consider citing:

@inproceedings{zhou2024diffharmony,
  title={DiffHarmony: Latent Diffusion Model Meets Image Harmonization},
  author={Zhou, Pengfei and Feng, Fangxiang and Wang, Xiaojie},
  booktitle={Proceedings of the 2024 International Conference on Multimedia Retrieval},
  pages={1130--1134},
  year={2024}
}
@inproceedings{zhou2024diffharmonypp,
  title={DiffHarmony++: Enhancing Image Harmonization with Harmony-VAE and Inverse Harmonization Model},
  author={Zhou, Pengfei and Feng, Fangxiang and Liu, Guang and Li, Ruifan and Wang, Xiaojie},
  booktitle={ACM MM},
  year={2024}
}

Contact

If you have any questions, please feel free to contact me at zhoupengfei@bupt.edu.cn.