DiffHarmony & DiffHarmony++
The official PyTorch implementation of DiffHarmony and DiffHarmony++.
The full conference poster of DiffHarmony is here.
Preparation
environment
First, prepare a virtual env. You can use conda or anything you like.
python 3.10
pytorch 2.2.0
cuda 12.1
xformers 0.0.24
Then, install requirements.
pip install -r requirements.txt
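To confirm that the installed packages match the versions listed above, a small check along these lines may help. It is only a sketch and not part of the repository: the `EXPECTED` table and the `installed_versions` helper are names made up for illustration, and only the package names and version numbers come from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the list above; the dict itself is illustrative.
EXPECTED = {"torch": "2.2.0", "xformers": "0.0.24"}


def installed_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # PyTorch wheels often carry a local suffix such as "+cu121",
        # so compare by prefix rather than by strict equality.
        report[name] = (found, found is not None and found.startswith(wanted))
    return report


if __name__ == "__main__":
    for name, (found, ok) in installed_versions(EXPECTED).items():
        print(f"{name}: installed={found} expected={EXPECTED[name]} ok={ok}")
```

A package that is not installed is reported as `None` rather than raising, so the check can run before the environment is complete.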
dataset
Download iHarmony4 dataset from here.
Make sure the directory structure looks like this:
data/iHarmony4
|- HCOCO
    |- composite_images
    |- masks
    |- real_images
    |- ...
|- HAdobe5k
|- HFlickr
|- Hday2night
|- train.jsonl
|- test.jsonl
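The layout above can be checked before training with a short script such as the one below. This is a sketch under assumptions: it supposes that HAdobe5k, HFlickr and Hday2night contain the same three sub-directories shown for HCOCO, and `missing_entries` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import Path

# Names taken from the tree above. That every subset has the same three
# sub-directories as HCOCO is an assumption.
SUBSETS = ["HCOCO", "HAdobe5k", "HFlickr", "Hday2night"]
SUBDIRS = ["composite_images", "masks", "real_images"]
METADATA = ["train.jsonl", "test.jsonl"]


def missing_entries(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    expected = [Path(s) / d for s in SUBSETS for d in SUBDIRS]
    expected += [Path(m) for m in METADATA]
    return [p.as_posix() for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    for entry in missing_entries("data/iHarmony4"):
        print("missing:", entry)
```

An empty result means every expected directory and metadata file was found.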
Each line of train.jsonl follows this format:
{"file_name": "HAdobe5k/composite_images/a0001_1_1.jpg", "text": ""}
{"file_name": "HAdobe5k/composite_images/a0001_1_2.jpg", "text": ""}
{"file_name": "HAdobe5k/composite_images/a0001_1_3.jpg", "text": ""}
{"file_name": "HAdobe5k/composite_images/a0001_1_4.jpg", "text": ""}
...
All file_name values are taken from the original IHD_train.txt. test.jsonl is built the same way from IHD_test.txt.
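One possible way to produce these files is sketched below. It assumes that each non-empty line of IHD_train.txt (or IHD_test.txt) is already a path relative to data/iHarmony4, such as HAdobe5k/composite_images/a0001_1_1.jpg. The function name `list_to_jsonl` is hypothetical, and the file locations in the usage lines are assumptions too.

```python
import json


def list_to_jsonl(list_path, jsonl_path):
    """Write one {"file_name", "text"} record per non-empty line of list_path.

    Returns the number of records written.
    """
    count = 0
    with open(list_path, encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            # "text" is left empty, as in the format shown above.
            dst.write(json.dumps({"file_name": name, "text": ""}) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Assumed locations; adjust to where the list files actually live.
    list_to_jsonl("data/iHarmony4/IHD_train.txt", "data/iHarmony4/train.jsonl")
    list_to_jsonl("data/iHarmony4/IHD_test.txt", "data/iHarmony4/test.jsonl")
```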
Training
Train diffharmony model
sh scripts/train_diffharmony.sh
Train refinement model
sh scripts/train_refinement_stage.sh
Train conditional VAE (cvae)
sh scripts/train_cvae.sh
Train diffharmony-gen and cvae-gen
Add the following to your training arguments:
$script
...
--mode "inverse"
In this mode, the model is conditioned on the ground truth images instead of the composite images.
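The effect of the flag can be pictured with the toy function below. It is an illustration only and does not reproduce the repository's training code: `pick_condition` is a hypothetical name, treating every value other than "inverse" as the normal direction is an assumption, and so is the idea that the composite image becomes the target in inverse mode.

```python
def pick_condition(mode, composite, real):
    """Return (condition, target) for one training pair.

    Illustrative only: in "inverse" mode the ground truth image is the
    condition; the target being the composite image is an assumption.
    """
    if mode == "inverse":
        return real, composite
    # Any other mode: condition on the composite, predict the real image.
    return composite, real


if __name__ == "__main__":
    print(pick_condition("inverse", "composite.jpg", "real.jpg"))
    print(pick_condition("default", "composite.jpg", "real.jpg"))
```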
(optional) online training of conditional VAE
Refer to scripts/train/cvae_online.py
(optional) train cvae with generated data
Refer to scripts/train/cvae_with_gen_data.py
The purpose here is to further improve cvae performance on a specific domain, i.e. our generated dataset.
Inference
Inference on the iHarmony4 dataset
sh scripts/inference.sh
Use diffharmony-gen and cvae-gen to augment HFlickr and Hday2night
sh scripts/inference_generate_data.sh
As its name suggests, the all_mask_metadata.jsonl file lists all masks, one per line, in the following format:
{"file_name": "masks/f800_1.png", "text": ""}
{"file_name": "masks/f801_1.png", "text": ""}
{"file_name": "masks/f803_1.png", "text": ""}
{"file_name": "masks/f804_1.png", "text": ""}
...
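A file in this format could be generated with a script like the following. It is a sketch under assumptions: it supposes that all_mask_metadata.jsonl sits in the subset directory next to masks/ (suggested by the relative masks/... paths above) and that masks are PNG files. `write_mask_metadata` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_mask_metadata(dataset_dir, out_name="all_mask_metadata.jsonl"):
    """List every PNG under dataset_dir/masks as one JSONL record each.

    Returns the number of masks written.
    """
    dataset_dir = Path(dataset_dir)
    masks = sorted((dataset_dir / "masks").glob("*.png"))
    with open(dataset_dir / out_name, "w", encoding="utf-8") as dst:
        for mask in masks:
            # Store the path relative to the subset directory, e.g. masks/f800_1.png
            rel = mask.relative_to(dataset_dir).as_posix()
            dst.write(json.dumps({"file_name": rel, "text": ""}) + "\n")
    return len(masks)


if __name__ == "__main__":
    # Assumed location of the HFlickr subset.
    print(write_mask_metadata("data/iHarmony4/HFlickr"))
```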
Make HumanHarmony dataset
First, generate some candidate composite images.
Then, use the harmony classifier to select the most unharmonious images.
python scripts/misc/classify_cand_gen_data.py
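The selection step can be thought of as a top-k filter over classifier scores. The sketch below is not the logic of classify_cand_gen_data.py, whose details are not described here. It assumes a hypothetical mapping from candidate file names to a harmony score where lower means less harmonious, and `most_unharmonious` is an invented name.

```python
import heapq


def most_unharmonious(scores, k):
    """Return the k file names with the lowest harmony score.

    scores: {file_name: harmony_score}; the score convention (lower means
    less harmonious) is an assumption made for this sketch.
    """
    return heapq.nsmallest(k, scores, key=scores.get)


if __name__ == "__main__":
    candidates = {"cand_a.jpg": 0.9, "cand_b.jpg": 0.1, "cand_c.jpg": 0.5}
    print(most_unharmonious(candidates, 2))
```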
Evaluation
sh scripts/evaluate.sh
Pretrained Models
Baidu, code: aqqd
Citation
If you find this work useful, please consider citing:
@inproceedings{zhou2024diffharmony,
title={DiffHarmony: Latent Diffusion Model Meets Image Harmonization},
author={Zhou, Pengfei and Feng, Fangxiang and Wang, Xiaojie},
booktitle={Proceedings of the 2024 International Conference on Multimedia Retrieval},
pages={1130--1134},
year={2024}
}
@inproceedings{zhou2024diffharmonypp,
title={DiffHarmony++: Enhancing Image Harmonization with Harmony-VAE and Inverse Harmonization Model},
author={Zhou, Pengfei and Feng, Fangxiang and Liu, Guang and Li, Ruifan and Wang, Xiaojie},
booktitle={ACM MM},
year={2024}
}
Contact
If you have any questions, please feel free to contact me via zhoupengfei@bupt.edu.cn.