
<div align="center"> <h1>FreeDoM 🕊️ (ICCV 2023)</h1> <h3>FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model</h3>

Jiwen Yu<sup>1</sup>, Yinhuai Wang<sup>1</sup>, Chen Zhao<sup>2</sup>, Bernard Ghanem<sup>2</sup>, Jian Zhang<sup>1</sup>

<sup>1</sup> Peking University, <sup>2</sup> KAUST

arXiv | Camera Ready Paper

</div>

News

Todo

Introduction

FreeDoM is a simple but effective training-free method that generates results under various conditions using unconditional diffusion models. Specifically, we use off-the-shelf pre-trained networks to construct a time-independent energy function, which measures the distance between the given conditions and the intermediate generated images. We then compute the gradient of this energy and use it to guide the generation process. FreeDoM supports various conditions, including texts, segmentation maps, sketches, landmarks, face IDs, and style images, and it applies to different data domains, including human faces, ImageNet images, and latent codes.
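To make this pipeline concrete, below is a minimal PyTorch sketch of one energy-guided sampling step. All names here (`eps_model`, `distance`, `alpha_bar`, the step size `rho`, and the DDIM-style update) are illustrative assumptions for exposition, not the repository's actual API:

```python
import torch

def guided_step(x_t, t, eps_model, distance, condition, alpha_bar, rho=1.0):
    """One energy-guided denoising step (illustrative sketch only)."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)  # unconditional noise prediction

    # Estimate the clean image x_{0|t} from the current noisy sample x_t.
    a_t = alpha_bar[t]
    x0_t = (x_t - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)

    # Time-independent energy: how far x_{0|t} is from the given condition,
    # as measured by an off-the-shelf pre-trained network (scalar output).
    energy = distance(x0_t, condition)

    # Gradient of the energy w.r.t. x_t steers the sampling trajectory.
    grad = torch.autograd.grad(energy, x_t)[0]

    # Deterministic DDIM-style update (eta = 0), then the guidance correction.
    a_prev = alpha_bar[t - 1] if t > 0 else torch.ones_like(a_t)
    x_prev = torch.sqrt(a_prev) * x0_t + torch.sqrt(1.0 - a_prev) * eps
    return (x_prev - rho * grad).detach()
```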

Overall Experimental Configurations

| Model Source | Data Domain | Resolution | Original Conditions | Additional Training-free Conditions | Sampling Time* (s/image) |
| --- | --- | --- | --- | --- | --- |
| SDEdit | aligned human face | $256\times256$ | None | parsing maps, sketches, landmarks, face IDs, texts | ≈20 |
| guided-diffusion | ImageNet | $256\times256$ | None | texts, style images | ≈140 |
| guided-diffusion | ImageNet | $256\times256$ | class label | style images | ≈50 |
| Stable Diffusion | general images | $512\times512$ (standard) | texts | style images | ≈84 |
| ControlNet | general images | $512\times512$ (standard) | human poses, scribbles, texts | face IDs, style images | ≈120 |

*Sampling times were measured on a GeForce RTX 3090 GPU.

Results

<details>
<summary>Training-free <strong>style</strong> guidance + <strong>Stable Diffusion</strong> (click to expand)</summary>
<img src="./figure/SD_style.png" width=6000>
</details>

<details>
<summary>Training-free <strong>style</strong> guidance + Scribble <strong>ControlNet</strong> (click to expand)</summary>
<img src="./figure/CN_style.png" width=2000>
</details>

<details>
<summary>Training-free <strong>face ID</strong> guidance + Human-pose <strong>ControlNet</strong> (click to expand)</summary>
<img src="./figure/CN_id.png" width=2000>
</details>

<details>
<summary>Training-free <strong>text</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/text_face.png" width=2000>
</details>

<details>
<summary>Training-free <strong>segmentation</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/seg_face.png" width=2000>
</details>

<details>
<summary>Training-free <strong>sketch</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/sketch_face.png" width=2000>
</details>

<details>
<summary>Training-free <strong>landmarks</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/landmark_face.png" width=2000>
</details>

<details>
<summary>Training-free <strong>face ID</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/id_face.png" width=2000>
</details>

<details>
<summary>Training-free <strong>face ID</strong> guidance + <strong>landmarks</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/land+id.png" width=2000>
</details>

<details>
<summary>Training-free <strong>text</strong> guidance + <strong>segmentation</strong> guidance on <strong>human faces</strong> (click to expand)</summary>
<img src="./figure/seg+text.png" width=2000>
</details>

<details>
<summary>Training-free <strong>style transfer</strong> guidance + <strong>Stable Diffusion</strong> (click to expand)</summary>
<img src="./figure/SD_style_transfer.png" width=2000>
</details>

<details>
<summary>Training-free <strong>text-guided</strong> face editing (click to expand)</summary>
<img src="./figure/face_edit.png" width=2000>
</details>

Acknowledgments

Our work stands on the shoulders of giants. We would like to thank the following projects, on which our code is based:

We also introduce some recent works that share the similar idea of updating the clean intermediate estimate $\mathbf{x}_{0|t}$:
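For reference, $\mathbf{x}_{0|t}$ denotes the standard estimate of the clean image recovered from the noisy sample $\mathbf{x}_t$ (assuming the usual DDPM notation, with $\bar{\alpha}_t$ the cumulative noise schedule and $\boldsymbol{\epsilon}_\theta$ the pre-trained noise predictor):

$$\mathbf{x}_{0|t} = \frac{\mathbf{x}_t - \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sqrt{\bar{\alpha}_t}}$$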

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry:

@inproceedings{yu2023freedom,
  title={FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model},
  author={Yu, Jiwen and Wang, Yinhuai and Zhao, Chen and Ghanem, Bernard and Zhang, Jian},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}