Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance (NeurIPS 2024)

<a href="https://arxiv.org/abs/2406.07540"><img src="https://img.shields.io/badge/arXiv-Paper-red"></a> <a href="https://genforce.github.io/ctrl-x"><img src="https://img.shields.io/badge/Project-Page-yellow"></a>

Kuan Heng Lin<sup>1*</sup>, Sicheng Mo<sup>1*</sup>, Ben Klingher<sup>1</sup>, Fangzhou Mu<sup>2</sup>, Bolei Zhou<sup>1</sup> <br> <sup>1</sup>UCLA <sup>2</sup>NVIDIA <br> <sup>*</sup>Equal contribution <br>

Ctrl-X teaser figure

Getting started

Environment setup

Our code is built on top of diffusers v0.28.0. To set up the environment, please run the following.

conda env create -f environment.yaml
conda activate ctrlx

Running Ctrl-X

Gradio demo

We provide a user interface for testing our method. Running the following command starts the demo.

python app_ctrlx.py

Script

We also provide a script for running our method. This is equivalent to the Gradio demo.

python run_ctrlx.py \
    --structure_image assets/images/horse__point_cloud.jpg \
    --appearance_image assets/images/horse.jpg \
    --prompt "a photo of a horse standing on grass" \
    --structure_prompt "a 3D point cloud of a horse"

If appearance_image is not provided, then Ctrl-X does structure-only control. If structure_image is not provided, then Ctrl-X does appearance-only control.
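The mapping from supplied images to control mode can be sketched as a small helper that assembles the command line for run_ctrlx.py. This is only an illustration: `control_mode` and `build_ctrlx_command` are hypothetical names that are not part of the repository, only the four flags shown above are used, and rejecting the case where neither image is given is an assumption of this sketch rather than documented behavior.

```python
import shlex
from typing import List, Optional


def control_mode(structure_image: Optional[str], appearance_image: Optional[str]) -> str:
    """Name the control mode implied by which images are supplied."""
    if structure_image and appearance_image:
        return "structure and appearance"
    if structure_image:
        return "structure-only"
    if appearance_image:
        return "appearance-only"
    # Assumption of this sketch: with no image there is nothing to control.
    raise ValueError("provide a structure image, an appearance image, or both")


def build_ctrlx_command(
    prompt: str,
    structure_image: Optional[str] = None,
    appearance_image: Optional[str] = None,
    structure_prompt: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for run_ctrlx.py (hypothetical helper)."""
    control_mode(structure_image, appearance_image)  # raises if neither image is given
    cmd = ["python", "run_ctrlx.py", "--prompt", prompt]
    # Flags for images or prompts that are not supplied are simply left off.
    if structure_image:
        cmd += ["--structure_image", structure_image]
    if appearance_image:
        cmd += ["--appearance_image", appearance_image]
    if structure_prompt:
        cmd += ["--structure_prompt", structure_prompt]
    return cmd


# Structure-only control: no appearance image is passed.
structure_only = build_ctrlx_command(
    "a photo of a horse standing on grass",
    structure_image="assets/images/horse__point_cloud.jpg",
    structure_prompt="a 3D point cloud of a horse",
)
print(shlex.join(structure_only))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root if you prefer launching the script from Python.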

Optional arguments

There are three optional arguments for both app_ctrlx.py and run_ctrlx.py:

Approximate inference time and GPU VRAM usage for the Gradio demo and script (structure and appearance control) on a single NVIDIA RTX A6000 are as follows.

| Flags | Inference time (s) | GPU VRAM usage (GiB) |
| --- | --- | --- |
| None | 28.8 | 18.8 |
| model_offload | 38.3 | 12.6 |
| sequential_offload | 169.3 | 3.8 |
| disable_refiner | 25.5 | 14.5 |
| model_offload + disable_refiner | 31.7 | 7.4 |
| sequential_offload + disable_refiner | 151.4 | 3.8 |

Here, VRAM usage is measured with torch.cuda.max_memory_reserved(), which is the closest PyTorch counterpart to the numbers nvidia-smi reports but probably still underestimates them. You can obtain these numbers on your own hardware by adding the benchmark flag to run_ctrlx.py.
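For readers who want to take a similar measurement around their own code, the following is a minimal sketch of timing a call and reading the peak reserved memory with torch.cuda.max_memory_reserved(). It is not the repository's benchmark implementation: `measure` is a hypothetical helper, and the sketch assumes a single CUDA device and falls back to timing only when PyTorch or a GPU is unavailable.

```python
import time
from typing import Any, Callable, Optional, Tuple

try:
    import torch  # optional: without it the sketch only measures time
except ImportError:
    torch = None


def measure(fn: Callable[[], Any]) -> Tuple[Any, float, Optional[float]]:
    """Run fn once and return (result, seconds, peak reserved GiB or None)."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()  # restart peak tracking from the current state
        torch.cuda.synchronize()  # finish earlier GPU work before starting the clock
    start = time.perf_counter()
    result = fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    seconds = time.perf_counter() - start
    peak_gib = torch.cuda.max_memory_reserved() / 2**30 if use_cuda else None
    return result, seconds, peak_gib


# Stand-in workload; replace the lambda with a call into your own pipeline.
result, seconds, peak_gib = measure(lambda: sum(range(1_000_000)))
print(f"time: {seconds:.3f} s")
print("peak reserved VRAM:", "n/a" if peak_gib is None else f"{peak_gib:.1f} GiB")
```

Reserved memory includes blocks held by PyTorch's caching allocator, which is why it tracks nvidia-smi more closely than allocated memory does.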

Have fun playing around with Ctrl-X! :D

Future plans (a.k.a. TODOs)

Contact

For questions, thoughts, discussions, or anything else you would like to reach out about, please contact Jordan Lin (kuanhenglin@ucla.edu).

Reference

If you use our code in your research, please cite the following work.

@inproceedings{lin2024ctrlx,
    author = {Lin, {Kuan Heng} and Mo, Sicheng and Klingher, Ben and Mu, Fangzhou and Zhou, Bolei},
    booktitle = {Advances in Neural Information Processing Systems},
    title = {Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance},
    year = {2024}
}