Awesome
Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis
Overview
This code repository accompanies the paper "Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis." The paper introduces a novel approach to enhance text-to-image (T2I) generative models by using Large Language Models (LLMs) as layout generators and an adapter module for integrating layout into image synthesis.
Requirements
We provide a modified source code of diffusers in the repository.
pip install --upgrade transformers scipy
Download LACA/LASA
Model checkpoints can be downloaded at https://huggingface.co/xchen16/LACA
Inference
step 1
Obtain the bounding boxes response from ChatGPT, and save it to llm_response.txt. Two versions of the prompts are provided in the prompts folders. Make sure your response format strictly follows the example below:
Caption: A dog stands and four balloons are in the air.
[chat gpt reasoning, will be ignored during parsing]
### Answer
- object 0: A dog [(136, 204, 376, 460)]
- object 1: Four balloons [(51, 51, 102, 102), (409, 51, 460, 102), (255, 0, 306, 51), (306, 102, 357, 153)]
- ...
step 2
Run the generation script
python generate.py --controlnet_path ./checkpoints/laca_800000 --response_file ./llm_response.txt --g1 5.5 --g2 5.5 --tau 0.2