

CCSR: Class-Conditional self-reward mechanism for improved Text-to-Image models

This repository contains the official implementation of the paper CCSR: Self-Rewarding Pretrained Text-to-Image Models. Pre-print: arXiv


INTRODUCTION

CCSR is a mechanism that allows text-to-image (T2I) diffusion models to learn from their own generated images and continuously self-improve. This technique is inspired by the paper Self-Rewarding Language Models.

The idea is similar but the method is different. An overall flowchart of the self-rewarding mechanism is presented in the GIF below.

(GIF: overall flowchart of the self-rewarding mechanism)

USAGE INSTRUCTIONS

INSTALLATION

  1. Clone the repository:
git clone https://github.com/safouaneelg/SRT2I.git
  2. Create a conda environment (optional but recommended) and install the dependencies from environment.yml:
conda env create -f environment.yml
conda activate srt2i

USAGE

Step-by-step self-rewarding

  1. The first step is to generate prompts for the generative text-to-image diffusion model, which can be done using the following command (an illustrative sketch of this step is given below):
python llm/prompts_generator.py --model "TheBloke/Mistral-7B-Instruct-v0.2-AWQ" --class_list "llm/class_list.json" --output_prompts "generated_prompts.txt" --prompts_number 30 --class_ids 15,16,17,20,21

The default parameters are:

These are the class IDs used in the paper: {20: Elephant} and {23: Giraffe}. If you generate prompts for other classes, replace them with the appropriate IDs from class_list.
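For illustration only, below is a minimal sketch of what this prompt-generation step could look like, assuming a standard Hugging Face transformers text-generation pipeline (AWQ checkpoints additionally require the autoawq package). The prompt template, the assumed structure of class_list.json, and the output format are illustrative assumptions; the actual logic lives in llm/prompts_generator.py.

# Minimal, illustrative sketch of class-conditional prompt generation.
# The real implementation is llm/prompts_generator.py.
import json
from transformers import pipeline

# Assumed structure of llm/class_list.json: {"20": "elephant", "23": "giraffe", ...}
with open("llm/class_list.json") as f:
    class_list = json.load(f)

generator = pipeline(
    "text-generation",
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    device_map="auto",
)

prompts_per_class = 30        # corresponds to --prompts_number
selected_ids = ["20", "23"]   # corresponds to --class_ids

with open("generated_prompts.txt", "w") as out:
    for class_id in selected_ids:
        class_name = class_list[class_id]
        instruction = (
            f"[INST] Write {prompts_per_class} short, diverse text-to-image prompts, "
            f"one per line, each describing a scene that contains a {class_name}. [/INST]"
        )
        completion = generator(instruction, max_new_tokens=1024, do_sample=True)
        generated = completion[0]["generated_text"][len(instruction):]
        for line in generated.strip().splitlines():
            if line.strip():
                out.write(line.strip() + "\n")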

  2. The next step is to generate images from the prompts using a Stable Diffusion model. Run the following command to generate the images; the images are stacked in groups of 10 (see the illustrative sketch below):
python diff_generator/fromprompt_t2i_generation.py --diffusion-model "stabilityai/stable-diffusion-2-1-base" --output-folder "generative_images/" --prompts "generated_prompts.txt"

The default parameters are:
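As a rough illustration of this step, the sketch below generates 10 images per prompt with the diffusers StableDiffusionPipeline and saves them to the output folder. The file-naming scheme, and reading the "stacked by 10" behaviour as 10 images per prompt, are assumptions; the actual implementation is diff_generator/fromprompt_t2i_generation.py.

# Illustrative sketch of prompt-to-image generation.
# The real implementation is diff_generator/fromprompt_t2i_generation.py.
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

os.makedirs("generative_images", exist_ok=True)

with open("generated_prompts.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

for i, prompt in enumerate(prompts):
    # Generate 10 candidate images for each prompt.
    images = pipe(prompt, num_images_per_prompt=10).images
    for j, image in enumerate(images):
        image.save(f"generative_images/prompt{i:04d}_img{j:02d}.png")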

  3. To extract the optimal images from the generated Stable Diffusion images, run the following command (an illustrative sketch of the selection logic is given below):
python sr_mechanism/self-reward_dataset_creation.py --image_folder 'path/to/images/folder/' --prompts_file 'path/to/prompts_file.txt' --llava_model 'LLAVA_MODEL' --yolo_model 'yolov8x-worldv2.pt' --output_folder './optimal_pairs4/'

Parsers:
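Conceptually, the selection step scores each candidate image against its target class and keeps the best-scoring one per prompt. The sketch below uses only a YOLO-World detection confidence as the score; the actual script also uses the LLaVA model to judge prompt/image consistency, and the paths, file naming, and scoring rule here are illustrative assumptions.

# Illustrative sketch of selecting the best image per prompt with a YOLO-World score.
# The real implementation (sr_mechanism/self-reward_dataset_creation.py) also uses LLaVA.
import glob
import os
import shutil
from ultralytics import YOLOWorld

detector = YOLOWorld("yolov8x-worldv2.pt")
detector.set_classes(["elephant"])  # target class for this batch of prompts

os.makedirs("optimal_pairs", exist_ok=True)

def class_score(image_path: str) -> float:
    # Highest detection confidence for the target class in the image.
    result = detector.predict(image_path, verbose=False)[0]
    confs = result.boxes.conf
    return float(confs.max()) if len(confs) > 0 else 0.0

# Assumes the naming convention prompt{i}_img{j}.png from the previous step.
num_prompts = len(glob.glob("generative_images/prompt*_img00.png"))
for i in range(num_prompts):
    candidates = sorted(glob.glob(f"generative_images/prompt{i:04d}_img*.png"))
    best = max(candidates, key=class_score)
    shutil.copy(best, f"optimal_pairs/prompt{i:04d}.png")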

  4. Fine-tune Stable Diffusion on the image-prompt pairs stored in the output folder.

Once the dataset is ready and the training file has been customized, simply run the script:

bash tutorial/fine_tune_sd/fine_tune_lora4.sh

In this example script, a single GPU is used to train for 100 epochs, with wandb logging and validation prompts.
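After fine-tuning, the resulting LoRA weights can be loaded back into the base pipeline for inference. The snippet below is a generic diffusers usage sketch; the LoRA output directory name is hypothetical, so point it to wherever the training script saves its weights.

# Illustrative sketch: sample from the self-rewarded, LoRA fine-tuned model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

# Hypothetical output directory of the LoRA fine-tuning run.
pipe.load_lora_weights("output/ccsr-lora")

image = pipe("a photo of an elephant walking through tall grass").images[0]
image.save("ccsr_sample.png")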

Citation

@misc{ghazouali2024classconditional,
      title={Class-Conditional self-reward mechanism for improved Text-to-Image models}, 
      author={Safouane El Ghazouali and Arnaud Gucciardi and Umberto Michelucci},
      year={2024},
      eprint={2405.13473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2405.13473}
}

Licence

This code is open for research and development purposes only. No commercial use of this software is permitted. For additional information, contact: safouane.elghazouali@toelt.ai.