<div align="center"> <h1>An item is Worth a Prompt: Versatile Image Editing with Disentangled Control</h1>

<a href='https://arxiv.org/abs/2403.04880'><img src='https://img.shields.io/badge/Technique-Report-red'></a>

</div>

D-Edit is a versatile image editing framework based on diffusion models that supports text-guided, image-guided, and mask-based editing.

🔥 Examples

<p align="center"> <img alt="text" src="assets/demo1.gif" width="45%"> &nbsp; &nbsp; &nbsp; &nbsp; <img alt="image" src="assets/demo2.gif" width="45%"> </p>
  1. Text-Guided Editing: Allows users to select an object within an image and replace or refine it based on a text description.

    • Key features:
      • Generates more realistic details and smoother transitions than alternative methods
      • Focuses edits specifically on the targeted object
      • Preserves unrelated parts of the image
  2. Image-Guided Editing: Enables users to choose an object from a reference image and transplant it into another image while preserving its identity.

    • Key features:
      • Ensures seamless integration of the object into the new context
      • Adapts the object's appearance to match the target image's style
      • Works effectively even when the object's appearance differs significantly between reference and target images
<p align="center"> <img alt="mask" src="assets/demo3.gif" width="45%"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <img alt="remove" src="assets/demo4.gif" width="45%"> </p>
  3. Mask-Based Editing: Involves manipulating objects by directly editing their masks.

    • Key features:
      • Allows for operations like moving, reshaping, resizing, and refining objects
      • Fills in new details according to the object's associated prompt
      • Produces natural-looking results that maintain consistency with the overall image
  4. Item Removal: Enables users to remove objects from images by deleting the mask-object associations.

    • Key features:
      • Intelligently fills in the empty space left by removed objects
      • Ensures a coherent final image
      • Maintains the integrity of the surrounding image elements

🔧 Dependencies and Installation

```bash
conda create --name dedit python=3.10
conda activate dedit
pip install -U pip

# Install requirements
pip install -r requirements.txt
```

💻 Run

1. Segmentation

Put the image to be edited (any resolution) into a folder with the specified name and rename the image to "img.png" or "img.jpg". Then run the segmentation model:

```bash
sh ./scripts/run_segment.sh
```

Alternatively, run GroundedSAM to detect items from a text prompt:

```bash
sh ./scripts/run_segmentSAM.sh
```

Optionally, if the segmentation is poor, refine the masks in a GUI by running the mask-editing web app locally:

```bash
python ui_edit_mask.py
```

For image-based editing, repeat this step for both reference and target images.
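Preparing the input folder by hand is easy to get wrong (wrong file name, ".jpeg" instead of ".jpg"). The sketch below is a hypothetical helper, `prepare_input`, which is not part of the repository: it assumes only PNG and JPEG inputs are wanted and copies the image unchanged into the given folder as `img.png` or `img.jpg`.

```python
import shutil
from pathlib import Path


def prepare_input(image_path: str, folder: str) -> Path:
    """Copy an image into `folder` as img.png or img.jpg (hypothetical helper)."""
    src = Path(image_path)
    ext = src.suffix.lower()
    if ext == ".jpeg":
        ext = ".jpg"  # normalise to the expected "img.jpg" name
    if ext not in (".png", ".jpg"):
        raise ValueError(f"expected a PNG or JPEG image, got {src.suffix!r}")
    dst_dir = Path(folder)
    dst_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    dst = dst_dir / f"img{ext}"
    shutil.copyfile(src, dst)  # copy bytes as-is, no re-encoding
    return dst
```

For image-based editing you would call it twice, once for the reference image and once for the target image, each with its own folder.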

2. Model Finetuning

Finetune the cross-attention layers of the diffusion model's UNet by running:

```bash
sh ./scripts/sdxl/run_ft_sdxl_1024.sh
```
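Restricting finetuning to cross-attention means only the weights that read the text embeddings are updated. As a sketch, assuming diffusers-style UNet parameter names (where `attn1` is self-attention and `attn2` is cross-attention), the trainable subset can be selected by name. The helper and the sample names below are illustrative and are not taken from the repository's training script.

```python
def select_cross_attention_params(param_names):
    """Keep only parameter names that belong to cross-attention layers.

    Assumes diffusers-style naming: `attn1` = self-attention,
    `attn2` = cross-attention over the text embeddings.
    """
    return [name for name in param_names if ".attn2." in name]


# Illustrative names in the diffusers UNet naming scheme (not exhaustive).
names = [
    "down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight",
    "down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight",
    "down_blocks.1.resnets.0.conv1.weight",
]
trainable = select_cross_attention_params(names)
```

With a PyTorch model, one would typically iterate over `unet.named_parameters()` and set each parameter's `requires_grad` according to whether its name is selected.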

or finetune the full UNet with LoRA:

```bash
sh ./scripts/sdxl/run_ft_sdxl_1024_fulllora.sh
```
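LoRA keeps each pretrained weight matrix W frozen and learns a low-rank update, so an adapted linear layer computes y = Wx + (alpha / r) · B(Ax), with A of shape r × d_in and B of shape d_out × r. The plain-Python sketch below shows this forward pass for a single layer. It illustrates the general LoRA technique only; the rank, scaling, and target modules used by the repository's scripts are not specified here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Linear layer with a LoRA update: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (r x d_in)
    and B (d_out x r) would be trained.
    """
    r = len(A)  # LoRA rank = number of rows of A
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

Because B is usually initialized to zero, the adapted layer starts out identical to the pretrained one, and finetuning only has to learn the low-rank correction.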

If image-based editing is needed, finetune the model with both the reference and target images using:

```bash
sh ./scripts/sdxl/run_ft_sdxl_1024_fulllora_2imgs.sh
```

3. Edit

3.0 Reconstruction

To check whether the original image can be reconstructed, run:

```bash
sh ./scripts/sdxl/run_recon.sh
```
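Besides inspecting the output visually, reconstruction quality can be quantified. The sketch below computes the peak signal-to-noise ratio (PSNR) between original and reconstructed pixels. It is not part of the repository and assumes both images have the same size and have been flattened into sequences of 8-bit values.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

With Pillow, `list(Image.open(path).convert("L").tobytes())` yields such a sequence for a greyscale version of an image. Higher values mean a closer reconstruction.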

3.1 Text-based

Replace the target item (tgt_index) with the item described by the text prompt (tgt_prompt):

```bash
sh ./scripts/sdxl/run_text.sh
```
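Conceptually, D-Edit associates each segmented item with its own prompt, so a text-guided edit only changes the prompt attached to `tgt_index` and leaves the other items untouched. The dictionary-based sketch below illustrates that bookkeeping; the function name and the placeholder strings such as `<item1>` are hypothetical and are not the repository's data structures.

```python
def text_edit(item_prompts, tgt_index, tgt_prompt):
    """Return a copy of the item -> prompt map with item `tgt_index` re-prompted."""
    if tgt_index not in item_prompts:
        raise KeyError(f"no item with index {tgt_index}")
    edited = dict(item_prompts)  # other items keep their own prompts
    edited[tgt_index] = tgt_prompt
    return edited


# Hypothetical placeholder prompts for three segmented items.
item_prompts = {0: "<item0>", 1: "<item1>", 2: "<item2>"}
edited = text_edit(item_prompts, tgt_index=1, tgt_prompt="a red armchair")
```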

3.2 Image-based

Replace the target item (tgt_index) in the target image (tgt_name) with the item (src_index) from the reference image:

```bash
sh ./scripts/sdxl/run_image.sh
```
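Image-based editing applies the same idea across two images: item `tgt_index` in the target image takes over the prompt learned for item `src_index` in the reference image, which is why both images are finetuned together. In the sketch below, `src_name`, the nested-dictionary layout, and the placeholder prompts are assumptions made for illustration.

```python
def image_edit(prompts, tgt_name, tgt_index, src_name, src_index):
    """Give item `tgt_index` of image `tgt_name` the prompt of item
    `src_index` in reference image `src_name`.

    `prompts` maps image name -> {item index -> prompt} (assumed layout).
    """
    edited = dict(prompts[tgt_name])  # copy only the target image's map
    edited[tgt_index] = prompts[src_name][src_index]
    return edited


prompts = {
    "target": {0: "<tgt_item0>", 1: "<tgt_item1>"},
    "reference": {0: "<ref_item0>", 1: "<ref_item1>"},
}
edited = image_edit(prompts, tgt_name="target", tgt_index=1,
                    src_name="reference", src_index=0)
```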

3.3 Mask-based

For the target items (tgt_indices_list), resize them (resize_list), move them (delta_x, delta_y), or reshape them by manually editing the mask shape in the UI.

The resulting masks (produced by a simple algorithm) can be inspected at './example1/move_resize/seg_move_resize.png'; if they look wrong, edit them in the UI.

```bash
sh ./scripts/sdxl/run_move_resize.sh
```
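One simple way to move and resize an item on a per-pixel label map (each pixel storing its item index) is inverse mapping around the item's centroid: for every output pixel, compute where it came from and copy the item label if that source pixel belonged to the item. The sketch below assumes such a label map. Its parameters mirror delta_x, delta_y, and resize_list, but the function itself, the `background` fill for vacated pixels, and nearest-pixel rounding are hypothetical choices; the repository's algorithm may differ.

```python
def move_resize(label_map, item, delta_x=0, delta_y=0, scale=1.0, background=0):
    """Move and/or resize one item in a per-pixel label map (hypothetical sketch).

    The item is scaled about its centroid by `scale`, then shifted by
    (delta_x, delta_y) pixels. Vacated pixels become `background`.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    h, w = len(label_map), len(label_map[0])
    pixels = [(y, x) for y in range(h) for x in range(w) if label_map[y][x] == item]
    if not pixels:
        raise ValueError(f"item {item} not found")
    cy = sum(y for y, _ in pixels) / len(pixels)  # centroid row
    cx = sum(x for _, x in pixels) / len(pixels)  # centroid column
    # Start from the old map with the item erased.
    out = [[background if v == item else v for v in row] for row in label_map]
    # Inverse mapping: find which source pixel each output pixel comes from.
    for y in range(h):
        for x in range(w):
            sy = round(cy + (y - delta_y - cy) / scale)
            sx = round(cx + (x - delta_x - cx) / scale)
            if 0 <= sy < h and 0 <= sx < w and label_map[sy][sx] == item:
                out[y][x] = item  # the moved item overwrites what it lands on
    return out
```

Inverse mapping avoids the holes that forward-mapping each item pixel would leave when `scale` is greater than 1.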

3.4 Remove

Remove the target item (tgt_index); the region it leaves is reassigned to the neighboring regions by a simple algorithm. The resulting masks can be inspected at './example1/remove/seg_removed.png'; if they look wrong, edit them in the UI.

```bash
sh ./scripts/sdxl/run_move_resize.sh
```
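The reassignment step can be sketched as a multi-source breadth-first search: every pixel of the removed item takes the label of the nearest surviving pixel, measured in 4-connected steps. This is one plausible "simple algorithm", assumed here for illustration on a per-pixel label map; the repository's implementation may use a different rule.

```python
from collections import deque


def remove_item(label_map, item):
    """Delete `item` from a per-pixel label map and fill its pixels with the
    label of the nearest remaining region (multi-source BFS, 4-connectivity)."""
    h, w = len(label_map), len(label_map[0])
    out = [row[:] for row in label_map]
    queue = deque()
    found = False
    for y in range(h):
        for x in range(w):
            if out[y][x] == item:
                out[y][x] = None  # hole left by the removed item
                found = True
            else:
                queue.append((y, x))  # every surviving pixel is a BFS source
    if not found:
        raise ValueError(f"item {item} not found")
    if not queue:
        raise ValueError("cannot remove the only item in the image")
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] is None:
                out[ny][nx] = out[y][x]  # inherit the neighbour's label
                queue.append((ny, nx))
    return out
```

Ties between equally distant regions are broken by scan order, which keeps the result deterministic.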

3.5 General editing parameters

<p align="center"> <img src="assets/mask_def.png" height=200> </p>

Cite

If you find D-Edit useful for your research and applications, please cite us using this BibTeX:

```bibtex
@article{feng2024dedit,
  title={An item is Worth a Prompt: Versatile Image Editing with Disentangled Control},
  author={Aosong Feng and Weikang Qiu and Jinbin Bai and Kaicheng Zhou and Zhen Dong and Xiao Zhang and Rex Ying and Leandros Tassiulas},
  journal={arXiv preprint arXiv:2403.04880},
  year={2024}
}
```