<p align="center"> <img src="https://yuzhou914.github.io/SmartEdit/assets/Logo.jpg" height=100> </p> <div align="center">

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models (CVPR-2024 Highlight)

[Paper] [Project Page] [Demo] <br>

</div>

🔥🔥 2024.04. SmartEdit is released!

🔥🔥 2024.04. SmartEdit is selected as a highlight paper at CVPR-2024!

🔥🔥 2024.02. SmartEdit is accepted by CVPR-2024!

If you are interested in our work, please star ⭐ our project. <br>

SmartEdit Framework

<p align="center"> <img src="https://yuzhou914.github.io/SmartEdit/assets/2-SmartEdit.jpg"> </p>

SmartEdit on Understanding Scenarios

<p align="center"> <img src="https://yuzhou914.github.io/SmartEdit/assets/3-Understanding.jpg"> </p>

SmartEdit on Reasoning Scenarios

<p align="center"> <img src="https://yuzhou914.github.io/SmartEdit/assets/4-Reasoning.jpg"> </p>

Dependencies and Installation

    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt 
    git clone https://github.com/Dao-AILab/flash-attention.git
    cd flash-attention
    pip install . --no-build-isolation
    cd ..
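
After running the commands above, it can help to confirm that the pinned packages actually landed in the environment. The following is a minimal sketch, not part of the SmartEdit repository: the helper name `check_install` is hypothetical, the version pins are copied from the `pip install` line above, and the distribution name `flash_attn` is an assumption about how the flash-attention build registers itself. It reads installed package metadata only, so it does not import `torch` or need a GPU.

```python
from importlib import metadata

# Pins copied from the pip command above (assumed to be the intended versions).
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def check_install(expected=EXPECTED, extra=("flash_attn",)):
    """Return {package: status} using installed metadata only (hypothetical helper)."""
    report = {}
    for name, pin in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Wheels from the cu118 index carry a local suffix such as "+cu118";
        # compare only the part before the "+".
        base = found.split("+", 1)[0]
        report[name] = "ok" if base == pin else f"mismatch (found {found})"
    for name in extra:
        # No version is pinned for these, so only presence is reported.
        try:
            report[name] = f"installed ({metadata.version(name)})"
        except metadata.PackageNotFoundError:
            report[name] = "missing"
    return report


if __name__ == "__main__":
    for pkg, status in check_install().items():
        print(f"{pkg}: {status}")
```

A `mismatch` or `missing` entry usually means the commands ran in a different environment than the one being checked, or that the flash-attention build did not complete.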

Training model preparation

Training dataset preparation

Stage-1: textual alignment with CC12M

Stage-2: SmartEdit training

Inference

Explanation of new tokens:

Metrics Evaluation

Todo List

Contact

If you have any questions, feel free to email yuzhouhuang@link.cuhk.edu.cn or lb.xie@siat.ac.cn.

Citation

@inproceedings{huang2024smartedit,
  title={Smartedit: Exploring complex instruction-based image editing with multimodal large language models},
  author={Huang, Yuzhou and Xie, Liangbin and Wang, Xintao and Yuan, Ziyang and Cun, Xiaodong and Ge, Yixiao and Zhou, Jiantao and Dong, Chao and Huang, Rui and Zhang, Ruimao and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8362--8371},
  year={2024}
}