<a name="readme-top"></a>

<div align="center"> <a href="https://github.com/atfortes/Awesome-Controllable-Diffusion/stargazers"><img src="https://img.shields.io/github/stars/atfortes/Awesome-Controllable-Diffusion?style=for-the-badge" alt="Stargazers"></a> <a href="https://github.com/atfortes/Awesome-Controllable-Diffusion/network/members"><img src="https://img.shields.io/github/forks/atfortes/Awesome-Controllable-Diffusion?style=for-the-badge" alt="Forks"></a> <a href="https://github.com/atfortes/Awesome-Controllable-Diffusion/graphs/contributors"><img src="https://img.shields.io/github/contributors/atfortes/Awesome-Controllable-Diffusion?style=for-the-badge" alt="Contributors"></a> <a href="https://github.com/atfortes/Awesome-Controllable-Diffusion/blob/main/README.md"><img src="https://img.shields.io/badge/Papers-70-70?style=for-the-badge" alt="Papers"></a> <a href="https://github.com/atfortes/Awesome-Controllable-Diffusion/blob/main/LICENSE"><img src="https://img.shields.io/github/license/atfortes/Awesome-Controllable-Diffusion?style=for-the-badge" alt="MIT License"></a> </div> <h1 align="center">Awesome Controllable Diffusion</h1> <p align="center"> <b> Papers and Resources on Adding Conditional Controls to Diffusion Models in the Era of AIGC.</b> </p> <details> <summary>🗂️ Table of Contents</summary> <ol> <li><a href="#papers">📝 Papers</a></li> <ul> <li><a href="#2024"> 2024</a></li> <li><a href="#2023"> 2023</a></li> </ul> <li><a href="#other-resources">🔗 Other Resources</a></li> <li><a href="#other-awesome-lists">🌟 Other Awesome Lists</a></li> <li><a href="#contributing">✍️ Contributing</a></li> </ol> </details>

<h1 id="papers">📝 Papers<h1/>

<h2 id="2024">2024</h2>

  1. IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation. 🔥 [project] [paper] [code]

    Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang. Preprint 2024.

    <img src="assets/ifa.png" style="width:100%">
  2. CSGO: Content-Style Composition in Text-to-Image Generation. [project] [paper] [code]

    Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li. Preprint 2024.

  3. Generative Photomontage. [project] [paper] [code]

    Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu. Preprint 2024.

  4. Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches. [project] [paper]

    Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li. Preprint 2024.

  5. IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts. [project] [paper] [code]

    Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné. Preprint 2024.

  6. ViPer: Visual Personalization of Generative Models via Individual Preference Learning. [project] [paper] [code]

    Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir. ECCV'24.

  7. Training-free Composite Scene Generation for Layout-to-Image Synthesis. [paper] [code]

    Jiaqi Liu, Tao Huang, Chang Xu. ECCV'24.

  8. SEED-Story: Multimodal Long Story Generation with Large Language Model. [paper] [code]

    Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen. Preprint 2024.

  9. Sketch-Guided Scene Image Generation. [paper]

    Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie. Preprint 2024.

  10. Instant 3D Human Avatar Generation using Image Diffusion Models. [project] [paper]

    Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu. ECCV'24.

  11. Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance. 🔥 [project] [paper] [code]

    Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou. Preprint 2024.

    <img src="assets/ctrl-x.png" style="width:100%">
  12. Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis. [paper] [code]

    Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi. CVPR'24.

  13. pOps: Photo-Inspired Diffusion Operators. 🔥 [project] [paper] [code]

    Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or. Preprint 2024.

    <img src="assets/pops.png" style="width:100%">
  14. RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control. 🔥 [project] [paper] [code]

    Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu. Preprint 2024.

    <img src="assets/rb.png" style="width:100%">
  15. FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition. [project] [paper] [code]

    Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen. CVPR'24.

  16. Personalized Residuals for Concept-Driven Text-to-Image Generation. [project] [paper]

    Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz. CVPR'24.

  17. Compositional Text-to-Image Generation with Dense Blob Representations. 🔥 [project] [paper]

    Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat. ICML'24.

    <img src="assets/blob.png" style="width:100%">
  18. Customizing Text-to-Image Models with a Single Image Pair. [project] [paper] [code]

    Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu. Preprint 2024.

  19. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. [paper]

    Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou. Preprint 2024.

  20. InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation. [paper]

    Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek. Preprint 2024.

  21. PuLID: Pure and Lightning ID Customization via Contrastive Alignment. [paper] [code]

    Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He. Tech Report 2024.

  22. MultiBooth: Towards Generating All Your Concepts in an Image from Text. [project] [paper] [code]

    Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Xiu Li. Preprint 2024.

  23. StyleBooth: Image Style Editing with Multimodal Instruction. [project] [paper] [code]

    Zhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang. Preprint 2024.

  24. MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation. 🔥 [project] [paper] [code]

    Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang. ECCV'24.

    <img src="assets/moma.png" style="width:100%">
  25. Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding. [paper]

    Zezhong Fan, Xiaohan Li, Chenhao Fang, Topojoy Biswas, Kaushiki Nag, Jianpeng Xu, Kannan Achan. WWW'24.

  26. MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. [project] [paper] [code]

    Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman. Preprint 2024.

  27. MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models. [project] [paper] [code]

    Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal M Patel. ECCV'24.

  28. Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model. [project] [paper] [code]

    Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal. Preprint 2024.

  29. ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback. [project] [paper] [code]

    Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen. ECCV'24.

  30. Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models. [project] [paper]

    Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang. Preprint 2024.

  31. Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models. [paper]

    Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron. CVPR'24.

  32. FlashFace: Human Image Personalization with High-fidelity Identity Preservation. [project] [paper] [code]

    Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo. Preprint 2024.

  33. Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation. [project] [paper] [code]

    Omer Dahary, Or Patashnik, Kfir Aberman, Daniel Cohen-Or. ECCV'24.

  34. Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions. [project] [paper] [code]

    Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer. Preprint 2024.

  35. Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation. [project] [paper] [code]

    Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan. ECCV'24.

  36. FeedFace: Efficient Inference-based Face Personalization via Diffusion Models. 🔥 [paper] [code]

    Chendong Xiang, Armando Fortes, Khang Hui Chua, Hang Su, Jun Zhu. Tiny Papers @ ICLR'24.

    <img src="assets/feedface.png" style="width:100%">
  37. Multi-LoRA Composition for Image Generation. [project] [paper] [code]

    Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen. Preprint 2024.

  38. Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition. [project] [paper] [code]

    Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H.T. Kung, Yubei Chen. Tech Report 2024.

  39. Visual Style Prompting with Swapping Self-Attention. [project] [paper] [code]

    Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh. Preprint 2024.

  40. RealCompo: Dynamic Equilibrium between Realism and Composition Improves Text-to-Image Diffusion Models. [paper] [code]

    Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui. Preprint 2024.

  41. Direct Consistency Optimization for Compositional Text-to-Image Personalization. [project] [paper] [code]

    Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin. Preprint 2024.

  42. InstanceDiffusion: Instance-level Control for Image Generation. [project] [paper] [code]

    Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra. CVPR'24.

  43. Training-Free Consistent Text-to-Image Generation. [project] [paper]

    Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon. SIGGRAPH'24.

  44. UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion. 🔥 [project] [paper]

    Wei Li, Xue Xu, Jiachen Liu, Xinyan Xiao. ACL'24.

    <img src="assets/unimo-g.png" style="width:100%">
  45. Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs. 🔥 [paper] [code]

    Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui. ICML'24.

    <img src="assets/rpg.png" style="width:100%">
  46. InstantID: Zero-shot Identity-Preserving Generation in Seconds. 🔥 [project] [paper] [code]

    Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, Yao Hu. Tech Report 2024.

    <img src="assets/instant-id.png" style="width:100%">
  47. PALP: Prompt Aligned Personalization of Text-to-Image Models. [project] [paper]

    Moab Arar, Andrey Voynov, Amir Hertz, Omri Avrahami, Shlomi Fruchter, Yael Pritch, Daniel Cohen-Or, Ariel Shamir. Preprint 2024.

  48. SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing. [project] [paper] [code]

    Zeyinzi Jiang, Chaojie Mao, Yulin Pan, Zhen Han, Jingfeng Zhang. CVPR'24.

  49. PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding. 🔥 [project] [paper] [code]

    Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan. CVPR'24.

    <img src="assets/photomaker.png" style="width:100%">
  50. Context Diffusion: In-Context Aware Image Generation. [project] [paper]

    Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic. ECCV'24.

  51. Style Aligned Image Generation via Shared Attention. 🔥 [project] [paper] [code]

    Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or. CVPR'24.

    <img src="assets/style-aligned.png" style="width:100%">
  52. Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models. [project] [paper] [code]

    Daniel Geng, Inbum Park, Andrew Owens. CVPR'24.

  53. MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion. [project] [paper] [code]

    Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Xiao Yang, Mohammad Soleymani. ICML'24.

  54. The Chosen One: Consistent Characters in Text-to-Image Diffusion Models. [project] [paper] [code]

    Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski. SIGGRAPH'24.

  55. Cross-Image Attention for Zero-Shot Appearance Transfer. [project] [paper] [code]

    Yuval Alaluf, Daniel Garibi, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or. SIGGRAPH'24.

  56. Kosmos-G: Generating Images in Context with Multimodal Large Language Models. 🔥 [project] [paper] [code]

    Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei. ICLR'24.

    <img src="assets/kosmos-g.png" style="width:100%">
  57. InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning. [paper]

    Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung. CVPR'24.

<h2 id="2023">2023</h2>

  1. ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs. [project] [paper]

    Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani. Preprint 2023.

  2. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. 🔥 [project] [paper] [code] (usage sketch after this list)

    Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang. Tech Report 2023.

    <img src="assets/ip-adapter.png" style="width:100%">
  3. Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models.

    Guillaume Couairon, Marlène Careil, Matthieu Cord, Stéphane Lathuilière, Jakob Verbeek. ICCV'23.

  4. Controlling Text-to-Image Diffusion by Orthogonal Finetuning. [project] [paper] [code]

    Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf. NeurIPS'23.

  5. Face0: Instantaneously Conditioning a Text-to-Image Model on a Face. [paper]

    Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan. SIGGRAPH Asia'23.

  6. StyleDrop: Text-to-Image Generation in Any Style. 🔥 [project] [paper]

    Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan. NeurIPS'23.

    <img src="assets/styledrop.png" style="width:100%">
  7. BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. 🔥 [project] [paper] [code]

    Dongxu Li, Junnan Li, Steven C.H. Hoi. NeurIPS'23.

    <img src="assets/blip-diffusion.png" style="width:100%">
  8. Subject-driven Text-to-Image Generation via Apprenticeship Learning. [paper]

    Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen. NeurIPS'23.

  9. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. 🔥 [paper] [code]

    Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie. Tech Report 2023.

    <img src="assets/t2i-adapter.png" style="width:100%">
  10. Adding Conditional Control to Text-to-Image Diffusion Models. 🔥 [paper] [code] (usage sketch after this list)

    Lvmin Zhang, Anyi Rao, Maneesh Agrawala. ICCV'23.

    <img src="assets/controlnet.png" style="width:100%">
  11. GLIGEN: Open-Set Grounded Text-to-Image Generation. 🔥 [project] [paper] [code]

    Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee. CVPR'23.

  12. Multi-Concept Customization of Text-to-Image Diffusion. [project] [paper] [code]

    Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu. CVPR'23.

  13. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. 🔥 [project] [paper]

    Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman. CVPR'23.

    <img src="assets/dreambooth.png" style="width:100%">
<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;"> <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;"> ↑ Back to Top ↑ </a> </p>

<h1 id="other-resources">🔗 Other Resources# <h1/>

  1. Regional Prompter: Set a prompt to a divided region.
<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;"> <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;"> ↑ Back to Top ↑ </a> </p>

<h1 id="other-awesome-lists">🌟 Other Awesome Lists<h1/>

  1. Awesome-LLM-Reasoning: Collection of papers and resources on reasoning in Large Language Models.

  2. Awesome-Controllable-T2I-Diffusion-Models: A collection of resources on controllable generation with text-to-image diffusion models.

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;"> <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;"> ↑ Back to Top ↑ </a> </p>

<h1 id="contributing">✍️ Contributing # <h1/>

Contributions are welcome! To add a paper or resource, open an issue or submit a pull request following the entry format used above (title, links, authors, and venue). Don't worry if you do something wrong, it will be fixed for you!

<h2>Contributors</h2>

<a href="https://github.com/atfortes/Awesome-Controllable-Diffusion/graphs/contributors"> <img src="https://contrib.rocks/image?repo=atfortes/Awesome-Controllable-Diffusion" /> </a>

<h2>Star History</h2>

<a href="https://star-history.com/#atfortes/Awesome-Controllable-Diffusion&Date"><img src="https://api.star-history.com/svg?repos=atfortes/Awesome-Controllable-Diffusion&type=Date" alt="Star History Chart"></a>