$M^{2}Chat$: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

The official release of $M^{2}Chat$. For more details, please refer to our paper on arXiv or our demo page.

<img src="figs/main_banner.png" width="1000" >
<img src="figs/main_framework.png" width="1000" >

Updates!!

Quick Start

Installation

Step 0. Install ...

Step 1. Install ...

Step 2. Install requirements.

```shell
pip install -r requirements.txt
```

Notification

The publicly released code is still under development.

Tutorials

Validation. TODO

Cite $M^{2}Chat$

If you use $M^{2}Chat$ in your research, please cite our work using the following BibTeX entries:

```bibtex
@misc{chi2024m2chat,
  title={M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation},
  author={Xiaowei Chi and Rongyu Zhang and Zhengkai Jiang and Yijiang Liu and Yatian Wang and Xingqun Qi and Wenhan Luo and Peng Gao and Shanghang Zhang and Qifeng Liu and Yike Guo},
  year={2024},
  eprint={2311.17963},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@article{chi2023chatillusion,
  title={ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model},
  author={Chi, Xiaowei and Liu, Yijiang and Jiang, Zhengkai and Zhang, Rongyu and Lin, Ziyi and Zhang, Renrui and Gao, Peng and Fu, Chaoyou and Zhang, Shanghang and Liu, Qifeng and others},
  journal={arXiv preprint arXiv:2311.17963},
  year={2023}
}
```

Thanks

We highly appreciate the efforts behind Llama-AdapterV2 and Stable Diffusion XL.