# Awesome Unified Multimodal Models <!-- omit in toc -->

This is a repository for organizing papers, code, and other resources related to unified multimodal models.

<p align="center"> <img src="assets/unified_model.webp" alt="TAX" style="display: block; margin: 0 auto;" width="480px" /> </p>

## :thinking: What are unified multimodal models?

Traditional multimodal models fall broadly into two categories: multimodal understanding and multimodal generation. Unified multimodal models aim to integrate these two tasks within a single framework; in the community, they are also referred to as any-to-any generation models. Such models accept multimodal input and produce multimodal output, enabling them to process and generate content across modalities seamlessly.
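To make the any-to-any input/output contract concrete, here is a minimal, hypothetical Python sketch. All names (`UnifiedMultimodalModel`, `TextToken`, `ImageToken`, `step`) are illustrative placeholders rather than the API of any particular paper; the `step` method stands in for a real autoregressive backbone over a shared text+image token vocabulary.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical token types (illustrative, not from any specific paper).
# A unified model treats text and images as sequences of discrete tokens
# drawn from a shared (or concatenated) vocabulary.
@dataclass
class TextToken:
    id: int

@dataclass
class ImageToken:
    id: int  # e.g. an index into a visual codebook from a VQ tokenizer

Token = Union[TextToken, ImageToken]

class UnifiedMultimodalModel:
    """Toy any-to-any interface. A real model replaces `step` with a
    transformer over the interleaved sequence and a sampling head that
    covers both text and image vocabularies."""

    def step(self, tokens: List[Token]) -> Token:
        # Placeholder for the shared autoregressive backbone: we just flip
        # modality relative to the prompt to show that one model can answer
        # in text (understanding) or in image tokens (generation).
        return TextToken(0) if isinstance(tokens[0], ImageToken) else ImageToken(0)

    def generate(self, prompt: List[Token], steps: int) -> List[Token]:
        out = list(prompt)
        for _ in range(steps):
            out.append(self.step(out))
        return out

model = UnifiedMultimodalModel()
# Understanding-style call: image tokens in, text tokens out (captioning).
caption = model.generate([ImageToken(i) for i in range(4)], steps=3)
# Generation-style call: text tokens in, image tokens out (text-to-image).
image = model.generate([TextToken(i) for i in range(4)], steps=3)
print(caption[-1], image[-1])  # TextToken(id=0) ImageToken(id=0)
```

The point of the sketch is only that both task families route through the same sequence model and differ solely in which modality the prompt and the sampled continuation belong to.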

:high_brightness: This project is still ongoing; pull requests are welcome!

If you have any suggestions (missing papers, new papers, or typos), please feel free to open a pull request. Simply letting us know the titles of relevant papers is also a great contribution. You can do this by opening an issue or contacting us directly via email.

:star: If you find this repo useful, please star it!!!

## Table of Contents <!-- omit in toc -->

- [Unified Multimodal Understanding and Generation](#unified-multimodal-understanding-and-generation)
- [Multi Experts](#multi-experts)
- [Tokenizer](#tokenizer)

## Acknowledgements

This template is adapted from Awesome-Video-Diffusion and Awesome-MLLM-Hallucination.