Awesome Unified Multimodal Models <!-- omit in toc -->

This repository collects papers, code, and other resources related to unified multimodal models.

<p align="center"> <img src="assets/unified_model.webp" alt="TAX" style="display: block; margin: 0 auto;" width="480px" /> </p>

:thinking: What are unified multimodal models?

Traditional multimodal models fall broadly into two categories: multimodal understanding and multimodal generation. Unified multimodal models aim to integrate both tasks within a single framework. The community also refers to such models as any-to-any generation models. They take multimodal input and produce multimodal output, so a single model can both process and generate content across modalities.

:high_brightness: This project is ongoing; pull requests are welcome!

If you have any suggestions (missing papers, new papers, or typos), please feel free to edit the list and open a pull request. Simply letting us know the title of a paper is also a valuable contribution. You can do this by opening an issue or contacting us directly via email.

:star: If you find this repo useful, please star it!

<!-- ## Table of Contents <!-- omit in toc --> <!-- - [Open-source Toolboxes and Foundation Models](#open-source-toolboxes-and-foundation-models) - [Evaluation Benchmarks and Metrics](#evaluation-benchmarks-and-metrics) - [Single Model ](#single-model) - [Multi Experts](#multi-experts) - [Tokenizer](#tokenizers) -->

Unified Multimodal Understanding and Generation

<!-- ### Multi Experts + [TaxaBind: A Unified Embedding Space for Ecological Applications](https://arxiv.org/pdf/2411.00683) (Nov. 2024, arXiv) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.00683) [![Star](https://img.shields.io/github/stars/mvrl/TaxaBind.svg?style=social&label=Star)](https://github.com/mvrl/TaxaBind) [![Website](https://img.shields.io/badge/Website-9cf)](https://vishu26.github.io/taxabind/index.html) --> <!-- ### Tokenizer + [Cosmos Tokenizer: A suite of image and video neural tokenizers](https://developer.nvidia.com/blog/state-of-the-art-multimodal-generative-ai-model-development-with-nvidia-nemo/) (Nov. 2024, arXiv) [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://developer.nvidia.com/blog/state-of-the-art-multimodal-generative-ai-model-development-with-nvidia-nemo/) [![Star](https://img.shields.io/github/stars/NVIDIA/Cosmos-Tokenizer.svg?style=social&label=Star)](https://github.com/NVIDIA/Cosmos-Tokenizer) [![Website](https://img.shields.io/badge/Website-9cf)](https://research.nvidia.com/labs/dir/cosmos-tokenizer/) -->

Acknowledgements

This template is provided by Awesome-Video-Diffusion and Awesome-MLLM-Hallucination.