<div align=center> <img src="img/ARV_logo.png" width="180px"> </div>
<h2 align="center"> Awesome Autoregressive Models in Vision </h2>
<h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest update.</h5>

Autoregressive models have shown significant progress in generating high-quality content by modeling dependencies sequentially. This repo is a curated list of papers on the latest advancements in autoregressive models in vision. It is being actively updated, so please stay tuned!

Paper: Autoregressive Models in Vision: A Survey

Authors: Jing Xiong<sup>1,†</sup>, Gongye Liu<sup>2,†</sup>, Lun Huang<sup>3</sup>, Chengyue Wu<sup>1</sup>, Taiqiang Wu<sup>1</sup>, Yao Mu<sup>1</sup>, Yuan Yao<sup>4</sup>, Hui Shen<sup>5</sup>, Zhongwei Wan<sup>5</sup>, Jinfa Huang<sup>4</sup>, Chaofan Tao<sup>1,‡</sup>, Shen Yan<sup>6</sup>, Huaxiu Yao<sup>7</sup>, Lingpeng Kong<sup>1</sup>, Hongxia Yang<sup>9</sup>, Mi Zhang<sup>5</sup>, Guillermo Sapiro<sup>8,10</sup>, Jiebo Luo<sup>4</sup>, Ping Luo<sup>1</sup>, Ngai Wong<sup>1</sup>

<sup>1</sup>The University of Hong Kong, <sup>2</sup>Tsinghua University, <sup>3</sup>Duke University, <sup>4</sup>University of Rochester, <sup>5</sup>The Ohio State University, <sup>6</sup>Bytedance, <sup>7</sup>The University of North Carolina at Chapel Hill, <sup>8</sup>Apple, <sup>9</sup>The Hong Kong Polytechnic University, <sup>10</sup>Princeton University

<sup>†</sup> Core Contributors, <sup>‡</sup> Corresponding Authors

📣 Update News

[2024-11-11] We have released the survey: Autoregressive Models in Vision: A Survey.

[2024-10-13] We have initialized the repository.

<div align=center> <img src="img/Timeline_4.0.png" width="800px"> </div>

⚡ Contributing

We welcome feedback, suggestions, and contributions that can help improve this survey and repository and make them valuable resources for the entire community. We will actively maintain this repository by incorporating new research as it emerges. If you have suggestions about our taxonomy, notice papers we have missed, or know of an arXiv preprint listed here that has since been accepted to a venue, please let us know.

If you want to add your work or model to this list, please do not hesitate to open a pull request. Use the following Markdown format:

* [**Name of Conference or Journal + Year**] Paper Name. [[Paper]](link) [[Code]](link)
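As a convenience, an entry in the format above could be generated with a small helper like the one below. This is only a sketch: `format_entry` and its parameter names are hypothetical and not part of this repository, and the optional code link is an assumption about how entries without a public implementation would be written.

```python
def format_entry(venue, title, paper_url, code_url=None):
    """Build one list entry in the repository's Markdown format.

    `venue` is the conference or journal name plus year, e.g. "arXiv 2024".
    `code_url` is optional (assumption: entries without code omit the link).
    """
    entry = f"* [**{venue}**] {title}. [[Paper]]({paper_url})"
    if code_url:
        entry += f" [[Code]]({code_url})"
    return entry


if __name__ == "__main__":
    print(format_entry(
        "arXiv 2024",
        "Autoregressive Models in Vision: A Survey",
        "https://arxiv.org/abs/2411.05902",
        "https://github.com/ChaofanTao/Autoregressive-Models-in-Vision-Survey",
    ))
```

Pasting the printed line into the matching section of the list keeps new entries consistent with existing ones.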

📖 Table of Contents

<div align=center> <img src="img/outline_new.png" width="800px"> </div>

* Image Generation
  * Unconditional/Class-Conditioned Image Generation
  * Text-to-Image Generation
  * Image-to-Image Translation
  * Image Editing
* Video Generation
  * Unconditional Video Generation
  * Conditional Video Generation
  * Embodied AI
* 3D Generation
  * Motion Generation
  * Point Cloud Generation
  * 3D Medical Generation
* Multimodal Generation
  * Unified Understanding and Generation Multi-Modal LLMs
* Other Generation
* Accelerating & Stability & Analysis & Scaling
* Tutorial

Evaluation Metrics

| Metric | Analysis Type | Reference |
|---|---|---|
| Inception Score (IS) ↑ | Quantitative | Salimans et al., 2016 |
| Fréchet Inception Distance (FID) ↓ | Quantitative | Heusel et al., 2017 |
| Kernel Inception Distance (KID) ↓ | Quantitative | Binkowski et al., 2018 |
| Precision and Recall ↑ | Quantitative | Powers, 2020 |
| CLIP Maximum Mean Discrepancy ↓ | Quantitative | Jayasumana et al., 2023 |
| CLIP Score ↑ | Quantitative | Hessel et al., 2021 |
| R-precision ↑ | Quantitative | Craswell et al., 2009 |
| Perceptual Path Length ↓ | Quantitative | Karras et al., 2019 |
| Fréchet Video Distance (FVD) ↓ | Quantitative | Unterthiner et al., 2019 |
| Aesthetic (Expert Evaluation) ↑ | Qualitative | Based on domain expertise |
| Turing Test | Qualitative | Turing, 1950 |
| User Studies (ratings, satisfaction) ↑ | Qualitative | Various, depending on the user study methodology |
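In the metrics listed above, ↑ means higher is better and ↓ means lower is better. To make one of them concrete, here is a minimal sketch of the Inception Score, which is the exponential of the mean KL divergence between each image's predicted class distribution p(y|x) and the marginal p(y). It assumes the class probabilities have already been produced by a classifier; in practice they come from a pretrained Inception network and the score is usually averaged over several splits, both of which this toy version leaves out. The function name and the `eps` parameter are illustrative choices.

```python
import math


def inception_score(probs, eps=1e-12):
    """Inception Score from per-image class probabilities.

    `probs` is a list of probability vectors p(y|x), one per generated image.
    `eps` guards the logarithms against zero (an illustrative choice).
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y): the average of p(y|x) over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean over images of KL(p(y|x) || p(y)); zero-probability terms contribute 0.
    mean_kl = 0.0
    for p in probs:
        kl = sum(
            pk * (math.log(pk + eps) - math.log(marginal[k] + eps))
            for k, pk in enumerate(p)
            if pk > 0
        )
        mean_kl += kl / n
    return math.exp(mean_kl)
```

Confident predictions spread evenly over the classes push the score toward the number of classes, while identical predictions for every image give the minimum score of 1.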

♥️ Contributors

<a href="https://github.com/ChaofanTao/Autoregressive-Models-in-Vision-Survey/graphs/contributors"> <img src="https://contrib.rocks/image?repo=ChaofanTao/Autoregressive-Models-in-Vision-Survey" /> </a>

📑 Citation

Please consider citing 📑 our paper if this repository is helpful to your work. Thank you sincerely!

@misc{xiong2024autoregressive,
    title={Autoregressive Models in Vision: A Survey},
    author={Jing Xiong and Gongye Liu and Lun Huang and Chengyue Wu and Taiqiang Wu and Yao Mu and Yuan Yao and Hui Shen and Zhongwei Wan and Jinfa Huang and Chaofan Tao and Shen Yan and Huaxiu Yao and Lingpeng Kong and Hongxia Yang and Mi Zhang and Guillermo Sapiro and Jiebo Luo and Ping Luo and Ngai Wong},
    year={2024},
    eprint={2411.05902},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}