🎲 RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

[Project Page] [arXiv] [HuggingFace]

Overview

Have you ever wondered what a visual model needs in order to achieve the impact GPT has had in language? We believe the prerequisite is zero-shot generalization to a wide range of applications, prompts, and more. RandAR is one attempt toward this objective.

🎲 RandAR is a decoder-only AR model generating image tokens in arbitrary orders.

🚀 RandAR supports parallel decoding without additional fine-tuning, bringing a $2.5\times$ speedup for AR generation.

🛠️ RandAR unlocks new capabilities for causal GPT-style transformers: inpainting, outpainting, zero-shot resolution extrapolation, and bi-directional feature encoding.
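
To make the ideas of random-order generation and parallel decoding concrete, here is a minimal toy sketch in plain Python. It is not the RandAR implementation: `random_order_schedule`, `generate`, and the `predict_token` callback are hypothetical names introduced only for illustration, and `predict_token` stands in for whatever a trained model would do. The sketch only shows how a random permutation of token positions can be split into steps, and how several positions can be filled per step from the same context.

```python
import random


def random_order_schedule(num_tokens, tokens_per_step=1, seed=None):
    """Split a random permutation of token positions into decoding steps."""
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)  # arbitrary generation order instead of raster order
    return [
        order[i:i + tokens_per_step]
        for i in range(0, num_tokens, tokens_per_step)
    ]


def generate(num_tokens, predict_token, tokens_per_step=1, seed=None):
    """Fill a token canvas in random order.

    predict_token(context, position) is a hypothetical stand-in for the model:
    it receives the canvas as it was before the current step (None marks
    positions not yet generated) and returns the token for `position`.
    """
    canvas = [None] * num_tokens
    for positions in random_order_schedule(num_tokens, tokens_per_step, seed):
        # Snapshot the context so that tokens decoded in the same step
        # do not see each other (the parallel-decoding case).
        context = list(canvas)
        for pos in positions:
            canvas[pos] = predict_token(context, pos)
    return canvas


# Toy usage: a dummy predictor that returns twice the position index.
schedule = random_order_schedule(16, tokens_per_step=4, seed=0)
tokens = generate(16, lambda context, pos: pos * 2, tokens_per_step=4, seed=0)
```

With `tokens_per_step=4`, the 16 positions are covered in 4 steps instead of 16. In this toy setting that is the whole source of the reduction in sequential steps; the speedup reported above depends on the actual model and decoding schedule, which this sketch does not reproduce.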

<img src="imgs/teaser.png" alt="teaser" width="100%">

News

Getting Started

Check out our documentation in DOCUMENTATION.md for more details.

Citation

If you find this work useful in your research, please consider citing:

@article{pang2024randar,
    title={RandAR: Decoder-only Autoregressive Visual Generation in Random Orders},
    author={Pang, Ziqi and Zhang, Tianyuan and Luan, Fujun and Man, Yunze and Tan, Hao and Zhang, Kai and Freeman, William T. and Wang, Yu-Xiong},
    journal={arXiv preprint arXiv:2412.01827},
    year={2024}
}

Acknowledgement

We thank the open-source community for its exploration of autoregressive generation, especially LLaMAGen.