1D Visual Tokenization and Generation

This repo hosts the code and models for the following projects:

Short Intro on Randomized Autoregressive Visual Generation (README)

RAR is an autoregressive (AR) image generator that is fully compatible with language modeling. It introduces a randomness annealing strategy with a permuted objective at no additional cost, which enhances the model's ability to learn bidirectional contexts while leaving the autoregressive framework intact. RAR achieves an FID score of 1.48 on the ImageNet-256 benchmark, a state-of-the-art result that significantly outperforms prior AR image generators.
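To give a rough feel for what randomness annealing could look like, here is a minimal sketch. It assumes a linear schedule in which the probability of training on a randomly permuted token order starts at 1 and decays to 0, after which the model only sees the standard raster order. The function names, the `anneal_start`/`anneal_end` parameters, and the linear shape are illustrative assumptions, not this repo's API; refer to README_RAR for the actual training configuration.

```python
import random


def permutation_probability(step, anneal_start, anneal_end):
    """Probability of using a random token order at a given training step.

    Assumption: linear decay from 1.0 (fully random) to 0.0 (raster only).
    """
    if step <= anneal_start:
        return 1.0
    if step >= anneal_end:
        return 0.0
    return 1.0 - (step - anneal_start) / (anneal_end - anneal_start)


def sample_token_order(num_tokens, step, anneal_start, anneal_end, rng):
    """Return the prediction order of token indices for one training sample."""
    order = list(range(num_tokens))  # raster order: 0, 1, 2, ...
    if rng.random() < permutation_probability(step, anneal_start, anneal_end):
        rng.shuffle(order)  # permuted objective: predict tokens in a random order
    return order


# Hypothetical usage: 8 tokens, annealing between steps 100 and 200.
rng = random.Random(0)
early_order = sample_token_order(8, step=0, anneal_start=100, anneal_end=200, rng=rng)
late_order = sample_token_order(8, step=300, anneal_start=100, anneal_end=200, rng=rng)
```

In this sketch, `late_order` is always the raster order, so a model trained this way would end up as a standard left-to-right AR generator at inference time.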

<p> <img src="assets/rar_overview.png" alt="teaser" width=90% height=90%> </p> <p> <img src="assets/perf_comp.png" alt="teaser" width=90% height=90%> </p>

See more details at README_RAR.

Short Intro on An Image is Worth 32 Tokens for Reconstruction and Generation (README)

We present a compact 1D tokenizer that can represent an image with as few as 32 discrete tokens. As a result, it substantially speeds up the sampling process (e.g., 410× faster than DiT-XL/2) while achieving competitive generation quality.
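As a back-of-the-envelope illustration of why fewer tokens help, the sketch below compares 32 tokens against a conventional 2D grid tokenizer. The 16× downsampling factor and the quadratic attention-cost model are assumptions chosen for illustration. This arithmetic is not how the 410× figure was measured, since that number also depends on the generator architecture and the number of sampling steps.

```python
def grid_token_count(image_size, downsample_factor):
    """Number of tokens a 2D grid tokenizer produces for a square image."""
    side = image_size // downsample_factor
    return side * side


# Assumption: a 256x256 image and a 2D tokenizer with 16x downsampling.
grid_tokens = grid_token_count(256, 16)   # 16 * 16 grid
titok_tokens = 32                         # token count stated above

length_ratio = grid_tokens / titok_tokens  # how much shorter the sequence is
attention_ratio = length_ratio ** 2        # self-attention cost grows ~quadratically

print(f"{grid_tokens} vs {titok_tokens} tokens: "
      f"{length_ratio:.0f}x shorter, ~{attention_ratio:.0f}x fewer attention pairs")
```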

<p> <img src="assets/titok_teaser.png" alt="teaser" width=90% height=90%> </p> <p> <img src="assets/speed_vs_perf.png" alt="teaser" width=90% height=90%> </p>

See more details at README_TiTok.

Updates

Installation

pip3 install -r requirements.txt

Citing

If you use our work in your research, please cite the following BibTeX entries.

@article{yu2024randomized,
  author    = {Qihang Yu and Ju He and Xueqing Deng and Xiaohui Shen and Liang-Chieh Chen},
  title     = {Randomized Autoregressive Visual Generation},
  journal   = {arXiv preprint arXiv:2411.00776},
  year      = {2024}
}
@article{yu2024an,
  author    = {Qihang Yu and Mark Weber and Xueqing Deng and Xiaohui Shen and Daniel Cremers and Liang-Chieh Chen},
  title     = {An Image is Worth 32 Tokens for Reconstruction and Generation},
  journal   = {NeurIPS},
  year      = {2024}
}

Acknowledgement

MaskGIT

Taming-Transformers

Open-MUSE

MUSE-Pytorch