ImageFolder🚀: Autoregressive Image Generation with Folded Tokens
<div align="center"> </div> <!-- <p align="center" style="font-size: larger;"> <a href="placeholder">🔥ImageFolder: Autoregressive Image Generation with Folded Tokens</a> </p> --> <p align="center"> <div align=center> <img src=assets/teaser.png/> </div>
Updates
- (2024.11.14) Code will be released in two weeks (company approval in progress).
- (2024.10.03) We are working on advanced training of ImageFolder tokenizer.
- (2024.10.01) Repo created. Code and checkpoints will be released soon.
Ablation (updating)
ID | Method | Length | rFID ↓ | gFID ↓ | ACC ↑ |
---|---|---|---|---|---|
🔶1 | Multi-scale residual quantization (Tian et al., 2024) | 680 | 1.92 | 7.52 | - |
🔶2 | + Quantizer dropout | 680 | 1.71 | 6.03 | - |
🔶3 | + Smaller patch size K = 11 | 265 | 3.24 | 6.56 | - |
🔶4 | + Product quantization & Parallel decoding | 265 | 2.06 | 5.96 | - |
🔶5 | + Semantic regularization on all branches | 265 | 1.97 | 5.21 | - |
🔶6 | + Semantic regularization on one branch | 265 | 1.57 | 3.53 | 40.5 |
🔷7 | + Stronger discriminator | 265 | 1.04 | 2.94 | 50.2 |
🔷8 | + Equilibrium enhancement | 265 | 0.80 | 2.60 | 58.0 |
Settings 🔶1-6 are already reported in the released paper. Settings 🔷7 and later are advanced training settings, similar to those used in VAR (gFID 3.30).
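Row 🔶4 introduces product quantization. Since the ImageFolder code is not yet released, the following is only a generic, minimal sketch of the textbook technique, not the authors' implementation: a feature vector is split into equal sub-vectors, and each sub-vector is matched to the nearest entry of its own codebook, so one position yields one index per branch. The function names, the toy codebooks, and the choice of two branches are all assumptions made for illustration.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def product_quantize(feature, codebooks):
    """Split `feature` into len(codebooks) equal sub-vectors and quantize each
    sub-vector with its own codebook.

    Returns (indices, reconstructed_feature). Hypothetical helper, for
    illustration only.
    """
    num_branches = len(codebooks)
    if len(feature) % num_branches != 0:
        raise ValueError("feature length must be divisible by the number of codebooks")
    sub_dim = len(feature) // num_branches

    indices, reconstructed = [], []
    for branch, codebook in enumerate(codebooks):
        sub_vector = feature[branch * sub_dim:(branch + 1) * sub_dim]
        idx = nearest_code(sub_vector, codebook)
        indices.append(idx)
        reconstructed.extend(codebook[idx])
    return indices, reconstructed


# Toy example: a 4-dimensional feature quantized by two 2-dimensional codebooks.
codebook_a = [[0.0, 0.0], [1.0, 1.0]]
codebook_b = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
feature = [0.9, 1.2, 1.8, 2.1]

indices, recon = product_quantize(feature, [codebook_a, codebook_b])
print(indices)  # [1, 2]
print(recon)    # [1.0, 1.0, 2.0, 2.0]
```

Each branch searches only its own small codebook, yet the number of representable combinations is the product of the codebook sizes (2 × 3 = 6 in this toy case).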
Generation
<div align=center> <img src=assets/visualization.png/> </div>
Visualization of Decomposed Token
<div align=center> <img src=assets/token-vis.png/> </div>
Acknowledgement
We would like to thank the following repositories: LlamaGen, VAR and ControlVAR.
Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@misc{li2024imagefolderautoregressiveimagegeneration,
      title={ImageFolder: Autoregressive Image Generation with Folded Tokens},
      author={Xiang Li and Hao Chen and Kai Qiu and Jason Kuen and Jiuxiang Gu and Bhiksha Raj and Zhe Lin},
      year={2024},
      eprint={2410.01756},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.01756},
}