ImageFolder🚀: Autoregressive Image Generation with Folded Tokens

<div align="center">

project page  arXiv  huggingface weights 

</div>

<div align="center"> <img src="assets/teaser.png"/> </div>

Updates

Ablation (updating)

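| ID | Method | Length | rFID ↓ | gFID ↓ | ACC ↑ |
|----|--------|--------|--------|--------|-------|
| 🔶1 | Multi-scale residual quantization (Tian et al., 2024) | 680 | 1.92 | 7.52 | - |
| 🔶2 | + Quantizer dropout | 680 | 1.71 | 6.03 | - |
| 🔶3 | + Smaller patch size K = 11 | 265 | 3.24 | 6.56 | - |
| 🔶4 | + Product quantization & Parallel decoding | 265 | 2.06 | 5.96 | - |
| 🔶5 | + Semantic regularization on all branches | 265 | 1.97 | 5.21 | - |
| 🔶6 | + Semantic regularization on one branch | 265 | 1.57 | 3.53 | 40.5 |
| 🔷7 | + Stronger discriminator | 265 | 1.04 | 2.94 | 50.2 |
| 🔷8 | + Equilibrium enhancement | 265 | 0.80 | 2.60 | 58.0 |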

Rows 🔶1-6 are already reported in the released paper; rows 🔷7 onward use advanced training settings similar to those of VAR (gFID 3.30).
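
For readers unfamiliar with the "product quantization" step named in row 🔶4, the following is a minimal, illustrative sketch in plain Python. It is not the code from this repository: the function names (`nearest_index`, `product_quantize`, `reconstruct`) and the toy codebooks are hypothetical, and it assumes the simplest possible setup, where a feature vector is split into equal chunks and each chunk is matched to the nearest entry of its own codebook.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def product_quantize(
    vec: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[int, ...]:
    """Split vec into len(codebooks) equal chunks; quantize each independently."""
    n = len(codebooks)
    if len(vec) % n != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    chunk = len(vec) // n
    return tuple(
        nearest_index(vec[i * chunk:(i + 1) * chunk], codebooks[i])
        for i in range(n)
    )


def reconstruct(
    indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]
) -> List[float]:
    """Concatenate the selected codebook entries back into one vector."""
    out: List[float] = []
    for idx, codebook in zip(indices, codebooks):
        out.extend(codebook[idx])
    return out


# Toy example (hypothetical values): two branches, each with its own codebook.
codebook_a = [(0.0, 0.0), (1.0, 1.0)]
codebook_b = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0)]
feature = (0.9, 1.2, 1.8, 2.1)

codes = product_quantize(feature, [codebook_a, codebook_b])
print(codes)                                         # (1, 2)
print(reconstruct(codes, [codebook_a, codebook_b]))  # [1.0, 1.0, 2.0, 2.0]
```

In this toy setup, one spatial position carries a pair of sub-tokens instead of a single token, so the sequence length is unchanged while the number of representable combinations is the product of the codebook sizes (here 2 × 3 = 6).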

Generation

<div align="center"> <img src="assets/visualization.png"/> </div>

Visualization of Decomposed Tokens

<div align="center"> <img src="assets/token-vis.png"/> </div>

Acknowledgements

We would like to thank the following repositories: LlamaGen, VAR and ControlVAR.

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{li2024imagefolderautoregressiveimagegeneration,
      title={ImageFolder: Autoregressive Image Generation with Folded Tokens}, 
      author={Xiang Li and Hao Chen and Kai Qiu and Jason Kuen and Jiuxiang Gu and Bhiksha Raj and Zhe Lin},
      year={2024},
      eprint={2410.01756},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.01756}, 
}