<div align="center"> <h1>🚀 SEED-Voken: A Series of Powerful Visual Tokenizers</h1> </div>

The project aims to provide advanced visual tokenizers for autoregressive visual generation and currently supports the following methods: <br><br>

<a href="https://arxiv.org/abs/2409.04410">Open-MAGVIT2: An Open-source Project Toward Democratizing Auto-Regressive Visual Generation</a><br> Zhuoyan Luo*, Fengyuan Shi*, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan<br> ARC Lab Tencent PCG, Tsinghua University, Nanjing University<br> <a href="./docs/Open-MAGVIT2.md">📚Open-MAGVIT2.md</a>

@article{luo2024open,
  title={Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation},
  author={Luo, Zhuoyan and Shi, Fengyuan and Ge, Yixiao and Yang, Yujiu and Wang, Limin and Shan, Ying},
  journal={arXiv preprint arXiv:2409.04410},
  year={2024}
}

<a href="https://arxiv.org/abs/2412.02692">IBQ: Taming Scalable Visual Tokenizer for Autoregressive Image Generation</a><br> Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang<br> Nanjing University, Tsinghua University, ARC Lab Tencent PCG<br> <a href="./docs/IBQ.md">📚IBQ.md</a>

@article{shi2024taming,
  title={Taming Scalable Visual Tokenizer for Autoregressive Image Generation},
  author={Shi, Fengyuan and Luo, Zhuoyan and Ge, Yixiao and Yang, Yujiu and Shan, Ying and Wang, Limin},
  journal={arXiv preprint arXiv:2412.02692},
  year={2024}
}
<p align="center"> <img src="./assets/comparsion.png" width=90%> </p>

📰 News

📖 Implementations

Our codebase supports both NPU and GPU for training and inference. All models were trained on the Ascend 910B and validated on the V100. The observed performance on the two platforms is nearly identical.

🛠️ Installation

GPU

NPU

Datasets

We use ImageNet2012 as our dataset. It should be organized as follows:

imagenet
└── train/
    ├── n01440764
        ├── n01440764_10026.JPEG
        ├── n01440764_10027.JPEG
        ├── ...
    ├── n01443537
    ├── ...
└── val/
    ├── ...
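
Before training, it can help to confirm that a local copy matches this layout. The following is a minimal sketch, not part of the codebase: the function name `check_imagenet_layout` is hypothetical, and it assumes only what the tree shows, namely that `train/` holds one folder per class with `.JPEG` files inside. The contents of `val/` are not spelled out above, so the sketch only checks that the folder exists.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Check that `root` has train/ and val/, then count train classes and images.

    Hypothetical helper for illustration; it is not provided by this repository.
    """
    root = Path(root)

    # Both split folders from the tree above must be present.
    for split in ("train", "val"):
        if not (root / split).is_dir():
            raise FileNotFoundError(f"missing split directory: {root / split}")

    # Assumption: every sub-folder of train/ is one class (e.g. n01440764).
    class_dirs = [p for p in (root / "train").iterdir() if p.is_dir()]

    # Count files with a .JPEG extension (case-insensitive) inside each class folder.
    n_images = sum(
        1
        for class_dir in class_dirs
        for f in class_dir.iterdir()
        if f.is_file() and f.suffix.lower() == ".jpeg"
    )
    return {"train_classes": len(class_dirs), "train_images": n_images}
```

Calling `check_imagenet_layout("path/to/imagenet")` (the path is a placeholder) returns a small dictionary of counts, or raises `FileNotFoundError` if either split folder is missing.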

⚡ Training & Evaluation

The training and evaluation scripts are in <a href="docs/Open-MAGVIT2.md">Open-MAGVIT2.md</a> and <a href="docs/IBQ.md">IBQ.md</a>.

❤️ Acknowledgement

We thank Lijun Yu for his encouraging discussions. This project draws heavily on VQGAN and MAGVIT, and also refers to LlamaGen, VAR and RQVAE. We thank the authors for their wonderful work.