FDSL on VisualAtom

Summary

This repository contains the code for VisualAtom construction, pre-training, and fine-tuning in Python/PyTorch. It is based on the paper: Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka and Rio Yokota, "Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. [Project] [PDF] [Dataset] [Poster] [Supp]

Updates

Update (Mar. 24, 2023)

Update (Feb. 28, 2023)

Citation

If you use these scripts, please cite the following paper:

@InProceedings{takashima2023visual,
    author    = {Sora Takashima and Ryo Hayamizu and Nakamasa Inoue and Hirokatsu Kataoka and Rio Yokota},
    title     = {Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {18579--18588}
}
<!-- ```bibtex @article{takashima2023visual, title={Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves}, author={Takashima, Sora and Hayamizu, Ryo and Inoue, Nakamasa and Kataoka, Hirokatsu and Yokota, Rio}, journal={arXiv preprint arXiv:2303.01112}, year={2023} } ``` -->

Requirements

Install the required packages with the following commands.

$ pip install --upgrade pip
$ pip install -r requirements.txt

VisualAtom Construction (README)

$ cd visual_atomic_renderer
$ bash make_VisualAtom.sh

You can also download the raw VisualAtom-1k dataset here: zenodo

Pre-training

Our pre-training scripts are almost the same as those in Kataoka_2022_CVPR.

Run the Python script pretrain.py to pre-train on your own dataset.

In the basic case, you can run pretrain.py with the following command.

Alternatively, you can run the job script scripts/pretrain.sh, which supports multi-node training with OpenMPI. Note that this setup assumes multiple nodes and a large number of GPUs (32 nodes and 128 GPUs for pre-training).
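32 nodes and 128 GPUs works out to 4 GPUs per node. The following is a minimal sketch of how a training script can derive its ranks from the environment variables OpenMPI exports: OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE, OMPI_COMM_WORLD_LOCAL_RANK and OMPI_COMM_WORLD_LOCAL_SIZE. It assumes mpirun launches one process per GPU. The names DistInfo and dist_info_from_openmpi are hypothetical, and this is not necessarily how scripts/pretrain.sh or pretrain.py handle ranks.

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    rank: int         # global rank across all nodes
    world_size: int   # total number of processes (one per GPU)
    local_rank: int   # rank within the node, used to pick the GPU
    node_index: int   # index of the node this process runs on


def dist_info_from_openmpi(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Derive the distributed layout from variables exported by OpenMPI's mpirun.

    Falls back to a single-process layout when the variables are absent.
    """
    rank = int(env.get("OMPI_COMM_WORLD_RANK", "0"))
    world_size = int(env.get("OMPI_COMM_WORLD_SIZE", "1"))
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    local_size = int(env.get("OMPI_COMM_WORLD_LOCAL_SIZE", "1"))
    # Assumes every node runs the same number of processes (local_size)
    # and that consecutive ranks are placed on the same node.
    node_index = rank // local_size
    return DistInfo(rank, world_size, local_rank, node_index)
```

Take the last process of a 32 × 4 job (rank 127). This gives local rank 3 on node 31. The global rank and world size map onto the rank and world_size arguments of torch.distributed.init_process_group. The local rank is what one would typically pass to torch.cuda.set_device.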

When running the script above, organize your dataset as follows.

/PATH/TO/VisualAtom21000/
    image/
        00000/
            00000_0000.png
            00000_0001.png
            ...
        00001/
            00001_0000.png
            00001_0001.png
            ...
        ...
    ...
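The file names in this tree appear to follow the pattern <class>/<class>_<index>.png, with a 5-digit class ID and a 4-digit image index. The sketch below checks a directory against that inferred pattern before a long pre-training run. Both the pattern and the helper name check_visualatom_layout are assumptions drawn from the example tree, not from the repository's code.

```python
import re
from pathlib import Path
from typing import Dict

# Inferred from the example tree: <class>/<class>_<index>.png,
# with a 5-digit class ID and a 4-digit per-class image index.
CLASS_RE = re.compile(r"\d{5}")
FILE_RE = re.compile(r"(\d{5})_(\d{4})\.png")


def check_visualatom_layout(root: str) -> Dict[str, int]:
    """Return {class_id: image_count}; raise ValueError on any mismatch."""
    image_dir = Path(root) / "image"
    counts: Dict[str, int] = {}
    for class_dir in sorted(p for p in image_dir.iterdir() if p.is_dir()):
        if not CLASS_RE.fullmatch(class_dir.name):
            raise ValueError(f"unexpected class folder: {class_dir}")
        n_images = 0
        for f in class_dir.iterdir():
            m = FILE_RE.fullmatch(f.name)
            # Every file must be a PNG carrying its own folder's class ID.
            if m is None or m.group(1) != class_dir.name:
                raise ValueError(f"unexpected file: {f}")
            n_images += 1
        counts[class_dir.name] = n_images
    return counts
```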

After pre-training, the trained models are saved as, for example, output/pretrain/pretrain_deit_base_VisualAtom21000_1.0e-3/model_best.pth.tar and output/pretrain/pretrain_deit_base_VisualAtom21000_1.0e-3/last.pth.tar. You can resume training from a checkpoint by setting the --resume parameter.
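A small helper can locate last.pth.tar and turn it into a --resume argument. This sketch assumes the output directory matches the example paths above and that last.pth.tar holds the most recent training state rather than the best-scoring one. The function names are hypothetical. Any other arguments from the interrupted run are assumed to be passed again unchanged.

```python
from pathlib import Path
from typing import List, Optional


def find_resume_checkpoint(output_dir: str) -> Optional[Path]:
    """Return <output_dir>/last.pth.tar if it exists, else None.

    Assumption: last.pth.tar is the most recent state, whereas
    model_best.pth.tar may come from an earlier epoch.
    """
    candidate = Path(output_dir) / "last.pth.tar"
    return candidate if candidate.is_file() else None


def resume_args(output_dir: str) -> List[str]:
    """Extra command-line arguments for pretrain.py: --resume <ckpt>, or none."""
    ckpt = find_resume_checkpoint(output_dir)
    return ["--resume", str(ckpt)] if ckpt is not None else []
```

For example: ["python", "pretrain.py", *original_args, *resume_args("output/pretrain/pretrain_deit_base_VisualAtom21000_1.0e-3")]. Here original_args is a placeholder for the arguments of the original run.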

Please see the script and code files for details on each argument.

Pre-training with a sharded dataset

A sharded dataset is also supported to accelerate I/O. To create one, please refer to the WebDataset repository: https://github.com/webdataset/webdataset. Here is an example of training with a sharded dataset.

When running the script above with a sharded dataset, organize the shards as follows.

/PATH/TO/VisualAtom21000/
    SHARDS-000000.tar
    SHARDS-000001.tar
    ...
    SHARDS-002099.tar
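The shard names run from SHARDS-000000.tar to SHARDS-002099.tar, which is 2,100 files. WebDataset accepts brace notation for such ranges, so the whole set can be written as a single URL pattern. The minimal sketch below builds that pattern and expands it back into file names. The helper names are hypothetical, and expand_range is a simplified stand-in for the brace expansion WebDataset performs: it handles a single numeric range only.

```python
import re
from typing import List


def shard_pattern(root: str, prefix: str, num_shards: int, digits: int = 6) -> str:
    """Brace-notation pattern covering <prefix>-000000.tar ... <prefix>-<N-1>.tar."""
    first = f"{0:0{digits}d}"
    last = f"{num_shards - 1:0{digits}d}"
    return f"{root}/{prefix}-{{{first}..{last}}}.tar"


def expand_range(pattern: str) -> List[str]:
    """Expand a single zero-padded {lo..hi} range, as brace expansion would."""
    m = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if m is None:
        return [pattern]
    lo, hi = m.group(1), m.group(2)
    width = len(lo)  # keep the zero padding of the lower bound
    return [
        pattern[: m.start()] + f"{i:0{width}d}" + pattern[m.end():]
        for i in range(int(lo), int(hi) + 1)
    ]
```

The resulting pattern, /PATH/TO/VisualAtom21000/SHARDS-{000000..002099}.tar, could be passed to webdataset.WebDataset. This sketch does not cover how pretrain.py consumes the shards or which keys each tar stores.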

Pre-trained models

Our pre-trained models are available at this [Link].

We provide three main pre-trained models: ViT-Tiny pre-trained on VisualAtom-1k and ViT-Base pre-trained on VisualAtom-21k (both with a patch size of 16 and an input size of 224), and Swin-Base (patch size of 4, window size of 7, input size of 224) pre-trained on VisualAtom-21k.
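The patch and window sizes determine how many tokens each model processes. For ViT/DeiT at 224 × 224 with 16 × 16 patches, the grid is 14 × 14. That gives 196 patch tokens plus one class token. For Swin-Base with 4 × 4 patches, the first stage has a 56 × 56 grid. Its 7 × 7 windows split this into 8 × 8 = 64 windows of 49 tokens each. A worked example in Python (the helper names are illustrative):

```python
from typing import Tuple


def grid_side(img_size: int, patch_size: int) -> int:
    """Number of patches along one side of a square image."""
    if img_size % patch_size:
        raise ValueError("image size must be divisible by patch size")
    return img_size // patch_size


def vit_seq_len(img_size: int, patch_size: int) -> int:
    """ViT/DeiT sequence length: patch tokens plus one class token."""
    return grid_side(img_size, patch_size) ** 2 + 1


def swin_stage1_windows(img_size: int, patch_size: int, window: int) -> Tuple[int, int]:
    """(number of windows, tokens per window) at Swin's first stage."""
    side = grid_side(img_size, patch_size)
    if side % window:
        raise ValueError("patch grid must be divisible by window size")
    return (side // window) ** 2, window * window


print(vit_seq_len(224, 16))            # ViT-Tiny/Base: 197
print(swin_stage1_windows(224, 4, 7))  # Swin-Base: (64, 49)
```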

vit_tiny_with_visualatom_1k.pth.tar: timm model is deit_tiny_patch16_224, pre-trained on VisualAtom-1k
vit_base_with_visualatom_21k.pth.tar: timm model is deit_base_patch16_224, pre-trained on VisualAtom-21k
swin_base_with_visualatom_21k.pth.tar: timm model is swin_base_patch4_window7_224, pre-trained on VisualAtom-21k
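The timm names above suggest one way to rebuild each model for fine-tuning. This minimal sketch assumes timm and PyTorch are installed. It also assumes each .pth.tar stores its weights at the top level or under a "state_dict" or "model" key, which this README does not confirm. The approach has three steps: create the timm model with the target number of classes, drop the pre-training classifier head, and load the remaining weights non-strictly. PRETRAINED_TIMM_NAMES, backbone_state_dict and load_visualatom_backbone are hypothetical names, and finetune.py may load checkpoints differently.

```python
import os
from typing import Any, Dict

# Checkpoint file name -> timm model name, as listed above.
PRETRAINED_TIMM_NAMES = {
    "vit_tiny_with_visualatom_1k.pth.tar": "deit_tiny_patch16_224",
    "vit_base_with_visualatom_21k.pth.tar": "deit_base_patch16_224",
    "swin_base_with_visualatom_21k.pth.tar": "swin_base_patch4_window7_224",
}


def backbone_state_dict(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Strip a DistributedDataParallel 'module.' prefix and drop classifier-head weights."""
    out = {}
    for key, value in state_dict.items():
        if key.startswith("module."):
            key = key[len("module."):]
        if key.startswith("head"):  # timm names the DeiT/Swin classifier 'head...'
            continue
        out[key] = value
    return out


def load_visualatom_backbone(checkpoint_path: str, num_classes: int):
    """Build the matching timm model with a fresh head and load pre-trained weights."""
    import timm  # imported lazily so the helpers above work without timm/torch
    import torch

    name = PRETRAINED_TIMM_NAMES[os.path.basename(checkpoint_path)]
    model = timm.create_model(name, pretrained=False, num_classes=num_classes)
    # weights_only=False because the checkpoint may store non-tensor objects;
    # only load files from a source you trust.
    ckpt = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    # Assumption: weights live under "state_dict" or "model", or at the top level.
    state_dict = ckpt.get("state_dict", ckpt.get("model", ckpt))
    result = model.load_state_dict(backbone_state_dict(state_dict), strict=False)
    return model, result.missing_keys, result.unexpected_keys
```

For example, load_visualatom_backbone("vit_tiny_with_visualatom_1k.pth.tar", num_classes=100) would prepare ViT-Tiny for CIFAR-100. If the assumptions above hold, the missing keys should list only the new head weights.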

Fine-tuning

Our fine-tuning scripts are based on Nakashima_2022_AAAI.

Run the Python script finetune.py to further train your pre-trained model on other datasets.

To use the fine-tuning code, you must prepare a fine-tuning dataset (e.g., CIFAR-10/100, ImageNet-1k, Pascal VOC 2012) with the following structure.

/PATH/TO/DATASET/
  train/
    class1/
      img1.jpeg
      ...
    class2/
      img2.jpeg
      ...
    ...
  val/
    class1/
      img3.jpeg
      ...
    class2/
      img4.jpeg
      ...
    ...
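This train/val layout is the one torchvision's ImageFolder reads: class indices come from the sorted folder names. We assume, without confirmation from this README, that finetune.py reads it in a similar way. Because labels are derived from folder names, train/ and val/ must contain the same class folders. A stdlib-only sketch that checks this (helper names are hypothetical):

```python
from pathlib import Path
from typing import Dict, List, Tuple

IMG_EXTS = {".jpg", ".jpeg", ".png"}


def scan_split(split_dir: Path) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """ImageFolder-style scan: sorted class folders -> indices, plus (path, label) pairs."""
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (str(f), class_to_idx[name])
        for name in classes
        for f in sorted((split_dir / name).iterdir())
        if f.suffix.lower() in IMG_EXTS
    ]
    return class_to_idx, samples


def check_finetune_dataset(root: str) -> Tuple[int, int, int]:
    """Return (num_classes, num_train, num_val); raise if train/ and val/ classes differ."""
    train_classes, train_samples = scan_split(Path(root) / "train")
    val_classes, val_samples = scan_split(Path(root) / "val")
    if train_classes != val_classes:
        # Labels come from sorted folder names, so a mismatch can silently shift labels.
        raise ValueError("train/ and val/ must contain the same class folders")
    return len(train_classes), len(train_samples), len(val_samples)
```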

In the basic case, you can run the Python script finetune.py with the following command.

Alternatively, you can run the job script scripts/finetune.sh, which supports multi-node training with OpenMPI.

Please see the script and code files for details on each argument.

Terms of use

The authors affiliated with the National Institute of Advanced Industrial Science and Technology (AIST) and the Tokyo Institute of Technology (TITech) are not responsible for the reproduction, duplication, copy, sale, trade, resale, or exploitation for any commercial purposes of any portion of the images and any portion of the derived data. In no event will we also be liable for any other damages resulting from this data or any derived data.