
Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

This is a PyTorch implementation of our ACM MM 2022 paper. We present a new gating unit, PoSGU, which replaces the FC layer in the SGU of gMLP with relative positional encoding methods (specifically, LRPE and GQPE), and use it as the key building block of a new vision MLP architecture referred to as PosMLP. We hope this work will inspire further theoretical study of positional encoding in vision MLPs, so that it can reach as mature an application there as it has in vision Transformers.
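To make the idea concrete, here is a minimal sketch of a PoSGU-style gating unit for a 1-D token sequence. It follows gMLP's SGU structure (channel split, LayerNorm, token mixing, elementwise gate), but builds the token-mixing matrix from a learnable relative-positional table instead of a dense FC weight. This is an illustrative LRPE-like parameterization, not the paper's exact implementation; the class and attribute names are hypothetical.

```python
import torch
import torch.nn as nn


class PoSGUSketch(nn.Module):
    """Illustrative PoSGU-style gating unit (hypothetical names).

    Replaces the dense token-mixing matrix of gMLP's SGU with a weight
    looked up from a learnable relative-position table, so the number of
    token-mixing parameters grows with sequence length, not its square.
    """

    def __init__(self, dim, n_tokens):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        # One learnable scalar per relative offset in [-(n-1), n-1].
        self.rel_bias = nn.Parameter(torch.zeros(2 * n_tokens - 1))
        idx = torch.arange(n_tokens)
        # rel_index[i, j] maps the token pair (i, j) to its offset slot.
        self.register_buffer("rel_index", idx[:, None] - idx[None, :] + n_tokens - 1)

    def forward(self, x):
        # x: (batch, n_tokens, dim); dim must be even for the split.
        u, v = x.chunk(2, dim=-1)           # channel split, as in gMLP's SGU
        v = self.norm(v)
        w = self.rel_bias[self.rel_index]   # (n, n) mixing matrix from relative positions
        v = torch.einsum("mn,bnd->bmd", w, v)
        return u * v                        # elementwise gating
```

Note that the spatial weight `w` is shared across all channels and batches here; the paper's LRPE/GQPE variants parameterize it per group, but the lookup-by-relative-offset mechanism is the same.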

Our code is based on pytorch-image-models, attention-cnn, swin-transformer, and vision-permutator.

Comparison with Recent MLP-like Models

| Model | Parameters | Image resolution | Top-1 Acc. | Download |
| :--- | :---: | :---: | :---: | :---: |
| gMLP-S | 20M | 224 | 79.6% | |
| Hire-MLP-S | 33M | 224 | 81.8% | |
| ViP-Small/7 | 25M | 224 | 81.5% | |
| PosMLP-T | 21M | 224 | 82.1% | Baidu Netdisk / Google Drive |
| S2-MLP-deep | 51M | 224 | 80.7% | |
| Mixer-B/16 | 59M | 224 | 78.5% | |
| ViP-Medium/7 | 55M | 224 | 82.7% | |
| AS-MLP-S | 50M | 224 | 83.1% | |
| PosMLP-S | 37M | 224 | 83.0% | released soon |
| gMLP-B | 73M | 224 | 81.6% | |
| ResMLP-B24 | 116M | 224 | 81.0% | |
| ViP-Large/7 | 88M | 224 | 83.2% | |
| Hire-MLP-L | 96M | 224 | 83.4% | |
| PosMLP-B | 82M | 224 | 83.6% | |

The experiments are conducted on 8 RTX 3090 GPUs.

Requirements

torch>=1.4.0
torchvision>=0.5.0
pyyaml
timm==0.4.5
apex (only if you use 'apex amp')

Data preparation: ImageNet with the following folder structure; you can extract ImageNet with this script. Please update the data folder path in the config files.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Training

Command line for training PosMLP-T on 4 GPUs (RTX 3090):

```
bash scripts/distributed_train.sh
```

Validation

Please download the checkpoint from the table above, specify the data and model paths in the script, and run:

```
CUDA_VISIBLE_DEVICES=0 bash scripts/test.sh
```

License

This repository is released under the MIT License as found in the LICENSE file.