# Softmax-free Linear Transformers

**SOFT: Softmax-free Transformer with Linear Complexity**
Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang
NeurIPS 2021

**Softmax-free Linear Transformers**
Jiachen Lu, Junge Zhang, Xiatian Zhu, Jianfeng Feng, Tao Xiang, Li Zhang
IJCV 2024
## What's new

- We propose a normalized softmax-free self-attention with stronger generalizability (the core idea is sketched below).
- SOFT is now available on more vision tasks (object detection and semantic segmentation).
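For intuition, here is a minimal, self-contained sketch of the softmax-free attention idea: a Gaussian kernel between scaled tokens (SOFT ties keys to queries) replaces dot-product softmax attention, and a Nyström-style low-rank decomposition over a few landmark tokens keeps complexity linear in sequence length. This is an illustration under simplifying assumptions, not the repository's implementation; the learned projections, multi-head structure, normalization, and iterative pseudo-inverse of the actual code are omitted, and `soft_attention_sketch` is a hypothetical name.

```python
import torch

def soft_attention_sketch(x, num_landmarks=16):
    # Illustrative sketch only, NOT the repository's implementation.
    # Core idea: a Gaussian kernel replaces softmax attention, and a
    # Nystrom-style low-rank decomposition over landmark tokens avoids
    # materialising the O(n^2) attention matrix.
    n, d = x.shape
    q = x / d ** 0.25  # scaled tokens; SOFT ties keys to queries
    # Crude landmarks: mean-pool contiguous token groups
    # (assumes n % num_landmarks == 0).
    qt = q.reshape(num_landmarks, n // num_landmarks, d).mean(dim=1)

    def gauss(a, b):
        # exp(-||a_i - b_j||^2 / 2): the softmax-free similarity
        return torch.exp(-0.5 * torch.cdist(a, b) ** 2)

    # A ~= gauss(q, qt) @ pinv(gauss(qt, qt)) @ gauss(qt, q);
    # only n-by-m and m-by-m blocks are ever formed.
    return gauss(q, qt) @ torch.linalg.pinv(gauss(qt, qt)) @ (gauss(qt, q) @ x)

x = torch.randn(256, 64)               # 256 tokens, dim 64
print(soft_attention_sketch(x).shape)  # torch.Size([256, 64])
```

The key point is that with m landmarks, memory and compute scale as O(n·m) rather than O(n²) in the token count n.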
## News

- [2024/02/12] Our journal extension, Softmax-free Linear Transformers, is accepted by IJCV.
- [2022/07/05] SOFT is now available for downstream tasks! An efficient normalization is applied to SOFT. Please refer to SOFT-Norm.
## Requirements

- timm==0.3.2
- torch>=1.7.0 and torchvision that matches the PyTorch installation
- cuda>=10.2

Compilation may fail on CUDA < 10.2. We have compiled it successfully on CUDA 10.2 and CUDA 11.2.
## Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's `datasets.ImageFolder`: the training and validation data are expected to be in the `train/` and `val/` folders respectively:

```
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
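To sanity-check the layout before training, the structure above can be loaded directly with torchvision's `datasets.ImageFolder` (the path and transform below are placeholders, not what the training script uses):

```python
from torchvision import datasets, transforms

# Placeholder path; point it at your extracted ImageNet train split.
train_set = datasets.ImageFolder(
    "/path/to/imagenet/train",
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ]),
)
print(len(train_set), "images across", len(train_set.classes), "classes")
```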
## Installation

```bash
git clone https://github.com/fudan-zvg/SOFT.git
python -m pip install -e SOFT
```
## Main results

### ImageNet-1K Image Classification
Model | Resolution | Params | FLOPs | Top-1 (%) | Config | Pretrained Model |
---|---|---|---|---|---|---|
SOFT-Tiny | 224 | 13M | 1.9G | 79.3 | SOFT_Tiny.yaml, SOFT_Tiny_cuda.yaml | SOFT_Tiny, SOFT_Tiny_cuda |
SOFT-Small | 224 | 24M | 3.3G | 82.2 | SOFT_Small.yaml, SOFT_Small_cuda.yaml | |
SOFT-Medium | 224 | 45M | 7.2G | 82.9 | SOFT_Medium.yaml, SOFT_Medium_cuda.yaml | |
SOFT-Large | 224 | 64M | 11.0G | 83.1 | SOFT_Large.yaml, SOFT_Large_cuda.yaml | |
SOFT-Huge | 224 | 87M | 16.3G | 83.3 | SOFT_Huge.yaml, SOFT_Huge_cuda.yaml | |
SOFT-Tiny-Norm | 224 | 13M | 1.9G | 79.4 | SOFT_Tiny_norm.yaml | SOFT_Tiny_norm |
SOFT-Small-Norm | 224 | 24M | 3.3G | 82.4 | SOFT_Small_norm.yaml | SOFT_Small_norm |
SOFT-Medium-Norm | 224 | 45M | 7.2G | 83.1 | SOFT_Medium_norm.yaml | SOFT_Medium_norm |
SOFT-Large-Norm | 224 | 64M | 11.0G | 83.3 | SOFT_Large_norm.yaml | SOFT_Large_norm |
SOFT-Huge-Norm | 224 | 87M | 16.3G | 83.4 | SOFT_Huge_norm.yaml | |
### COCO Object Detection (2017 val)
Backbone | Method | lr schd | box mAP | mask mAP | Params |
---|---|---|---|---|---|
SOFT-Tiny-Norm | RetinaNet | 1x | 40.0 | - | 23M |
SOFT-Tiny-Norm | Mask R-CNN | 1x | 41.2 | 38.2 | 33M |
SOFT-Small-Norm | RetinaNet | 1x | 42.8 | - | 34M |
SOFT-Small-Norm | Mask R-CNN | 1x | 43.8 | 40.1 | 44M |
SOFT-Medium-Norm | RetinaNet | 1x | 44.3 | - | 55M |
SOFT-Medium-Norm | Mask R-CNN | 1x | 46.6 | 42.0 | 65M |
SOFT-Large-Norm | RetinaNet | 1x | 45.3 | - | 74M |
SOFT-Large-Norm | Mask R-CNN | 1x | 47.0 | 42.2 | 84M |
### ADE20K Semantic Segmentation (val)
Backbone | Method | Crop size | lr schd | mIoU | Params |
---|---|---|---|---|---|
SOFT-Small-Norm | UperNet | 512x512 | 1x | 46.2 | 54M |
SOFT-Medium-Norm | UperNet | 512x512 | 1x | 48.0 | 76M |
## Get Started

### Train
We provide two implementations of the Gaussian kernel: a pure PyTorch version, and the exact form of the Gaussian function implemented in CUDA. Config files whose names contain `cuda` use the CUDA implementation. Both implementations yield the same performance; a sketch of their equivalence is shown below. Please install SOFT before running the CUDA version.
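The two forms agree because ||q − k||² = ||q||² − 2 q·k + ||k||², so exp(−||q − k||²/2) can be evaluated either pairwise or via a single matrix multiplication. A minimal sketch of this equivalence (function names are hypothetical; the repository's real implementations live in the model code and the compiled CUDA extension):

```python
import torch

def gaussian_kernel_exact(q, k):
    # Pairwise exp(-||q_i - k_j||^2 / 2): what a fused CUDA kernel
    # can evaluate directly from the squared distances.
    return torch.exp(-0.5 * torch.cdist(q, k) ** 2)

def gaussian_kernel_matmul(q, k):
    # The same quantity via ||q - k||^2 = ||q||^2 - 2 q.k + ||k||^2,
    # expressed with one matmul, as in a pure-PyTorch implementation.
    qk = q @ k.transpose(-2, -1)
    q2 = (q ** 2).sum(-1, keepdim=True)
    k2 = (k ** 2).sum(-1, keepdim=True).transpose(-2, -1)
    return torch.exp(qk - 0.5 * q2 - 0.5 * k2)

q, k = torch.randn(4, 8), torch.randn(6, 8)
assert torch.allclose(gaussian_kernel_exact(q, k),
                      gaussian_kernel_matmul(q, k), atol=1e-5)
```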
```bash
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE}

# For example, train SOFT-Tiny on the ImageNet training set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml
```
### Test
```bash
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE} --eval_checkpoint ${CHECKPOINT_FILE} --eval

# For example, test SOFT-Tiny on the ImageNet validation set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml --eval_checkpoint ${CHECKPOINT_FILE} --eval
```
## Reference

```bibtex
@inproceedings{SOFT,
  title={SOFT: Softmax-free Transformer with Linear Complexity},
  author={Lu, Jiachen and Yao, Jinghan and Zhang, Junge and Zhu, Xiatian and Xu, Hang and Gao, Weiguo and Xu, Chunjing and Xiang, Tao and Zhang, Li},
  booktitle={NeurIPS},
  year={2021}
}

@article{Softmax,
  title={Softmax-free Linear Transformers},
  author={Lu, Jiachen and Zhang, Li and Zhang, Junge and Zhu, Xiatian and Feng, Jianfeng and Xiang, Tao},
  journal={International Journal of Computer Vision},
  year={2024}
}
```
## License
## Acknowledgement

Thanks to the following previously open-sourced repos:

- Detectron2
- T2T-ViT
- PVT
- Nystromformer
- pytorch-image-models