S<sup>2</sup>-BNN (Self-supervised Binary Neural Networks Using Distillation Loss)

This is the official PyTorch implementation of our paper:

"S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration" (CVPR 2021)

by Zhiqiang Shen, Zechun Liu, Jie Qin, Lei Huang, Kwang-Ting Cheng and Marios Savvides.

<div align=center> <img width=99% src="./imgs/s2bnn.png"/> </div>

In this paper, we introduce a simple yet effective self-supervised approach using distillation loss for learning efficient binary neural networks. Our proposed method can outperform the simple contrastive learning baseline (MoCo V2) by an absolute gain of 5.5∼15% on ImageNet.

The student models are not restricted to binary neural networks; you can replace them with any efficient/compact models.
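To make the idea concrete, below is a minimal PyTorch sketch of such a distillation objective: a frozen real-valued teacher defines a target probability distribution and the binary student is trained to match it via KL divergence. The tensor names, shapes, and the temperature value are illustrative assumptions, not the exact code in this repository.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=0.2):
    """KL-divergence distillation between teacher and student distributions.

    Both inputs are assumed to be similarity scores over the same set of keys,
    with shape [batch_size, num_keys]; the actual loss in this repo may differ.
    """
    # The teacher defines the target distribution (no gradients flow to the teacher).
    teacher_prob = F.softmax(teacher_logits / temperature, dim=1).detach()
    # The student distribution is given in log-space, as required by F.kl_div.
    student_log_prob = F.log_softmax(student_logits / temperature, dim=1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_prob, teacher_prob, reduction='batchmean')
```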

News

[Aug. 18, 2021] Added a ResNet-50 result as the student (real-valued model) with SwAV/RN50-w4 as the teacher.

| Models | Pre-train epochs | Batch size | Linear cls. log | Top-1 (%) | Pre-train models |
| --- | --- | --- | --- | --- | --- |
| SimSiam | 100 | 512 | log | 68.1 | GitHub |
| S2-BNN (Distillation_only) | 100 | 512 | log | 68.7 | Download |

Note that we use the same linear evaluation procedure as SimSiam.

Citation

If you find our code helpful for your research, please cite:

@InProceedings{Shen_2021_CVPR,
	author    = {Shen, Zhiqiang and Liu, Zechun and Qin, Jie and Huang, Lei and Cheng, Kwang-Ting and Savvides, Marios},
	title     = {S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year      = {2021}
}

Preparation

1. Requirements:

2. Data:

Prepare the ImageNet dataset with the standard layout of train and val sub-folders; this path is passed to the training commands below as [imagenet-folder with train and val folders].

Training & Testing

To train a model, run the following scripts. All our models are trained with 8 GPUs.
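The flags in the commands below follow the original MoCo command-line interface: the bracketed [imagenet-folder with train and val folders] placeholder is the positional path to ImageNet, -j sets the number of data-loading workers, --mlp enables the MLP projection head, --aug-plus uses the MoCo V2 augmentation, --moco-t sets the softmax temperature, --cos enables cosine learning-rate decay, --wd sets the weight decay, and --dist-url/--multiprocessing-distributed/--world-size/--rank configure distributed training.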

1. Standard Two-Step Training:

In the two-step scheme (as in ReActNet), Step 1 trains the network with binary activations and real-valued weights, and Step 2 initializes from the Step 1 checkpoint (passed via --model-path) and continues training with both weights and activations binarized.

Our enhanced MoCo V2:

Step 1:

cd Contrastive_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]  --mlp --moco-t 0.2 --aug-plus --cos -j 48  

Step 2:

cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]  --mlp --moco-t 0.2 --aug-plus --cos -j 48  --model-path ../step1/checkpoint_0199.pth.tar

Our MoCo V2 + Distillation Loss:

Download the real-valued teacher network here. We use the MoCo V2 800-epoch pre-trained model, but you can choose other, stronger self-supervised models as teachers.
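As a rough illustration (not necessarily the loading code used in this repository): the official MoCo V2 checkpoint moco_v2_800ep_pretrain.pth.tar stores the query encoder under module.encoder_q.*, so the teacher backbone can be restored roughly as follows. The function name and the choice to drop the MLP head are assumptions made for this sketch.

```python
import torch
import torchvision.models as models

def load_moco_teacher(ckpt_path='moco_v2_800ep_pretrain.pth.tar'):
    """Sketch: restore the MoCo V2 query encoder as a frozen ResNet-50 teacher."""
    teacher = models.resnet50()
    state_dict = torch.load(ckpt_path, map_location='cpu')['state_dict']
    # Official MoCo checkpoints prefix the query encoder with 'module.encoder_q.'.
    encoder_state = {
        k[len('module.encoder_q.'):]: v
        for k, v in state_dict.items()
        if k.startswith('module.encoder_q.') and not k.startswith('module.encoder_q.fc')
    }
    teacher.load_state_dict(encoder_state, strict=False)  # the fc/MLP head is left out here
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad = False  # the teacher stays frozen during distillation
    return teacher
```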

Step 1:

cd Contrastive+Distillation/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0  --teacher-path ../../moco_v2_800ep_pretrain.pth.tar 

Step 2:

cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0  --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar

Our Distillation Loss Only:

Step 1:

cd Distillation_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar 

Step 2:

cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar

2. Simple One-Step Training (Conventional):

Our enhanced MoCo V2:

cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 

Our MoCo V2 + Distillation Loss:

cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar 

Our Distillation Loss Only:

cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar 

You can replace the binary neural network with any kind of efficient/compact model in one-step training.

3. Testing:

We follow the same linear evaluation protocol as SimSiam: train a linear classifier on top of the frozen pre-trained backbone and report Top-1 accuracy on the ImageNet validation set (see the linear evaluation logs below).
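As a rough sketch of what this protocol amounts to (the repository's own evaluation script may organize it differently; the function and argument names here are illustrative):

```python
import torch.nn as nn

def build_linear_eval_model(backbone, feat_dim, num_classes=1000):
    """Sketch of linear evaluation: frozen backbone features + a trainable linear head."""
    for p in backbone.parameters():
        p.requires_grad = False  # backbone features are kept fixed
    backbone.eval()
    # feat_dim depends on the backbone (e.g. 2048 for ResNet-50).
    classifier = nn.Linear(feat_dim, num_classes)  # only this layer is trained
    return backbone, classifier
```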

Results & Models

We provide pre-trained models obtained with different training strategies. The table below reports #Epochs, FLOPs, OPs, and Top-1 accuracy on the ImageNet validation set:

| Models | #Epochs | FLOPs (x10<sup>8</sup>) | OPs (x10<sup>8</sup>) | Top-1 (%) | Trained models |
| --- | --- | --- | --- | --- | --- |
| MoCo V2 baseline | 200 | 0.12 | 0.87 | 46.9 | Download |
| Our enhanced MoCo V2 | 200 | 0.12 | 0.87 | 52.5 | Download |
| Our MoCo V2 + Distillation Loss | 200 | 0.12 | 0.87 | 56.0 | Download |
| Our Distillation Loss Only | 200 | 0.12 | 0.87 | 61.5 | Download |
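A note on the units: the OPs column presumably follows the common binary-network accounting OPs = BOPs/64 + FLOPs (as in ReActNet), where the FLOPs column counts the remaining real-valued operations.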

Training Logs

Our linear evaluation logs are available here.

Acknowledgement

MoCo V2 (Improved Baselines with Momentum Contrastive Learning)

ReActNet (ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions)

MEAL V2 (MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks)

Contact

Zhiqiang Shen, CMU (zhiqiangshen0214 at gmail.com)