AdderNet: Do We Really Need Multiplications in Deep Learning?

This code is a demo of the CVPR 2020 paper AdderNet: Do We Really Need Multiplications in Deep Learning?

We present adder networks (AdderNets), which trade the massive number of multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, the output response is the L1-norm distance between the filters and the input features. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplications in the convolutional layers.
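To make the difference concrete, the sketch below compares the response of one filter to one input patch under a standard convolution (a multiply-accumulate) and under an adder filter (a sum of absolute differences). It is a minimal plain-Python illustration, not the repository's implementation; the names `conv_response` and `adder_response` are hypothetical, and the negative sign (so that a closer match yields a larger response) follows the formulation in the paper.

```python
from typing import List


def conv_response(patch: List[float], weights: List[float]) -> float:
    """Standard convolution response for one patch: a dot product (multiply-accumulate)."""
    return sum(x * w for x, w in zip(patch, weights))


def adder_response(patch: List[float], weights: List[float]) -> float:
    """Adder response for one patch: the negative L1 distance.

    Only subtraction, absolute value and addition are used; no multiplications.
    """
    return -sum(abs(x - w) for x, w in zip(patch, weights))


# A flattened 2x2 input patch and a flattened 2x2 filter (illustrative values).
patch = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 0.0, 2.0, 5.0]
print(conv_response(patch, weights))   # 1*1 + 2*0 + 3*2 + 4*5 = 27.0
print(adder_response(patch, weights))  # -(0 + 2 + 1 + 1) = -4.0
```

The adder response is largest (zero) when the patch matches the filter exactly, so it behaves as a similarity score, much as the dot product does in a CNN.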

<p align="center"> <img src="figures/visualization.png" width="800"> </p>

Classification results on the CIFAR-10 and CIFAR-100 datasets.

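| Model | Method | CIFAR-10 | CIFAR-100 |
|---|---|---|---|
| VGG-small | ANN | 93.72% | 72.64% |
| | PKKD ANN | 95.03% | 76.94% |
| | SLAC ANN | 93.96% | 73.63% |
| ResNet-20 | ANN | 92.02% | 67.60% |
| | PKKD ANN | 92.96% | 69.93% |
| | SLAC ANN | 92.29% | 68.31% |
| | ShiftAddNet* | 89.32% (160 epochs) | - |
| ResNet-32 | ANN | 93.01% | 69.17% |
| | PKKD ANN | 93.62% | 72.41% |
| | SLAC ANN | 93.24% | 69.83% |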

Classification results on the ImageNet dataset.

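| Model | Method | Top-1 Acc | Top-5 Acc |
|---|---|---|---|
| ResNet-18 | CNN | 69.8% | 89.1% |
| | ANN | 67.0% | 87.6% |
| | PKKD ANN | 68.8% | 88.6% |
| | SLAC ANN | 67.7% | 87.9% |
| ResNet-50 | CNN | 76.2% | 92.9% |
| | ANN | 74.9% | 91.7% |
| | PKKD ANN | 76.8% | 93.3% |
| | SLAC ANN | 75.3% | 92.6% |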

*ShiftAddNet used a different training setting.

Super-Resolution results on several SR datasets.

| Scale | Model | Method | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) |
|---|---|---|---|---|---|---|
| ×2 | VDSR | CNN | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140 |
| | | ANN | 37.37/0.9575 | 32.91/0.9112 | 31.82/0.8947 | 30.48/0.9099 |
| | EDSR | CNN | 38.11/0.9601 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 |
| | | ANN | 37.92/0.9589 | 33.82/0.9183 | 32.23/0.9000 | 32.63/0.9309 |
| ×3 | VDSR | CNN | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279 |
| | | ANN | 33.47/0.9151 | 29.62/0.8276 | 28.72/0.7953 | 26.95/0.8189 |
| | EDSR | CNN | 34.65/0.9282 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 |
| | | ANN | 34.35/0.9212 | 30.33/0.8420 | 29.13/0.8068 | 28.54/0.8555 |
| ×4 | VDSR | CNN | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524 |
| | | ANN | 31.27/0.8762 | 27.93/0.7630 | 27.25/0.7229 | 25.09/0.7445 |
| | EDSR | CNN | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 |
| | | ANN | 32.13/0.8864 | 28.57/0.7800 | 27.58/0.7368 | 26.33/0.7874 |
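PSNR, the first number in each cell of the super-resolution table, is measured in dB and derived from the mean squared error (MSE) between the super-resolved image and the ground truth: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit images); higher is better. The plain-Python sketch below computes it for flattened pixel lists. It deliberately omits evaluation details that SR benchmarks commonly apply (for example, measuring only the luminance channel and cropping image borders), so it is not expected to reproduce the reported numbers; `psnr` is a hypothetical helper name.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1, so MSE = 1 and PSNR = 20 * log10(255), about 48.13 dB.
print(psnr([100, 110, 120, 130], [101, 109, 121, 129]))
```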

Adversarial robustness on CIFAR-10 under white-box attacks without adversarial training.

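| Model | Method | Clean | FGSM | BIM7 | PGD7 | MIM5 | RFGSM5 |
|---|---|---|---|---|---|---|---|
| ResNet-20 | CNN | 92.68 | 16.33 | 0.00 | 0.00 | 0.01 | 0.00 |
| | ANN | 91.72 | 18.42 | 0.00 | 0.00 | 0.04 | 0.00 |
| | CNN-R | 90.62 | 17.23 | 3.46 | 3.67 | 4.23 | 0.06 |
| | ANN-R | 90.95 | 29.93 | 29.30 | 29.72 | 32.25 | 3.38 |
| | ANN-R-AWN | 90.55 | 45.93 | 42.62 | 43.39 | 46.52 | 18.36 |
| ResNet-32 | CNN | 92.78 | 23.55 | 0.00 | 0.01 | 0.10 | 0.00 |
| | ANN | 92.48 | 35.85 | 0.03 | 0.11 | 1.04 | 0.02 |
| | CNN-R | 91.32 | 20.41 | 5.15 | 5.27 | 6.09 | 0.07 |
| | ANN-R | 91.68 | 19.74 | 15.96 | 16.08 | 17.48 | 0.07 |
| | ANN-R-AWN | 91.25 | 61.30 | 59.41 | 59.74 | 61.54 | 39.79 |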

Comparisons of mAP on PASCAL VOC.

| Model | Backbone | Neck | mAP |
|---|---|---|---|
| Faster R-CNN | Conv R50 | Conv | 79.5 |
| FCOS | Conv R50 | Conv | 79.1 |
| RetinaNet | Conv R50 | Conv | 77.3 |
| FoveaBox | Conv R50 | Conv | 76.6 |
| Adder-FCOS | Adder R50 | Adder | 76.5 |

Requirements

Preparation

You can follow pytorch/examples to prepare the ImageNet data.
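The ImageNet example in pytorch/examples reads the data with torchvision's ImageFolder, which expects a root directory containing `train` and `val` folders with one subfolder per class. Assuming test.py expects the same layout under `--data_dir` (check the script before relying on this), the hypothetical helper `check_imagefolder_layout` below is a quick plain-Python sanity check that each split exists and has no empty class folders.

```python
from pathlib import Path
from typing import Dict, Iterable


def check_imagefolder_layout(root: str, splits: Iterable[str] = ("train", "val")) -> Dict[str, int]:
    """Return the number of class folders per split.

    Raises FileNotFoundError for a missing split and ValueError for empty class folders.
    """
    summary = {}
    for split in splits:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # ImageFolder treats every subfolder as a class, so list the subfolders.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        # An empty class folder usually points to an incomplete copy or extraction.
        empty = sorted(d.name for d in class_dirs if not any(d.iterdir()))
        if empty:
            raise ValueError(f"{split}: empty class folders: {empty}")
        summary[split] = len(class_dirs)
    return summary
```

For example, `check_imagefolder_layout('path/to/imagenet_root/')` should report 1000 classes for each split on a complete ImageNet-1k (ILSVRC-2012) copy.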

The pretrained models are available on Google Drive or Baidu Cloud (access code: 126b).

Usage

Run python main.py to train on CIFAR-10.

Run python test.py --data_dir 'path/to/imagenet_root/' to evaluate on the ImageNet val set. You will achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy on the ImageNet dataset using ResNet-50.
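Top-1 and Top-5 accuracy are the percentage of validation images whose true label is, respectively, the highest-scoring class or among the five highest-scoring classes. The plain-Python sketch below computes top-k accuracy from per-class scores; it illustrates the metric only, is not the code in test.py, and `topk_accuracy` is a hypothetical name.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Return the percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += int(label in top_k)
    return 100.0 * hits / len(labels)


# Four samples, four classes (made-up scores for illustration).
scores = [
    [0.1, 0.6, 0.2, 0.1],    # true label 1 is ranked 1st
    [0.5, 0.1, 0.3, 0.1],    # true label 2 is ranked 2nd
    [0.3, 0.05, 0.15, 0.5],  # true label 1 is ranked 4th
    [0.7, 0.1, 0.1, 0.1],    # true label 0 is ranked 1st
]
labels = [1, 2, 1, 0]
print(topk_accuracy(scores, labels, k=1))  # 50.0
print(topk_accuracy(scores, labels, k=2))  # 75.0
```

Top-k accuracy can only grow as k increases, which is why the Top-5 figures are always at least as high as the Top-1 figures.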

Run python test.py --dataset cifar10 --model_dir models/ResNet20-AdderNet.pth --data_dir 'path/to/cifar10_root/' to evaluate on CIFAR-10. You will achieve 91.8% accuracy on the CIFAR-10 dataset using ResNet-20.

Inference and training of AdderNets are slow because the adder filters are implemented without CUDA acceleration. Writing custom CUDA kernels for the adder filters can achieve higher inference speed.
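One way to see why a generic implementation is slow: a convolution can be lowered to a matrix multiplication over unfolded input windows (im2col) and handed to highly optimized GEMM kernels, but an adder layer needs a sum of absolute differences for every (window, filter) pair, which GEMM kernels do not compute. The plain-Python sketch below shows this im2col formulation for a single-channel input with stride 1 and no padding; these restrictions and the names `im2col` and `adder2d` are assumptions for illustration, not the repository's code.

```python
from typing import List

Matrix = List[List[float]]


def im2col(image: Matrix, k: int) -> List[List[float]]:
    """Unfold every k x k window (stride 1, no padding) into a flat row, in raster order."""
    h, w = len(image), len(image[0])
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            cols.append([image[i + di][j + dj] for di in range(k) for dj in range(k)])
    return cols


def adder2d(image: Matrix, filters: List[Matrix]) -> List[Matrix]:
    """Single-channel adder layer: each output value is -L1(window, filter)."""
    k = len(filters[0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    cols = im2col(image, k)
    outputs = []
    for f in filters:
        flat = [v for row in f for v in row]
        # A convolution would take dot(col, flat) here, which GEMM kernels accelerate;
        # the adder layer needs a sum of |a - b| instead, so GEMM cannot be reused as-is.
        resp = [-sum(abs(a - b) for a, b in zip(col, flat)) for col in cols]
        # Reshape the flat responses back into an out_h x out_w feature map.
        outputs.append([resp[r * out_w:(r + 1) * out_w] for r in range(out_h)])
    return outputs


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filters = [[[1, 2], [4, 5]]]
print(adder2d(image, filters))  # [[[0, -4], [-12, -16]]]
```

The top-left window matches the filter exactly and gives the maximum response of 0; windows further away from the filter's values give increasingly negative responses.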

Citation

@article{AdderNet,
	title={AdderNet: Do We Really Need Multiplications in Deep Learning?},
	author={Chen, Hanting and Wang, Yunhe and Xu, Chunjing and Shi, Boxin and Xu, Chao and Tian, Qi and Xu, Chang},
	journal={CVPR},
	year={2020}
}

Contributing

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion may result in the PR being rejected, because we might be taking the core in a different direction than you are aware of.