CVPR 2021 | Activate or Not: Learning Customized Activation.

This repository contains the official PyTorch implementation of the paper Activate or Not: Learning Customized Activation, CVPR 2021.

ACON

We propose a novel activation function, termed ACON, that explicitly learns to activate the neurons or not. Below we show the ACON activation function and its first derivative (a code sketch follows the figures): β controls how fast the first derivative asymptotes to the upper/lower bounds, which are determined by p1 and p2.

<img src="https://user-images.githubusercontent.com/5032208/113257297-fc76f380-92fc-11eb-9559-39d033baea4c.png" width=90%> <img src="https://user-images.githubusercontent.com/5032208/113257194-cfc2dc00-92fc-11eb-94a0-f81569bed15e.png" width=90%>
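For reference, the ACON-C variant has the closed form (p1 − p2) · x · σ(β · (p1 − p2) · x) + p2 · x, with learnable per-channel parameters p1, p2, and β. Below is a minimal PyTorch sketch of that formula; the initialization is illustrative and not necessarily identical to the released code.

```python
import torch
import torch.nn as nn

class AconC(nn.Module):
    """ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""

    def __init__(self, channels):
        super().__init__()
        # learnable per-channel parameters; this init is illustrative
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        dpx = (self.p1 - self.p2) * x
        # large beta approaches the hard max(p1*x, p2*x);
        # beta = 0 gives their mean ((p1 + p2)/2 * x)
        return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x
```

Usage is a drop-in replacement: swap a ReLU in any conv block for `AconC(num_channels)`.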

Training curves

We show the training curves of different activations here.

<img src="https://user-images.githubusercontent.com/5032208/113260052-65ac3600-9300-11eb-8d2f-ef968be1c3a2.png" width=60%>

TFNet

To show the effectiveness of the proposed ACON family, we also provide an extremely simple toy funnel network (TFNet) built only from pointwise convolutions and ACON-FReLU operators (a sketch of this operator follows the figure below).

<img src="https://user-images.githubusercontent.com/5032208/113963614-7a3a8200-985c-11eb-8946-65c0bcef0a80.png" width=60%>
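The ACON-FReLU operator is not spelled out above, so the sketch below is one plausible reading: it smooths FReLU's max(x, T(x)) through the general ACON form (η_a − η_b) · σ(β · (η_a − η_b)) + η_b, taking η_a(x) = x and η_b(x) = T(x), where T is FReLU's depthwise 3×3 convolution plus BatchNorm. Details such as how β is parameterized may differ from the released TFNet code.

```python
import torch
import torch.nn as nn

class AconFReLU(nn.Module):
    """A smooth-maximum reading of FReLU's max(x, T(x)) in ACON form."""

    def __init__(self, channels):
        super().__init__()
        # funnel condition T(x): depthwise 3x3 conv + BN, as in FReLU
        self.t = nn.Conv2d(channels, channels, 3, padding=1,
                           groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        tx = self.bn(self.t(x))
        d = x - tx
        # smooth maximum of x and T(x); large beta approaches the hard max
        return d * torch.sigmoid(self.beta * d) + tx
```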

Main results

The following results are ImageNet top-1 accuracy improvements relative to the ReLU baselines. The relative improvements of meta-ACON are about twice those of SENet.

<img src="https://user-images.githubusercontent.com/5032208/113256618-fcc2bf00-92fb-11eb-9b1d-8f0589009a9b.png" width=60%>

The comparison between ReLU, Swish, and ACON-C. ACON-C improves accuracy without adding any FLOPs or parameters:

| Model | FLOPs | #Params. | top-1 err. (ReLU) | top-1 err. (Swish) | top-1 err. (ACON) |
| --- | --- | --- | --- | --- | --- |
| ShuffleNetV2 0.5x | 41M | 1.4M | 39.4 | 38.3 (+1.1) | 37.0 (+2.4) |
| ShuffleNetV2 1.5x | 299M | 3.5M | 27.4 | 26.8 (+0.6) | 26.5 (+0.9) |
| ResNet 50 | 3.9G | 25.5M | 24.0 | 23.5 (+0.5) | 23.2 (+0.8) |
| ResNet 101 | 7.6G | 44.4M | 22.8 | 22.7 (+0.1) | 21.8 (+1.0) |
| ResNet 152 | 11.3G | 60.0M | 22.3 | 22.2 (+0.1) | 21.2 (+1.1) |

Next, by adding a negligible amount of FLOPs and parameters, meta-ACON shows significant improvements (a sketch of the meta-ACON module follows the table):

| Model | FLOPs | #Params. | top-1 err. |
| --- | --- | --- | --- |
| ShuffleNetV2 0.5x (meta-ACON) | 41M | 1.7M | 34.8 (+4.6) |
| ShuffleNetV2 1.5x (meta-ACON) | 299M | 3.9M | 24.7 (+2.7) |
| ResNet 50 (meta-ACON) | 3.9G | 25.7M | 22.0 (+2.0) |
| ResNet 101 (meta-ACON) | 7.6G | 44.8M | 21.0 (+1.8) |
| ResNet 152 (meta-ACON) | 11.3G | 60.5M | 20.5 (+1.8) |
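The meta-ACON idea is that β is no longer a free parameter but is generated from the input by a small switching network G(x). Below is a hedged sketch assuming a channel-wise design (global average pooling followed by two 1×1 convolutions and a sigmoid, with reduction ratio r); the exact module in the released code may differ.

```python
import torch
import torch.nn as nn

class MetaAcon(nn.Module):
    """ACON-C whose beta is generated per channel by a small network G(x)."""

    def __init__(self, channels, r=16):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        hidden = max(channels // r, 1)
        self.fc1 = nn.Conv2d(channels, hidden, 1)
        self.fc2 = nn.Conv2d(hidden, channels, 1)

    def forward(self, x):
        # channel-wise switching factor: beta = sigmoid(G(x))
        pooled = x.mean(dim=(2, 3), keepdim=True)  # global average pool
        beta = torch.sigmoid(self.fc2(self.fc1(pooled)))
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(beta * dpx) + self.p2 * x
```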

The simple TFNet, which contains no SE modules, can outperform the state-of-the-art lightweight networks that likewise use no SE modules.

| Model | FLOPs | #Params. | top-1 err. |
| --- | --- | --- | --- |
| MobileNetV2 0.17 | 42M | 1.4M | 52.6 |
| ShuffleNetV2 0.5x | 41M | 1.4M | 39.4 |
| TFNet 0.5 | 43M | 1.3M | 36.6 (+2.8) |
| MobileNetV2 0.6 | 141M | 2.2M | 33.3 |
| ShuffleNetV2 1.0x | 146M | 2.3M | 30.6 |
| TFNet 1.0 | 135M | 1.9M | 29.7 (+0.9) |
| MobileNetV2 1.0 | 300M | 3.4M | 28.0 |
| ShuffleNetV2 1.5x | 299M | 3.5M | 27.4 |
| TFNet 1.5 | 279M | 2.7M | 26.0 (+1.4) |
| MobileNetV2 1.4 | 585M | 5.5M | 25.3 |
| ShuffleNetV2 2.0x | 591M | 7.4M | 25.0 |
| TFNet 2.0 | 474M | 3.8M | 24.3 (+0.7) |

Trained Models

Usage

Requirements

Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh

Train:

```shell
python train.py --train-dir YOUR_TRAINDATASET_PATH --val-dir YOUR_VALDATASET_PATH
```

Eval:

```shell
python train.py --eval --eval-resume YOUR_WEIGHT_PATH --train-dir YOUR_TRAINDATASET_PATH --val-dir YOUR_VALDATASET_PATH
```

Citation

If you use these models in your research, please cite:

```
@inproceedings{ma2021activate,
  title={Activate or Not: Learning Customized Activation},
  author={Ma, Ningning and Zhang, Xiangyu and Liu, Ming and Sun, Jian},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
```