PytorchInsight
This is a PyTorch library with state-of-the-art architectures, pretrained models, and real-time updated results.
This repository aims to accelerate deep learning research by making results reproducible and experiments easier to run, all in PyTorch.
Included papers (to be updated):
Attention Models
- SENet: Squeeze-and-Excitation Networks <sub>(paper)</sub> (a minimal sketch follows this list)
- SKNet: Selective Kernel Networks <sub>(paper)</sub>
- CBAM: Convolutional Block Attention Module <sub>(paper)</sub>
- GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond <sub>(paper)</sub>
- BAM: Bottleneck Attention Module <sub>(paper)</sub>
- SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks <sub>(paper)</sub>
- SRMNet: SRM: A Style-based Recalibration Module for Convolutional Neural Networks <sub>(paper)</sub>
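
As a quick illustration of the simplest of the attention modules listed above, below is a minimal PyTorch sketch of an SE block following the SENet paper; the reduction ratio of 16 is the paper's default, and the class name is illustrative rather than taken from this repository's code.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pool -> bottleneck MLP -> channel-wise rescale."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        # Per-channel gate in [0, 1], broadcast over the spatial dimensions.
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w
```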
Non-Attention Models
- OctNet: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution <sub>(paper)</sub>
- imagenet_tricks.py: Bag of Tricks for Image Classification with Convolutional Neural Networks <sub>(paper)</sub>
- Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer <sub>(to appear)</sub>
- Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay <sub>(to appear)</sub>
- mixup: Beyond Empirical Risk Minimization <sub>(paper)</sub> (a minimal sketch follows this list)
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features <sub>(paper)</sub>
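
As a quick illustration of the mixup entry above, here is a minimal, self-contained sketch of input/label mixing following the mixup paper; the alpha value and function names are illustrative assumptions, not taken from this repository's training scripts.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    """Mix a batch of images; returns mixed inputs, both label sets, and the mixing weight."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    perm = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1.0 - lam) * x[perm]
    return mixed_x, y, y[perm], lam

def mixup_loss(logits, y_a, y_b, lam):
    # Convex combination of the two cross-entropy terms.
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
```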
Trained Models and Performance Table
Single crop validation error on ImageNet-1k (center 224x224 crop from resized image with shorter side = 256).
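
For reference, this single-crop evaluation protocol corresponds to the standard torchvision pipeline sketched below; the ImageNet mean/std normalization is the usual convention and is assumed here rather than copied from this repository's code.

```python
import torchvision.transforms as transforms

# Resize the shorter side to 256, then take a center 224x224 crop.
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```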
Classification training settings for medium and large models | |
---|---|
Details | RandomResizedCrop, RandomHorizontalFlip; 0.1 initial LR, 100 epochs in total, decayed every 30 epochs; SGD with plain softmax cross-entropy loss, 1e-4 weight decay, 0.9 momentum, 8 GPUs, 32 images per GPU |
Examples | ResNet50 |
Note | The newest code adds one default operation: setting the weight decay of all biases to 0 (see the theoretical analysis in "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay", to appear), which slightly boosts training accuracy |
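
The "bias wd = 0" default mentioned in the note can be realized with two optimizer parameter groups; the following is a minimal sketch of such a setup, and the exact grouping logic in this repository's scripts may differ.

```python
import torch

def build_sgd(model, lr=0.1, weight_decay=1e-4, momentum=0.9):
    """SGD where bias parameters are excluded from weight decay."""
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Biases go into the no-decay group; everything else keeps the usual L2 penalty.
        (no_decay if name.endswith(".bias") else decay).append(p)
    return torch.optim.SGD(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr, momentum=momentum)
```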
Classification training settings for mobile/small models | |
---|---|
Details | RandomResizedCrop, RandomHorizontalFlip; 0.4 initial LR, 300 epochs in total, 5 linear warm-up epochs, cosine LR decay; SGD with softmax cross-entropy loss and label smoothing 0.1, 4e-5 weight decay on conv weights, 0 weight decay on all other parameters, 0.9 momentum, 8 GPUs, 128 images per GPU |
Examples | ShuffleNetV2 |
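
The mobile-model recipe (5 linear warm-up epochs, then cosine decay from a 0.4 initial LR, with label smoothing 0.1) can be sketched roughly as follows; the numbers mirror the table above, while the helper itself is illustrative and not this repository's exact scheduler.

```python
import math
import torch.nn as nn

EPOCHS, WARMUP, BASE_LR = 300, 5, 0.4

def lr_at(epoch):
    """Linear warm-up for the first 5 epochs, then cosine decay to 0."""
    if epoch < WARMUP:
        return BASE_LR * (epoch + 1) / WARMUP
    t = (epoch - WARMUP) / (EPOCHS - WARMUP)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * t))

# Label smoothing 0.1; CrossEntropyLoss supports this natively in recent PyTorch versions.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```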
Typical Training & Testing Tips:
Small Models
ShuffleNetV2_1x
python -m torch.distributed.launch --nproc_per_node=8 imagenet_mobile.py --cos -a shufflenetv2_1x --data /path/to/imagenet1k/ \
--epochs 300 --wd 4e-5 --gamma 0.1 -c checkpoints/imagenet/shufflenetv2_1x --train-batch 128 --opt-level O0 --nowd-bn # Training
python -m torch.distributed.launch --nproc_per_node=2 imagenet_mobile.py -a shufflenetv2_1x --data /path/to/imagenet1k/ \
-e --resume ../pretrain/shufflenetv2_1x.pth.tar --test-batch 100 --opt-level O0 # Testing, ~69.6% top-1 Acc
Large Models
SGE-ResNet
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --epochs 100 --schedule 30 60 90 \
--gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --gpu-id 0,1,2,3,4,5,6,7 # Training
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --train-batch 32 \
--opt-level O0 --wd-all --label-smoothing 0. --warmup 0 # Training (faster)
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --gpu-id 0,1 -e --resume ../pretrain/sge_resnet101.pth.tar \
# Testing ~78.8% top-1 Acc
python -m torch.distributed.launch --nproc_per_node=2 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ -e --resume \
../pretrain/sge_resnet101.pth.tar --test-batch 100 --opt-level O0 # Testing (faster) ~78.8% top-1 Acc
WS-ResNet with e-shifted L2 regularizer, e = 1e-3
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a ws_resnet50 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/es1e-3_ws_resnet50 --train-batch 32 \
--opt-level O0 --label-smoothing 0. --warmup 0 --nowd-conv --mineps 1e-3 --el2
Results of "SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks"
Note that the following (older) results do not set the bias weight decay to 0 for the large models.
Classification
Model | #P | GFLOPs | Top-1 Acc | Top-5 Acc | Download1 | Download2 | log |
---|---|---|---|---|---|---|---|
ShuffleNetV2_1x | 2.28M | 0.151 | 69.6420 | 88.7200 | GoogleDrive | | shufflenetv2_1x.log |
ResNet50 | 25.56M | 4.122 | 76.3840 | 92.9080 | BaiduDrive(zuvx) | GoogleDrive | old_resnet50.log |
SE-ResNet50 | 28.09M | 4.130 | 77.1840 | 93.6720 | | | |
SK-ResNet50* | 26.15M | 4.185 | 77.5380 | 93.7000 | BaiduDrive(tfwn) | GoogleDrive | sk_resnet50.log |
BAM-ResNet50 | 25.92M | 4.205 | 76.8980 | 93.4020 | BaiduDrive(z0h3) | GoogleDrive | bam_resnet50.log |
CBAM-ResNet50 | 28.09M | 4.139 | 77.6260 | 93.6600 | BaiduDrive(bram) | GoogleDrive | cbam_resnet50.log |
SGE-ResNet50 | 25.56M | 4.127 | 77.5840 | 93.6640 | BaiduDrive(gxo9) | GoogleDrive | sge_resnet50.log |
ResNet101 | 44.55M | 7.849 | 78.2000 | 93.9060 | BaiduDrive(js5t) | GoogleDrive | old_resnet101.log |
SE-ResNet101 | 49.33M | 7.863 | 78.4680 | 94.1020 | BaiduDrive(j2ox) | GoogleDrive | se_resnet101.log |
SK-ResNet101* | 45.68M | 7.978 | 78.7920 | 94.2680 | BaiduDrive(boii) | GoogleDrive | sk_resnet101.log |
BAM-ResNet101 | 44.91M | 7.933 | 78.2180 | 94.0180 | BaiduDrive(4bw6) | GoogleDrive | bam_resnet101.log |
CBAM-ResNet101 | 49.33M | 7.879 | 78.3540 | 94.0640 | BaiduDrive(syj3) | GoogleDrive | cbam_resnet101.log |
SGE-ResNet101 | 44.55M | 7.858 | 78.7980 | 94.3680 | BaiduDrive(wqn6) | GoogleDrive | sge_resnet101.log |
Here SK-ResNet* is a modified version of the original SKNet (for a fairer comparison with the ResNet backbones used here). The original SKNets perform stronger; a PyTorch implementation can be found in pppLang-SKNet.
Detection
Model | #P | GFLOPs | Detector | Neck | AP50:95 (%) | AP50 (%) | AP75 (%) | Download |
---|---|---|---|---|---|---|---|---|
ResNet50 | 23.51M | 88.0 | Faster RCNN | FPN | 37.5 | 59.1 | 40.6 | GoogleDrive |
SGE-ResNet50 | 23.51M | 88.1 | Faster RCNN | FPN | 38.7 | 60.8 | 41.7 | GoogleDrive |
ResNet50 | 23.51M | 88.0 | Mask RCNN | FPN | 38.6 | 60.0 | 41.9 | GoogleDrive |
SGE-ResNet50 | 23.51M | 88.1 | Mask RCNN | FPN | 39.6 | 61.5 | 42.9 | GoogleDrive |
ResNet50 | 23.51M | 88.0 | Cascade RCNN | FPN | 41.1 | 59.3 | 44.8 | GoogleDrive |
SGE-ResNet50 | 23.51M | 88.1 | Cascade RCNN | FPN | 42.6 | 61.4 | 46.2 | GoogleDrive |
ResNet101 | 42.50M | 167.9 | Faster RCNN | FPN | 39.4 | 60.7 | 43.0 | GoogleDrive |
SE-ResNet101 | 47.28M | 168.3 | Faster RCNN | FPN | 40.4 | 61.9 | 44.2 | GoogleDrive |
SGE-ResNet101 | 42.50M | 168.1 | Faster RCNN | FPN | 41.0 | 63.0 | 44.3 | GoogleDrive |
ResNet101 | 42.50M | 167.9 | Mask RCNN | FPN | 40.4 | 61.6 | 44.2 | GoogleDrive |
SE-ResNet101 | 47.28M | 168.3 | Mask RCNN | FPN | 41.5 | 63.0 | 45.3 | GoogleDrive |
SGE-ResNet101 | 42.50M | 168.1 | Mask RCNN | FPN | 42.1 | 63.7 | 46.1 | GoogleDrive |
ResNet101 | 42.50M | 167.9 | Cascade RCNN | FPN | 42.6 | 60.9 | 46.4 | GoogleDrive |
SE-ResNet101 | 47.28M | 168.3 | Cascade RCNN | FPN | 43.4 | 62.2 | 47.2 | GoogleDrive |
SGE-ResNet101 | 42.50M | 168.1 | Cascade RCNN | FPN | 44.4 | 63.2 | 48.4 | GoogleDrive |
Results of "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer"
Note that the following models are trained with bias weight decay = 0.
Classification
Model | Top-1 | Download |
---|---|---|
WS-ResNet50 | 76.74 | GoogleDrive |
WS-ResNet50(e = 1e-3) | 76.86 | GoogleDrive |
WS-ResNet101 | 78.07 | GoogleDrive |
WS-ResNet101(e = 1e-6) | 78.29 | GoogleDrive |
WS-ResNeXt50(e = 1e-3) | 77.88 | GoogleDrive |
WS-ResNeXt101(e = 1e-3) | 78.80 | GoogleDrive |
WS-DenseNet201(e = 1e-8) | 77.59 | GoogleDrive |
WS-ShuffleNetV1(e = 1e-8) | 68.09 | GoogleDrive |
WS-ShuffleNetV2(e = 1e-8) | 69.70 | GoogleDrive |
WS-MobileNetV1(e = 1e-6) | 73.60 | GoogleDrive |
Results of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay"
To appear
Citation
If you find our related works useful in your research, please consider citing the relevant papers:
@inproceedings{li2019selective,
  title={Selective Kernel Networks},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Yang, Jian},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}
@article{li2019spatial,
  title={Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks},
  author={Li, Xiang and Hu, Xiaolin and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:1905.09646},
  year={2019}
}
@article{li2019understanding,
  title={Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer},
  author={Li, Xiang and Chen, Shuo and Yang, Jian},
  journal={arXiv preprint arXiv:},
  year={2019}
}
@article{li2019generalization,
  title={Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay},
  author={Li, Xiang and Chen, Shuo and Gong, Chen and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:},
  year={2019}
}