PytorchInsight

This is a PyTorch library with state-of-the-art architectures, pretrained models, and continuously updated results.

This repository aims to accelerate deep learning research by making results reproducible and experiments easier to run, all in PyTorch.

Including papers (to be updated):

- Attention Models
- Non-Attention Models


Trained Models and Performance Table

Single crop validation error on ImageNet-1k (center 224x224 crop from resized image with shorter side = 256).
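For reference, this evaluation protocol matches the standard torchvision pipeline; a minimal sketch (the normalization statistics are the usual ImageNet values, assumed here rather than read from the repo's dataloader):

```python
import torchvision.transforms as transforms

# Resize the shorter side to 256, then take the center 224x224 crop.
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # Standard ImageNet statistics (assumption; check the repo's dataloader).
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```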

Classification training settings for medium and large models

Details: RandomResizedCrop, RandomHorizontalFlip; 0.1 initial lr, 100 epochs in total, lr decayed by 10x every 30 epochs; SGD with plain softmax cross-entropy loss, 1e-4 weight decay, 0.9 momentum, 8 GPUs, 32 images per GPU

Examples: ResNet50

Note: the newest code adds one default operation: setting the weight decay of all biases to 0 (a sketch of this parameter split follows below); for the theoretical analysis, please refer to "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay" (to appear). This slightly boosts training accuracy.
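The bias wd = 0 rule amounts to putting the parameters into two optimizer groups, one with weight decay and one without. A minimal sketch (the helper name is ours, not an API of this repo):

```python
import torch

def params_no_bias_decay(model, weight_decay=1e-4):
    """Two optimizer groups: weight decay on everything except biases."""
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (no_decay if name.endswith(".bias") else decay).append(p)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]

# Hypothetical usage with the settings above (0.1 lr, 0.9 momentum):
# optimizer = torch.optim.SGD(params_no_bias_decay(model), lr=0.1, momentum=0.9)
```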
Classification training settings for mobile/small models

Details: RandomResizedCrop, RandomHorizontalFlip; 0.4 initial lr, 300 epochs in total, 5 linear warm-up epochs, cosine lr decay (sketched below); SGD with softmax cross-entropy loss and label smoothing 0.1, 4e-5 weight decay on conv weights, 0 weight decay on all other weights, 0.9 momentum, 8 GPUs, 128 images per GPU

Examples: ShuffleNetV2
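The 5-epoch linear warm-up followed by cosine decay corresponds to a simple per-epoch learning-rate function; a minimal sketch with the settings above (the function name is ours):

```python
import math

def mobile_lr(epoch, base_lr=0.4, warmup_epochs=5, total_epochs=300):
    """Linear warm-up for the first epochs, then cosine decay toward 0."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# epoch 0 -> 0.08, epoch 4 -> 0.4, epoch 299 -> close to 0
```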

Typical Training & Testing Tips:

Small Models

ShuffleNetV2_1x

```bash
python -m torch.distributed.launch --nproc_per_node=8 imagenet_mobile.py --cos -a shufflenetv2_1x --data /path/to/imagenet1k/ \
--epochs 300 --wd 4e-5 --gamma 0.1 -c checkpoints/imagenet/shufflenetv2_1x --train-batch 128 --opt-level O0 --nowd-bn # Training

python -m torch.distributed.launch --nproc_per_node=2 imagenet_mobile.py -a shufflenetv2_1x --data /path/to/imagenet1k/ \
-e --resume ../pretrain/shufflenetv2_1x.pth.tar --test-batch 100 --opt-level O0 # Testing, ~69.6% top-1 acc
```

Large Models

SGE-ResNet

```bash
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --epochs 100 --schedule 30 60 90 \
--gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --gpu-id 0,1,2,3,4,5,6,7 # Training

python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --train-batch 32 \
--opt-level O0 --wd-all --label-smoothing 0. --warmup 0 # Training (faster)

python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --gpu-id 0,1 -e \
--resume ../pretrain/sge_resnet101.pth.tar # Testing, ~78.8% top-1 acc

python -m torch.distributed.launch --nproc_per_node=2 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ -e --resume \
../pretrain/sge_resnet101.pth.tar --test-batch 100 --opt-level O0 # Testing (faster), ~78.8% top-1 acc
```
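For orientation, the SGE unit used by these models groups the channels, weights each spatial position by its similarity to the group's global average descriptor, normalizes that map over space, and applies a learned sigmoid gate. A condensed sketch following the paper's description (64 groups is the paper's default; this is our restatement, not the repo file verbatim):

```python
import torch
import torch.nn as nn

class SpatialGroupEnhance(nn.Module):
    """SGE unit: reweight each channel group spatially by the similarity
    between local features and the group's global descriptor."""
    def __init__(self, groups=64):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.weight = nn.Parameter(torch.zeros(1, groups, 1, 1))
        self.bias = nn.Parameter(torch.ones(1, groups, 1, 1))
        self.sig = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.size()
        x = x.view(b * self.groups, -1, h, w)
        # Similarity map: dot product with the group-wise global descriptor.
        xn = (x * self.avg_pool(x)).sum(dim=1, keepdim=True)
        # Normalize the map over the spatial positions within each group.
        t = xn.view(b * self.groups, -1)
        t = t - t.mean(dim=1, keepdim=True)
        t = t / (t.std(dim=1, keepdim=True) + 1e-5)
        # Per-group learnable scale and shift, then a sigmoid gate.
        t = t.view(b, self.groups, h, w) * self.weight + self.bias
        x = x * self.sig(t.view(b * self.groups, 1, h, w))
        return x.view(b, c, h, w)
```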

WS-ResNet with e-shifted L2 regularizer, e = 1e-3

```bash
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a ws_resnet50 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/es1e-3_ws_resnet50 --train-batch 32 \
--opt-level O0 --label-smoothing 0. --warmup 0 --nowd-conv --mineps 1e-3 --el2 # Training
```
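Here WS denotes Weight Standardization: each conv filter is standardized before the convolution, which is why plain L2 weight decay interacts badly with it and motivates the e-shifted regularizer (its exact form is defined in the paper). A minimal weight-standardized conv under that standard formulation; the small 1e-5 floor is our placeholder, and we assume the repo's --mineps flag plays a related role:

```python
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization: each output filter is
    standardized to zero mean and unit variance before convolving."""
    def forward(self, x):
        w = self.weight
        flat = w.view(w.size(0), -1)
        mean = flat.mean(dim=1).view(-1, 1, 1, 1)
        # Small floor avoids division by zero (placeholder; cf. --mineps).
        std = flat.std(dim=1).view(-1, 1, 1, 1) + 1e-5
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```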

Results of "SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks"

Note that the following (old) results do not set bias weight decay to 0 for the large models.

Classification

| Model | #P | GFLOPs | Top-1 Acc (%) | Top-5 Acc (%) | Download1 | Download2 | log |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ShuffleNetV2_1x | 2.28M | 0.151 | 69.6420 | 88.7200 | | GoogleDrive | shufflenetv2_1x.log |
| ResNet50 | 25.56M | 4.122 | 76.3840 | 92.9080 | BaiduDrive(zuvx) | GoogleDrive | old_resnet50.log |
| SE-ResNet50 | 28.09M | 4.130 | 77.1840 | 93.6720 | | | |
| SK-ResNet50* | 26.15M | 4.185 | 77.5380 | 93.7000 | BaiduDrive(tfwn) | GoogleDrive | sk_resnet50.log |
| BAM-ResNet50 | 25.92M | 4.205 | 76.8980 | 93.4020 | BaiduDrive(z0h3) | GoogleDrive | bam_resnet50.log |
| CBAM-ResNet50 | 28.09M | 4.139 | 77.6260 | 93.6600 | BaiduDrive(bram) | GoogleDrive | cbam_resnet50.log |
| SGE-ResNet50 | 25.56M | 4.127 | 77.5840 | 93.6640 | BaiduDrive(gxo9) | GoogleDrive | sge_resnet50.log |
| ResNet101 | 44.55M | 7.849 | 78.2000 | 93.9060 | BaiduDrive(js5t) | GoogleDrive | old_resnet101.log |
| SE-ResNet101 | 49.33M | 7.863 | 78.4680 | 94.1020 | BaiduDrive(j2ox) | GoogleDrive | se_resnet101.log |
| SK-ResNet101* | 45.68M | 7.978 | 78.7920 | 94.2680 | BaiduDrive(boii) | GoogleDrive | sk_resnet101.log |
| BAM-ResNet101 | 44.91M | 7.933 | 78.2180 | 94.0180 | BaiduDrive(4bw6) | GoogleDrive | bam_resnet101.log |
| CBAM-ResNet101 | 49.33M | 7.879 | 78.3540 | 94.0640 | BaiduDrive(syj3) | GoogleDrive | cbam_resnet101.log |
| SGE-ResNet101 | 44.55M | 7.858 | 78.7980 | 94.3680 | BaiduDrive(wqn6) | GoogleDrive | sge_resnet101.log |

Here SK-ResNet* is a modified version of the original SKNet (for a fairer comparison with the ResNet backbones here). The original SKNets perform stronger; a PyTorch version can be found at pppLang-SKNet.

Detection

| Model | #P | GFLOPs | Detector | Neck | AP50:95 (%) | AP50 (%) | AP75 (%) | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet50 | 23.51M | 88.0 | Faster RCNN | FPN | 37.5 | 59.1 | 40.6 | GoogleDrive |
| SGE-ResNet50 | 23.51M | 88.1 | Faster RCNN | FPN | 38.7 | 60.8 | 41.7 | GoogleDrive |
| ResNet50 | 23.51M | 88.0 | Mask RCNN | FPN | 38.6 | 60.0 | 41.9 | GoogleDrive |
| SGE-ResNet50 | 23.51M | 88.1 | Mask RCNN | FPN | 39.6 | 61.5 | 42.9 | GoogleDrive |
| ResNet50 | 23.51M | 88.0 | Cascade RCNN | FPN | 41.1 | 59.3 | 44.8 | GoogleDrive |
| SGE-ResNet50 | 23.51M | 88.1 | Cascade RCNN | FPN | 42.6 | 61.4 | 46.2 | GoogleDrive |
| ResNet101 | 42.50M | 167.9 | Faster RCNN | FPN | 39.4 | 60.7 | 43.0 | GoogleDrive |
| SE-ResNet101 | 47.28M | 168.3 | Faster RCNN | FPN | 40.4 | 61.9 | 44.2 | GoogleDrive |
| SGE-ResNet101 | 42.50M | 168.1 | Faster RCNN | FPN | 41.0 | 63.0 | 44.3 | GoogleDrive |
| ResNet101 | 42.50M | 167.9 | Mask RCNN | FPN | 40.4 | 61.6 | 44.2 | GoogleDrive |
| SE-ResNet101 | 47.28M | 168.3 | Mask RCNN | FPN | 41.5 | 63.0 | 45.3 | GoogleDrive |
| SGE-ResNet101 | 42.50M | 168.1 | Mask RCNN | FPN | 42.1 | 63.7 | 46.1 | GoogleDrive |
| ResNet101 | 42.50M | 167.9 | Cascade RCNN | FPN | 42.6 | 60.9 | 46.4 | GoogleDrive |
| SE-ResNet101 | 47.28M | 168.3 | Cascade RCNN | FPN | 43.4 | 62.2 | 47.2 | GoogleDrive |
| SGE-ResNet101 | 42.50M | 168.1 | Cascade RCNN | FPN | 44.4 | 63.2 | 48.4 | GoogleDrive |

Results of "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer"

Note that the following models are trained with bias weight decay set to 0.

Classification

| Model | Top-1 Acc (%) | Download |
| --- | --- | --- |
| WS-ResNet50 | 76.74 | GoogleDrive |
| WS-ResNet50 (e = 1e-3) | 76.86 | GoogleDrive |
| WS-ResNet101 | 78.07 | GoogleDrive |
| WS-ResNet101 (e = 1e-6) | 78.29 | GoogleDrive |
| WS-ResNeXt50 (e = 1e-3) | 77.88 | GoogleDrive |
| WS-ResNeXt101 (e = 1e-3) | 78.80 | GoogleDrive |
| WS-DenseNet201 (e = 1e-8) | 77.59 | GoogleDrive |
| WS-ShuffleNetV1 (e = 1e-8) | 68.09 | GoogleDrive |
| WS-ShuffleNetV2 (e = 1e-8) | 69.70 | GoogleDrive |
| WS-MobileNetV1 (e = 1e-6) | 73.60 | GoogleDrive |

Results of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay"

To appear


Citation

If you find our related works useful in your research, please consider citing the papers:

```
@inproceedings{li2019selective,
  title={Selective Kernel Networks},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Yang, Jian},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

@article{li2019spatial,
  title={Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks},
  author={Li, Xiang and Hu, Xiaolin and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:1905.09646},
  year={2019}
}

@article{li2019understanding,
  title={Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer},
  author={Li, Xiang and Chen, Shuo and Yang, Jian},
  journal={arXiv preprint arXiv:},
  year={2019}
}

@article{li2019generalization,
  title={Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay},
  author={Li, Xiang and Chen, Shuo and Gong, Chen and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:},
  year={2019}
}
```