ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Chinese version

ParC-Net ECCV 2022

This repository was previously named EdgeFormer. It has been renamed ParC-Net, since "Former" suggests that the model is a variant of the transformer, which it is not.

Official PyTorch implementation of ParC-Net


<p align="center"> <img src="https://s1.ax1x.com/2022/07/27/vSRJne.png" width=100% height=100% class="center"> </p> <p align="center"> <img src="https://s1.ax1x.com/2022/07/27/vSR8XD.png" width=60% height=60% class="center"> </p>

ParC-ConvNext, ParC-MobileNetV2 and ParC-ResNet50 have been uploaded. Please find them in ParC-ConvNets.

Introduction

Recently, vision transformers have started to show impressive results that significantly outperform large convolution-based models. However, in the area of small models for mobile or resource-constrained devices, ConvNets still hold advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine ParCs and squeeze-excitation ops to form a meta-former-like model block, which also has a transformer-like attention mechanism. This block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experimental results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% of the parameters and 13% of the computational cost while gaining 0.2% higher accuracy and 23% faster inference speed (on an ARM-based Rockchip RK3288) compared with MobileViT, and it uses only 0.5× the parameters of DeIT while gaining 2.7% accuracy. On MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance.

ParC block

<p align="center"> <img src="https://s1.ax1x.com/2022/07/27/vSRt7d.png" width=60% height=60% class="center"> </p>
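The ParC block combines ParC ops with channel attention. The squeeze-excitation part is the standard op from SENet; a minimal sketch (illustrative, not the repo's actual implementation):

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation op: global-pool to per-channel statistics,
    then produce sigmoid gates that reweight the feature maps.
    Illustrative sketch; layer sizes and reduction ratio are assumptions."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: global average pool
        w = self.fc(s).view(b, c, 1, 1)   # excite: per-channel gates in (0, 1)
        return x * w                      # reweight feature maps
```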

Position aware circular convolution

<p align="center"> <img src="https://s1.ax1x.com/2022/07/27/vSRY0H.png" width=60% height=60% class="center"> </p>
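The idea behind ParC can be sketched in a few lines: a depth-wise convolution whose kernel spans the full spatial extent in one direction, applied with circular padding so every output position sees the whole axis, plus a learnable position embedding so the output stays location-sensitive. The sketch below (vertical direction only, fixed input height, illustrative names) is an assumption-laden simplification, not the official implementation:

```python
import torch
import torch.nn as nn

class ParCV(nn.Module):
    """Sketch of a vertical position-aware circular convolution.
    Assumes a fixed input height `h`; the depth-wise kernel covers the
    entire height, giving a global receptive field along that axis."""
    def __init__(self, channels: int, h: int):
        super().__init__()
        # depth-wise conv whose kernel spans the whole H dimension
        self.conv = nn.Conv2d(channels, channels, kernel_size=(h, 1),
                              groups=channels, bias=False)
        # learnable position embedding, broadcast over width
        self.pos = nn.Parameter(torch.zeros(1, channels, h, 1))
        self.h = h

    def forward(self, x):
        x = x + self.pos                                   # inject position info
        # circular padding along H: wrap the first h-1 rows to the bottom
        x = torch.cat([x, x[:, :, : self.h - 1, :]], dim=2)
        return self.conv(x)                                # output keeps H x W
```

A horizontal counterpart would use a (1, w) kernel and wrap along the width; the paper applies both directions.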

Experimental results

EdgeFormer-S

| Tasks          | Performance      | # params (M) | Pretrained models |
| -------------- | ---------------- | ------------ | ----------------- |
| Classification | 78.6 (Top-1 acc) | 5.0          | model             |
| Detection      | 28.8 (mAP)       | 5.2          | model             |
| Segmentation   | 79.7 (mIoU)      | 5.8          | model             |

Inference speed

We deploy the proposed EdgeFormer and the baseline on the widely used low-power chip Rockchip RK3288 and on the DP chip for comparison. DP is the code name of an in-house, unpublished low-power neural network processor that is highly optimized for convolutions. We use ONNX [1] and MNN to port these models to the RK3288 and DP chips, and run each model for 100 iterations to measure the average inference speed.
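The averaging-over-iterations protocol can be reproduced on a desktop CPU with a simple PyTorch loop. This is a stand-in for the on-device ONNX/MNN timing, not the deployment pipeline itself; the function name and defaults are illustrative:

```python
import time
import torch

def average_latency_ms(model, input_shape=(1, 3, 224, 224),
                       iters=100, warmup=10):
    """Run `iters` forward passes on random input and return the
    average latency in milliseconds. Warm-up runs are excluded so
    one-time allocation costs do not skew the average."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0
```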

| Models      | # params (M) | Madds (M)   | RK3288 inference speed (ms) | DP (ms)    | Top-1 acc    |
| ----------- | ------------ | ----------- | --------------------------- | ---------- | ------------ |
| MobileViT-S | 5.6          | 2010        | 457                         | 368        | 78.4         |
| ParC-Net-S  | 5.0 (-11%)   | 1740 (-13%) | 353 (+23%)                  | 98 (3.77x) | 78.6 (+0.2%) |

Applying ParC-Net designs to various lightweight backbones

Classification experiments. The CPU used here is a Xeon E5-2680 v4. \*The authors of EdgeViT do not specify the type of CPU used in their paper. \*\*We train ResNet50 with the training strategy proposed in ConvNext; this ResNet50 achieves 79.1 top-1 accuracy, much higher than the 76.5 reported in the original paper.

| Models           | # params | Madds | Devices | Speed (ms) | Top-1 acc | Source                        |
| ---------------- | -------- | ----- | ------- | ---------- | --------- | ----------------------------- |
| MobileViT-S      | 5.6 M    | 2.0G  | RK3288  | 457        | 78.4      | ICLR 22                       |
| ParC-Net-S       | 5.0 M    | 1.7G  | RK3288  | 353        | 78.6      | Ours                          |
| MobileViT-S      | 5.6 M    | 2.0G  | DP      | 368        | 78.4      | ICLR 22                       |
| ParC-Net-S       | 5.0 M    | 1.7G  | DP      | 98         | 78.6      | Ours                          |
| ResNet50         | 26 M     | 2.1G  | CPU     | 98         | 79.1\*\*  | CVPR 22 new training setting  |
| ParC-ResNet50    | 24 M     | 2.0G  | CPU     | 98         | 79.6      | Ours                          |
| MobileNetV2      | 3.5 M    | 0.3G  | CPU     | 24         | 70.2      | CVPR 18                       |
| ParC-MobileNetV2 | 3.5 M    | 0.3G  | CPU     | 27         | 71.1      | Ours                          |
| ConvNext-XT      | 7.4 M    | 0.6G  | CPU     | 47         | 77.5      | CVPR 22                       |
| ParC-ConvNext-XT | 7.4 M    | 0.6G  | CPU     | 48         | 78.3      | Ours                          |
| EdgeViT-XS       | 6.7 M    | 1.1G  | CPU\*   | 54\*       | 77.5      | Arxiv 22/05                   |

Detection experiments

| Models            | # params | AP (box) | AP50 (box) | AP75 (box) | AP (mask) | AP50 (mask) | AP75 (mask) |
| ----------------- | -------- | -------- | ---------- | ---------- | --------- | ----------- | ----------- |
| ConvNext-XT       | -        | 47.2     | 65.6       | 51.4       | 41.0      | 63.0        | 44.2        |
| ParC-ConvNext-XT  | -        | 47.7     | 66.2       | 52.0       | 41.5      | 63.6        | 44.6        |
| ResNet-50         | -        | 47.5     | 65.6       | 51.6       | 41.1      | 63.1        | 44.6        |
| ParC-ResNet-50    | -        | 48.1     | 66.4       | 52.3       | 41.8      | 64.0        | 45.1        |
| MobileNetv2       | -        | 43.7     | 61.9       | 47.6       | 37.9      | 59.1        | 40.8        |
| ParC-MobileNetv2  | -        | 44.3     | 62.7       | 47.8       | 39.0      | 60.3        | 42.1        |

Segmentation experiments

| Models            | # params | mIoU  | mAcc  | aAcc  |
| ----------------- | -------- | ----- | ----- | ----- |
| ConvNext-XT       | -        | 42.17 | 54.18 | 79.72 |
| ParC-ConvNext-XT  | -        | 42.32 | 54.48 | 80.30 |
| ResNet-50         | -        | 42.27 | 52.91 | 79.88 |
| ParC-ResNet-50    | -        | 43.85 | 54.66 | 80.43 |
| MobileNetv2       | -        | 32.80 | 48.75 | 74.42 |
| ParC-MobileNetv2  | -        | 35.13 | 49.64 | 75.73 |

ConvNext block and ConvNext-GCC block

<p align="center"> <img src="https://s1.ax1x.com/2022/05/16/OWxqNd.png" width=40% height=40% class="center"> </p>

In terms of designing a pure ConvNet by learning from ViTs, our proposed ParC-Net is most closely related to the parallel work ConvNext. Comparing ParC-Net with ConvNext, we notice that their improvements are different and complementary. To verify this point, we build a combination network, where ParC blocks replace several ConvNext blocks at the end of the last two stages. Experimental results show that the replacement significantly improves classification accuracy while slightly decreasing the number of parameters. Results on ResNet50, MobileNetV2 and ConvNext-T show that models which focus on optimizing the FLOPs-accuracy trade-off can also benefit from our ParC-Net designs. The corresponding code will be released soon.
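The plug-and-play replacement described above amounts to swapping the trailing blocks of a stage. A minimal sketch, assuming a stage is an `nn.Sequential` and a hypothetical `make_parc_block` factory that builds a ParC-style block from the block it replaces:

```python
import torch.nn as nn

def replace_tail_blocks(stage: nn.Sequential, make_parc_block, n_replace: int):
    """Return a new stage where the last `n_replace` blocks are
    replaced by blocks produced from `make_parc_block(old_block)`.
    Mirrors the 'replace blocks at the end of a stage' usage above."""
    blocks = list(stage.children())
    for i in range(max(0, len(blocks) - n_replace), len(blocks)):
        blocks[i] = make_parc_block(blocks[i])
    return nn.Sequential(*blocks)
```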

Installation

We implement ParC-Net with PyTorch 1.9.0 and CUDA 11.1.

Pip

The environment can be built in a local Python environment with the command below:

pip install -r requirements.txt

Docker

A docker image containing the environment will be provided soon.

Training

Training settings are listed in yaml files (./config/classification/xxx/xxxx.yaml, ./config/detection/xxx/xxxx.yaml, ./config/segmentation/xxx/xxxx.yaml)

Classification

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main_train.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml

Detection

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --common.config-file config/detection/ssd_edgeformer_s.yaml

Segmentation

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --common.config-file config/segmentation/deeplabv3_edgeformer_s.yaml

Evaluation

Classification

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_cls.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml --model.classification.pretrained ./pretrained_models/classification/checkpoint_ema_avg.pt

Detection

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_det.py --common.config-file ./config/detection/edgeformer/ssd_edgeformer_s.yaml --model.detection.pretrained ./pretrained_models/detection/checkpoint_ema_avg.pt --evaluation.detection.mode validation_set --evaluation.detection.resize-input-images

Segmentation

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_seg.py --common.config-file ./config/segmentation/edgeformer/deeplabv3_edgeformer_s.yaml --model.segmentation.pretrained ./pretrained_models/segmentation/checkpoint_ema_avg.pt --evaluation.segmentation.mode validation_set --evaluation.segmentation.resize-input-images

Acknowledgement

We thank the authors of MobileViT for sharing their code; we implement our EdgeFormer based on their source code. If you find this code helpful in your research, please consider citing our paper and MobileViT:

@inproceedings{zhang2022parcnet,
  title={ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer},
  author={Zhang, Haokui and Hu, Wenze and Wang, Xiaoyu},
  booktitle={European Conference on Computer Vision},
  year={2022}
}
@inproceedings{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  booktitle={International Conference on Learning Representations},
  year={2022}
}