# Conv2Former
Our code is based on timm and ConvNeXt.
More code will be released soon.
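Until the full code is released, below is a minimal PyTorch sketch of the convolutional modulation (ConvMod) operation that Conv2Former builds on, i.e. Z = DWConv(W1·X) ⊙ (W2·X) followed by a linear projection. This is not the official implementation: the class name, the 11×11 depth-wise kernel, the GELU placement, and the 1×1 convolutions standing in for the linear layers are assumptions inferred from the paper's description.

```python
# Hypothetical sketch of the convolutional modulation block, NOT the official code.
import torch
import torch.nn as nn


class ConvMod(nn.Module):
    """Convolutional modulation: Z = DWConv(W1*X) ⊙ (W2*X), followed by W3*Z."""

    def __init__(self, dim, kernel_size=11):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # branch producing the modulation map A = DWConv(W1*X)
        self.a = nn.Sequential(
            nn.Conv2d(dim, dim, 1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
        )
        self.v = nn.Conv2d(dim, dim, 1)     # value branch V = W2*X
        self.proj = nn.Conv2d(dim, dim, 1)  # output projection W3

    def forward(self, x):                   # x: (B, C, H, W)
        shortcut = x
        x = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = self.a(x) * self.v(x)           # Hadamard product A ⊙ V
        return shortcut + self.proj(x)


if __name__ == "__main__":
    block = ConvMod(dim=64)
    out = block(torch.randn(1, 64, 56, 56))
    print(out.shape)                        # torch.Size([1, 64, 56, 56])
```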
## Results
### Training on ImageNet-1k
Model | Parameters | FLOPs | Image resolution | Top-1 Acc. | Model File |
---|---|---|---|---|---|
Conv2Former-N | 15M | 2.2G | 224 | 81.5% | Coming soon |
Swin-T | 28M | 4.5G | 224 | 81.5% | - |
ConvNeXt-T | 29M | 4.5G | 224 | 82.1% | - |
Conv2Former-T | 27M | 4.4G | 224 | 83.2% | Coming soon |
Swin-S | 50M | 8.7G | 224 | 83.0% | - |
ConvNeXt-S | 50M | 8.7G | 224 | 83.1% | - |
Conv2Former-S | 50M | 8.7G | 224 | 84.1% | Coming soon |
RepLKNet-31B | 79M | 15.3G | 224 | 83.5% | - |
Swin-B | 88M | 15.4G | 224 | 83.5% | - |
ConvNeXt-B | 89M | 15.4G | 224 | 83.8% | - |
FocalNet-B | 89M | 15.4G | 224 | 83.9% | - |
Conv2Former-B | 90M | 15.9G | 224 | 84.4% | Coming soon |
### Pre-training on ImageNet-22k and Fine-tuning on ImageNet-1k
Model | Parameters | FLOPs | Image resolution | Top-1 Acc. | Model File |
---|---|---|---|---|---|
ConvNeXt-S | 50M | 8.7G | 224 | 84.6% | - |
Conv2Former-S | 50M | 8.7G | 224 | 84.9% | Coming soon |
Swin-B | 88M | 15.4G | 224 | 85.2% | - |
ConvNeXt-B | 89M | 15.4G | 224 | 85.8% | - |
Conv2Former-B | 90M | 15.9G | 224 | 86.2% | Coming soon |
Swin-B | 88M | 47.0G | 384 | 86.4% | - |
ConvNeXt-B | 89M | 45.1G | 384 | 86.8% | - |
Conv2Former-B | 90M | 46.7G | 384 | 87.0% | Coming soon |
Swin-L | 197M | 34.5G | 224 | 86.3% | - |
ConvNeXt-L | 198M | 34.4G | 224 | 86.6% | - |
Conv2Former-L | 199M | 36.0G | 224 | 87.0% | Coming soon |
EffNet-V2-XL | 208M | 94G | 480 | 87.3% | - |
Swin-L | 197M | 104G | 384 | 87.3% | - |
ConvNeXt-L | 198M | 101G | 384 | 87.5% | - |
CoAtNet-3 | 168M | 107G | 384 | 87.6% | - |
Conv2Former-L | 199M | 106G | 384 | 87.7% | Coming soon |
## Reference
You may want to cite:
@article{hou2022conv2former,
  title={Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition},
  author={Hou, Qibin and Lu, Cheng-Ze and Cheng, Ming-Ming and Feng, Jiashi},
  journal={arXiv preprint arXiv:2211.11943},
  year={2022}
}

@inproceedings{liu2022convnet,
  title={A ConvNet for the 2020s},
  author={Liu, Zhuang and Mao, Hanzi and Wu, Chao-Yuan and Feichtenhofer, Christoph and Darrell, Trevor and Xie, Saining},
  booktitle={CVPR},
  year={2022}
}

@inproceedings{liu2021swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@inproceedings{tan2021efficientnetv2,
  title={EfficientNetV2: Smaller Models and Faster Training},
  author={Tan, Mingxing and Le, Quoc},
  booktitle={ICML},
  pages={10096--10106},
  year={2021},
  organization={PMLR}
}

@misc{focalnet,
  title={Focal Modulation Networks},
  author={Yang, Jianwei and Li, Chunyuan and Gao, Jianfeng},
  publisher={arXiv},
  year={2022}
}

@article{dai2021coatnet,
  title={CoAtNet: Marrying Convolution and Attention for All Data Sizes},
  author={Dai, Zihang and Liu, Hanxiao and Le, Quoc and Tan, Mingxing},
  journal={NeurIPS},
  volume={34},
  year={2021}
}

@inproceedings{replknet,
  title={Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs},
  author={Ding, Xiaohan and Zhang, Xiangyu and Zhou, Yizhuang and Han, Jungong and Ding, Guiguang and Sun, Jian},
  booktitle={CVPR},
  year={2022}
}