Conv2Former

Our code is based on timm and ConvNeXt.

More code will be released soon.
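Until then, the following is a minimal, unofficial PyTorch sketch of the convolutional modulation block described in the Conv2Former paper (arXiv:2211.11943): a large-kernel depthwise convolution produces a weight map that modulates a linear "value" branch through a Hadamard product, playing the role of self-attention. The layer names, the 11x11 kernel size, and the exact block wiring are assumptions made for illustration and may differ from the released code.

```python
# Unofficial sketch of a Conv2Former-style convolutional modulation block.
# Layer names, kernel size, and wiring are illustrative assumptions, not the released code.
import torch
import torch.nn as nn


class ConvMod(nn.Module):
    """Convolutional modulation: a depthwise-conv weight map modulates a linear value branch."""

    def __init__(self, dim: int, kernel_size: int = 11):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # "Attention" branch: 1x1 conv + GELU + large-kernel depthwise conv
        self.a = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
        )
        self.v = nn.Conv2d(dim, dim, kernel_size=1)     # value branch
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); LayerNorm is applied over the channel dimension
        shortcut = x
        x = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = self.a(x) * self.v(x)  # Hadamard product instead of softmax attention
        return shortcut + self.proj(x)


if __name__ == "__main__":
    block = ConvMod(dim=64)
    print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```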

Results

Training on ImageNet-1k

| Model | Parameters | FLOPs | Image resolution | Top-1 Acc. | Model File |
|:---|:---:|:---:|:---:|:---:|:---:|
| Conv2Former-N | 15M | 2.2G | 224 | 81.5% | Coming soon |
| Swin-T | 28M | 4.5G | 224 | 81.5% | - |
| ConvNeXt-T | 29M | 4.5G | 224 | 82.1% | - |
| Conv2Former-T | 27M | 4.4G | 224 | 83.2% | Coming soon |
| Swin-S | 50M | 8.7G | 224 | 83.0% | - |
| ConvNeXt-S | 50M | 8.7G | 224 | 83.1% | - |
| Conv2Former-S | 50M | 8.7G | 224 | 84.1% | Coming soon |
| RepLKNet-31B | 79M | 15.3G | 224 | 83.5% | - |
| Swin-B | 88M | 15.4G | 224 | 83.5% | - |
| ConvNeXt-B | 89M | 15.4G | 224 | 83.8% | - |
| FocalNet-B | 89M | 15.4G | 224 | 83.9% | - |
| Conv2Former-B | 90M | 15.9G | 224 | 84.4% | Coming soon |

Pre-training on ImageNet-22k and Fine-tuning on ImageNet-1k

| Model | Parameters | FLOPs | Image resolution | Top-1 Acc. | Model File |
|:---|:---:|:---:|:---:|:---:|:---:|
| ConvNeXt-S | 50M | 8.7G | 224 | 84.6% | - |
| Conv2Former-S | 50M | 8.7G | 224 | 84.9% | Coming soon |
| Swin-B | 88M | 15.4G | 224 | 85.2% | - |
| ConvNeXt-B | 89M | 15.4G | 224 | 85.8% | - |
| Conv2Former-B | 90M | 15.9G | 224 | 86.2% | Coming soon |
| Swin-B | 88M | 47.0G | 384 | 86.4% | - |
| ConvNeXt-B | 89M | 45.1G | 384 | 86.8% | - |
| Conv2Former-B | 90M | 46.7G | 384 | 87.0% | Coming soon |
| Swin-L | 197M | 34.5G | 224 | 86.3% | - |
| ConvNeXt-L | 198M | 34.4G | 224 | 86.6% | - |
| Conv2Former-L | 199M | 36.0G | 224 | 87.0% | Coming soon |
| EffNet-V2-XL | 208M | 94G | 480 | 87.3% | - |
| Swin-L | 197M | 104G | 384 | 87.3% | - |
| ConvNeXt-L | 198M | 101G | 384 | 87.5% | - |
| CoAtNet-3 | 168M | 107G | 384 | 87.6% | - |
| Conv2Former-L | 199M | 106G | 384 | 87.7% | Coming soon |

References

You may want to cite:

@article{hou2022conv2former,
  title={Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition},
  author={Hou, Qibin and Lu, Cheng-Ze and Cheng, Ming-Ming and Feng, Jiashi},
  journal={arXiv preprint arXiv:2211.11943},
  year={2022}
}

@inproceedings{liu2022convnet,
  title={A ConvNet for the 2020s},
  author={Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  booktitle={CVPR},
  year={2022}
}

@inproceedings{liu2021swin,
  title={Swin transformer: Hierarchical vision transformer using shifted windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@inproceedings{tan2021efficientnetv2,
  title={Efficientnetv2: Smaller models and faster training},
  author={Tan, Mingxing and Le, Quoc},
  booktitle={ICML},
  pages={10096--10106},
  year={2021},
  organization={PMLR}
}

@misc{focalnet,
  title={Focal Modulation Networks},
  author={Yang, Jianwei and Li, Chunyuan and Gao, Jianfeng},
  publisher={arXiv},
  year={2022}
}

@article{dai2021coatnet,
  title={Coatnet: Marrying convolution and attention for all data sizes},
  author={Dai, Zihang and Liu, Hanxiao and Le, Quoc and Tan, Mingxing},
  journal={NeurIPS},
  volume={34},
  year={2021}
}

@inproceedings{replknet,
  title={Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs},
  author={Ding, Xiaohan and Zhang, Xiangyu and Zhou, Yizhuang and Han, Jungong and Ding, Guiguang and Sun, Jian},
  booktitle={CVPR},
  year={2022}
}