Networks-Beyond-Attention (NBA)

A curated list of modern (convolutional) network architectures for vision. Note that we only list recent works based on convolution, modulation, or related variants; please refer to other, more comprehensive lists for networks using attention or MLP-style designs.

Since this is an emerging trend, feel free to submit a pull request or raise an issue if you find any missing papers!

Overview

Papers

Image Classification

<p> <font size=3><b>On the Connection between Local Attention and Dynamic Depth-wise Convolution. ICLR 2022.</b></font> <br> <font size=2>Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang.</font> <br> Release date: <b>8 June 2021</b>. <br> <a href='https://arxiv.org/abs/2106.04263'>[paper]</a> <a href='https://github.com/Atten4Vis/DemystifyLocalViT'>[code]</a> </p> <p> <font size=3><b>MetaFormer Is Actually What You Need for Vision. CVPR 2022.</b></font> <br> <font size=2>Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.</font> <br> Release date: <b>22 Nov 2021</b>. <br> <a href='https://arxiv.org/abs/2111.11418'>[paper]</a> <a href='https://github.com/sail-sg/poolformer'>[code]</a> </p> <p> <font size=3><b>A ConvNet for the 2020s. CVPR 2022.</b></font> <br> <font size=2>Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.</font> <br> Release date: <b>10 Jan 2022</b>. <br> <a href='https://arxiv.org/pdf/2201.03545'>[paper]</a> <a href='https://github.com/facebookresearch/ConvNeXt'>[code]</a> </p> <p> <font size=3><b>Visual Attention Network. arXiv 2022.</b></font> <br> <font size=2>Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.</font> <br> Release date: <b>20 Feb 2022</b>. <br> <a href='https://arxiv.org/abs/2202.09741'>[paper]</a> <a href='https://github.com/Visual-Attention-Network'>[code]</a> </p> <p> <font size=3><b>Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. CVPR 2022.</b></font> <br> <font size=2>Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun.</font> <br> Release date: <b>13 Mar 2022</b>. <br> <a href='https://arxiv.org/abs/2203.06717'>[paper]</a> <a href='https://github.com/megvii-research/RepLKNet'>[code]</a> </p> <p> <font size=3><b>Focal Modulation Networks. 
NeurIPS 2022.</b></font> <br> <font size=2>Jianwei Yang, Chunyuan Li, Xiyang Dai, Jianfeng Gao.</font> <br> Release date: <b>22 Mar 2022</b>. <br> <a href='https://arxiv.org/pdf/2203.11926'>[paper]</a> <a href='https://github.com/microsoft/FocalNet'>[code]</a> </p> <p> <font size=3><b>More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity. arXiv 2022.</b></font> <br> <font size=2>Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang.</font> <br> Release date: <b>7 July 2022</b>. <br> <a href='https://arxiv.org/abs/2207.03620'>[paper]</a> <a href='https://github.com/VITA-Group/SLaK'>[code]</a> </p> <p> <font size=3><b>HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions. NeurIPS 2022.</b></font> <br> <font size=2>Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu.</font> <br> Release date: <b>28 July 2022</b>. <br> <a href='https://arxiv.org/abs/2207.14284'>[paper]</a> <a href='https://github.com/raoyongming/HorNet'>[code]</a> </p> <p> <font size=3><b>Efficient Multi-order Gated Aggregation Network. arXiv 2022.</b></font> <br> <font size=2>Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li.</font> <br> Release date: <b>7 Nov 2022</b>. <br> <a href='https://arxiv.org/abs/2211.03295'>[paper]</a> </p> <p> <font size=3><b>InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. arXiv 2022.</b></font> <br> <font size=2>Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao.</font> <br> Release date: <b>10 Nov 2022</b>. 
<br> <a href='https://arxiv.org/abs/2211.05778v2'>[paper]</a> <a href='https://github.com/OpenGVLab/InternImage'>[code]</a> </p> <p> <font size=3><b>Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition. arXiv 2022.</b></font> <br> <font size=2>Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng.</font> <br> Release date: <b>22 Nov 2022</b>. <br> <a href='https://arxiv.org/abs/2211.11943'>[paper]</a> <a href='https://github.com/HVision-NKU/Conv2Former'>[code]</a> </p> <p> <font size=3><b>A Close Look at Spatial Modeling: From Attention to Convolution. arXiv 2022.</b></font> <br> <font size=2>Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, Jie Fu, Yun Fu.</font> <br> Release date: <b>23 Dec 2022</b>. <br> <a href='https://arxiv.org/abs/2212.12552'>[paper]</a> <a href='https://github.com/ma-xu/FCViT'>[code]</a> </p> <p> <font size=3><b>ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. arXiv 2023.</b></font> <br> <font size=2>Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.</font> <br> Release date: <b>2 Jan 2023</b>. <br> <a href='https://arxiv.org/abs/2301.00808'>[paper]</a> <a href='https://github.com/facebookresearch/ConvNeXt-V2'>[code]</a> </p>
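A recurring building block in many of the papers above (e.g. ConvNeXt, RepLKNet, VAN, Conv2Former) is the large-kernel depth-wise convolution, sometimes combined with an element-wise "modulation" of a value branch. As a rough, naive NumPy sketch of the core operation only (not the implementation of any particular paper; shapes and names are illustrative):

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Naive depth-wise convolution: each channel is convolved with its
    own kernel, with zero padding so the spatial size is preserved.
    x: (C, H, W); kernels: (C, k, k) with odd k."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # pad spatial dims only
    out = np.zeros_like(x, dtype=float)
    for ch in range(c):            # channels are processed independently
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * kernels[ch])
    return out

x = np.random.randn(4, 8, 8)        # 4 channels, 8x8 feature map
kernels = np.random.randn(4, 7, 7)  # one "large" 7x7 kernel per channel
y = depthwise_conv2d(x, kernels)
print(y.shape)  # (4, 8, 8)

# Simplified "convolutional modulation" in the spirit of FocalNet /
# Conv2Former: the conv output gates a value branch element-wise.
v = np.random.randn(4, 8, 8)
modulated = y * v
```

In practice these models use optimized framework primitives (e.g. grouped convolution with groups equal to the channel count) rather than explicit loops; the sketch only shows why the cost grows with kernel size per channel instead of across channel pairs.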

Image Segmentation

<p> <font size=3><b>SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. NeurIPS 2022.</b></font> <br> <font size=2>Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng, Shi-Min Hu.</font> <br> Release date: <b>18 Sep 2022</b>. <br> <a href='https://arxiv.org/abs/2209.08575v1'>[paper]</a> <a href='https://github.com/Visual-Attention-Network/SegNeXt'>[code]</a> </p>

3D Understanding

<p> <font size=3><b>Scaling up Kernels in 3D CNNs. arXiv 2022.</b></font> <br> <font size=2>Yukang Chen, Jianhui Liu, Xiaojuan Qi, Xiangyu Zhang, Jian Sun, Jiaya Jia.</font> <br> Release date: <b>21 June 2022</b>. <br> <a href='https://arxiv.org/abs/2206.10555'>[paper]</a> <a href='https://github.com/dvlab-research/LargeKernel3D'>[code]</a> </p> <p> <font size=3><b>Long Range Pooling for 3D Large-Scale Scene Understanding. arXiv 2023.</b></font> <br> <font size=2>Xiang-Li Li, Meng-Hao Guo, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu.</font> <br> Release date: <b>17 Jan 2023</b>. <br> <a href='https://arxiv.org/abs/2301.06962'>[paper]</a> </p>

Others

<p> <font size=3><b>LKD-Net: Large Kernel Convolution Network for Single Image Dehazing. arXiv 2022.</b></font> <br> <font size=2>Pinjun Luo, Guoqiang Xiao, Xinbo Gao, Song Wu.</font> <br> Release date: <b>5 Sep 2022</b>. <br> <a href='https://arxiv.org/abs/2209.01788'>[paper]</a> <a href='https://github.com/SWU-CS-MediaLab/LKD-Net'>[code]</a> </p>

Related Awesome Paper Lists

Awesome Visual-Transformer.

Ultimate-Awesome-Transformer-Attention.

Transformer-in-Vision.

Acknowledgement

The list format follows awesome-detection-transformer.