

Networks-Beyond-Attention (NBA)

A list of modern (convolutional) network architectures for vision. We list only works based on convolution, modulation, or other recently emerged variants. For networks using attention or MLP-style designs, please refer to other, more comprehensive lists.

This is a new trend, so feel free to submit a pull request or raise an issue if you find any missing papers!



Image Classification

<p> <font size=3><b>On the Connection between Local Attention and Dynamic Depth-wise Convolution. ICLR 2022.</b></font> <br> <font size=2>Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang.</font> <br> Release date: <b>8 June 2021</b>. <br> <a href='https://arxiv.org/abs/2106.04263'>[paper]</a> <a href='https://github.com/Atten4Vis/DemystifyLocalViT'>[code]</a> </p>
<p> <font size=3><b>MetaFormer Is Actually What You Need for Vision. CVPR 2022.</b></font> <br> <font size=2>Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan.</font> <br> Release date: <b>22 Nov 2021</b>. <br> <a href='https://arxiv.org/abs/2111.11418'>[paper]</a> <a href='https://github.com/sail-sg/poolformer'>[code]</a> </p>
<p> <font size=3><b>A ConvNet for the 2020s. CVPR 2022.</b></font> <br> <font size=2>Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.</font> <br> Release date: <b>10 Jan 2022</b>. <br> <a href='https://arxiv.org/pdf/2201.03545'>[paper]</a> <a href='https://github.com/facebookresearch/ConvNeXt'>[code]</a> </p>
<p> <font size=3><b>Visual Attention Network. arXiv 2022.</b></font> <br> <font size=2>Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.</font> <br> Release date: <b>20 Feb 2022</b>. <br> <a href='https://arxiv.org/abs/2202.09741'>[paper]</a> <a href='https://github.com/Visual-Attention-Network'>[code]</a> </p>
<p> <font size=3><b>Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. CVPR 2022.</b></font> <br> <font size=2>Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun.</font> <br> Release date: <b>13 Mar 2022</b>. <br> <a href='https://arxiv.org/abs/2203.06717'>[paper]</a> <a href='https://github.com/megvii-research/RepLKNet'>[code]</a> </p>
<p> <font size=3><b>Focal Modulation Networks. NeurIPS 2022.</b></font> <br> <font size=2>Jianwei Yang, Chunyuan Li, Xiyang Dai, Jianfeng Gao.</font> <br> Release date: <b>22 Mar 2022</b>. <br> <a href='https://arxiv.org/pdf/2203.11926'>[paper]</a> <a href='https://github.com/microsoft/FocalNet'>[code]</a> </p>
<p> <font size=3><b>More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity. arXiv 2022.</b></font> <br> <font size=2>Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang.</font> <br> Release date: <b>7 July 2022</b>. <br> <a href='https://arxiv.org/abs/2207.03620'>[paper]</a> <a href='https://github.com/VITA-Group/SLaK'>[code]</a> </p>
<p> <font size=3><b>HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions. NeurIPS 2022.</b></font> <br> <font size=2>Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu.</font> <br> Release date: <b>28 July 2022</b>. <br> <a href='https://arxiv.org/abs/2207.14284'>[paper]</a> <a href='https://github.com/raoyongming/HorNet'>[code]</a> </p>
<p> <font size=3><b>Efficient Multi-order Gated Aggregation Network. arXiv 2022.</b></font> <br> <font size=2>Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li.</font> <br> Release date: <b>7 Nov 2022</b>. <br> <a href='https://arxiv.org/abs/2211.03295'>[paper]</a> <a href=''>[code]</a> </p>
<p> <font size=3><b>InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. arXiv 2022.</b></font> <br> <font size=2>Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao.</font> <br> Release date: <b>10 Nov 2022</b>. <br> <a href='https://arxiv.org/abs/2211.05778v2'>[paper]</a> <a href='https://github.com/OpenGVLab/InternImage'>[code]</a> </p>
<p> <font size=3><b>Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition. arXiv 2022.</b></font> <br> <font size=2>Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng.</font> <br> Release date: <b>22 Nov 2022</b>. <br> <a href='https://arxiv.org/abs/2211.11943'>[paper]</a> <a href='https://github.com/HVision-NKU/Conv2Former'>[code]</a> </p>
<p> <font size=3><b>A Close Look at Spatial Modeling: From Attention to Convolution. arXiv 2022.</b></font> <br> <font size=2>Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, Jie Fu, Yun Fu.</font> <br> Release date: <b>23 Dec 2022</b>. <br> <a href='https://arxiv.org/abs/2212.12552'>[paper]</a> <a href='https://github.com/ma-xu/FCViT'>[code]</a> </p>
<p> <font size=3><b>ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. arXiv 2023.</b></font> <br> <font size=2>Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.</font> <br> Release date: <b>2 Jan 2023</b>. <br> <a href='https://arxiv.org/abs/2301.00808'>[paper]</a> <a href='https://github.com/facebookresearch/ConvNeXt-V2'>[code]</a> </p>

Image Segmentation

<p> <font size=3><b>SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. NeurIPS 2022.</b></font> <br> <font size=2>Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng, Shi-Min Hu.</font> <br> Release date: <b>18 Sep 2022</b>. <br> <a href='https://arxiv.org/abs/2209.08575v1'>[paper]</a> <a href='https://github.com/Visual-Attention-Network/SegNeXt'>[code]</a> </p>

3D Understanding

<p> <font size=3><b>Scaling up Kernels in 3D CNNs. arXiv 2022.</b></font> <br> <font size=2>Yukang Chen, Jianhui Liu, Xiaojuan Qi, Xiangyu Zhang, Jian Sun, Jiaya Jia.</font> <br> Release date: <b>21 June 2022</b>. <br> <a href='https://arxiv.org/abs/2206.10555'>[paper]</a> <a href='https://github.com/dvlab-research/LargeKernel3D'>[code]</a> </p>
<p> <font size=3><b>Long Range Pooling for 3D Large-Scale Scene Understanding. arXiv 2023.</b></font> <br> <font size=2>Xiang-Li Li, Meng-Hao Guo, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu.</font> <br> Release date: <b>17 Jan 2023</b>. <br> <a href='https://arxiv.org/abs/2301.06962'>[paper]</a> <a href='https://arxiv.org/abs/2301.06962'>[code]</a> </p>


<p> <font size=3><b>LKD-Net: Large Kernel Convolution Network for Single Image Dehazing. arXiv 2022.</b></font> <br> <font size=2>Pinjun Luo, Guoqiang Xiao, Xinbo Gao, Song Wu.</font> <br> Release date: <b>5 Sep 2022</b>. <br> <a href='https://arxiv.org/abs/2209.01788'>[paper]</a> <a href='https://github.com/SWU-CS-MediaLab/LKD-Net'>[code]</a> </p>

Related Awesome Paper Lists

Awesome Visual-Transformer

Ultimate-Awesome-Transformer-Attention

Transformer-in-Vision


The list format follows awesome-detection-transformer.