Stitched ViTs are Flexible Vision Backbones

This is the official PyTorch implementation for Stitched ViTs are Flexible Vision Backbones.

By Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, and Bohan Zhuang.

[Figure: framework]

We adapt the framework of stitchable neural networks (SN-Net) to downstream dense prediction tasks. Compared to SN-Netv1, the new framework consistently improves performance at low FLOPs while maintaining competitive performance at high FLOPs across different datasets, thus obtaining a better Pareto frontier (highlighted by the lines in the figure).
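To make the term "Pareto frontier" concrete, here is a minimal sketch of how one could extract the frontier from a set of stitched models, each described by a (FLOPs, score) pair. The function name `pareto_frontier` and the numbers are hypothetical and for illustration only; they are not part of this repository and are not results from the paper.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point beats on both axes.

    Each point is (flops, score): lower FLOPs is better, higher score is better.
    """
    frontier = []
    best_score = float("-inf")
    # Visit points from cheapest to most expensive; on equal FLOPs, best score first.
    for flops, score in sorted(points, key=lambda p: (p[0], -p[1])):
        # A point belongs on the frontier only if it beats every cheaper point.
        if score > best_score:
            frontier.append((flops, score))
            best_score = score
    return frontier


# Hypothetical (GFLOPs, accuracy) pairs, NOT measurements from this project.
stitches = [(5.0, 70.0), (8.0, 76.0), (8.0, 74.0), (12.0, 75.0), (20.0, 80.0), (30.0, 79.5)]
print(pareto_frontier(stitches))
```

A "better" frontier in this sense is one whose points sit higher at the same FLOPs budget, which is what the comparison against SN-Netv1 refers to.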

📰 News

💪 Getting Started

For image classification on ImageNet-1K, please refer to classification.

For semantic segmentation on ADE20K and COCO-Stuff-10K, please refer to segmentation.

For depth estimation on NYUv2, please refer to depth_estimation.

🪄 Gradio Demo for Segmentation

First, install Gradio:

pip install gradio

Next, install the packages required by segmentation, then run the Gradio demo:

cd segmentation
python demo/video_demo_gradio.py --config [path/to/config] --checkpoint [path/to/checkpoint]

[Figure: gradio_demo]

✨ Results

Understand the figures:

Image Classification on ImageNet-1K

[Figure: image classification results on ImageNet-1K]

Semantic Segmentation on ADE20K and COCO-Stuff-10K

[Figures: ADE20K, COCO-Stuff-10K]

Depth Estimation on NYUv2

<figure> <center> <figcaption>Stitching DeiT3-S and DeiT3-L based on DPT.</figcaption></center> <img src=".github/depth_estimation.png"> </figure>

Object Detection and Instance Segmentation on COCO-2017

<figure> <center> <figcaption>Stitching DeiT3-S and DeiT3-L based on Mask R-CNN/ViTDet.</figcaption></center> <img src=".github/coco_res.jpg"> </figure>

Training Efficiency Comparison

[Figure: training efficiency comparison]

🚧 TODO List

✍ Citation

If you use SN-Netv2 in your research, please consider citing the following BibTeX entry and giving us a star 🌟.

@article{pan2023snnetv2,
  title={Stitched ViTs are Flexible Vision Backbones},
  author={Pan, Zizheng and Liu, Jing and He, Haoyu and Cai, Jianfei and Zhuang, Bohan},
  journal={arXiv},
  year={2023}
}

If you find the code useful, please also consider citing the following BibTeX entry:

@inproceedings{pan2023snnetv1,
  title     = {Stitchable Neural Networks},
  author    = {Pan, Zizheng and Cai, Jianfei and Zhuang, Bohan},
  booktitle = {CVPR},
  year      = {2023},
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.