OneFormer: One Transformer to Rule Universal Image Segmentation

Jitesh Jain, Jiachen Li<sup>*</sup>, MangTik Chiu<sup>*</sup>, Ali Hassani, Nikita Orlov, Humphrey Shi

<sup>*</sup> Equal Contribution

[Project Page] [arXiv] [pdf] [BibTeX]

This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.

<img src="images/teaser.png" width="100%"/>

Features

OneFormer is the first multi-task universal image segmentation framework based on transformers: a single model, trained once with a task-conditioned joint training strategy, handles semantic, instance, and panoptic segmentation.

Contents

  1. News
  2. Installation Instructions
  3. Dataset Preparation
  4. Execution Instructions
  5. Results
  6. Citation

News

Installation Instructions

Dataset Preparation
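OneFormer builds on the Detectron2 codebase, and Detectron2-based repos locate prepared datasets through the `DETECTRON2_DATASETS` environment variable. A minimal sketch, assuming that convention (the path below is a placeholder):

```python
# Point Detectron2 at the directory holding the prepared datasets
# (e.g. ADE20K, Cityscapes, COCO, Mapillary Vistas).
# "/path/to/datasets" is a placeholder, not a path from this repo.
import os

os.environ["DETECTRON2_DATASETS"] = "/path/to/datasets"
```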

Execution Instructions

Training
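Since OneFormer follows the Mask2Former/Detectron2 codebase structure, training is launched through a `train_net.py`-style entry point. A minimal sketch, assuming that convention; the config path and GPU count are illustrative, not taken from this repo:

```python
# Launch training via the Detectron2-style train_net.py entry point.
# The config file path below is hypothetical; substitute a config
# shipped with the repo.
import subprocess

subprocess.run(
    [
        "python", "train_net.py",
        "--num-gpus", "8",
        "--config-file", "configs/ade20k/example_config.yaml",
    ],
    check=True,
)
```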

Evaluation
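Evaluation in Detectron2-style codebases reuses the training entry point with `--eval-only` plus a checkpoint override. A hedged sketch under the same assumption as above; both paths are placeholders:

```python
# Evaluate a trained checkpoint without further training.
# Config and checkpoint paths are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "python", "train_net.py",
        "--config-file", "configs/ade20k/example_config.yaml",
        "--eval-only",
        "MODEL.WEIGHTS", "/path/to/checkpoint.pth",
    ],
    check=True,
)
```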

Demo
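OneFormer is also available through HuggingFace `transformers`, which offers a convenient way to try inference without setting up the full codebase. A sketch using the `shi-labs/oneformer_ade20k_swin_large` checkpoint; the image URL is illustrative:

```python
# Task-conditioned inference via HuggingFace transformers: the same
# weights handle semantic, instance, and panoptic segmentation,
# selected through `task_inputs`.
import requests
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

ckpt = "shi-labs/oneformer_ade20k_swin_large"
processor = OneFormerProcessor.from_pretrained(ckpt)
model = OneFormerForUniversalSegmentation.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative image
image = Image.open(requests.get(url, stream=True).raw)

# Condition the model on the panoptic task; "semantic" or "instance" work likewise.
inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
outputs = model(**inputs)

# Post-process into a panoptic segmentation map at the original resolution.
panoptic = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(panoptic["segmentation"].shape)
```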

Results

ADE20K

| Method | Backbone | Crop Size | PQ | AP | mIoU (s.s.) | mIoU (ms+flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
| OneFormer | Swin-L | 640×640 | 49.8 | 35.9 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | Swin-L | 896×896 | 51.1 | 37.6 | 57.4 | 58.3 | 219M | config | model |
| OneFormer | Swin-L | 1280×1280 | 51.4 | 37.8 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | ConvNeXt-L | 640×640 | 50.0 | 36.2 | 56.6 | 57.4 | 220M | config | model |
| OneFormer | DiNAT-L | 640×640 | 50.5 | 36.0 | 58.3 | 58.4 | 223M | config | model |
| OneFormer | DiNAT-L | 896×896 | 51.2 | 36.8 | 58.1 | 58.6 | 223M | config | model |
| OneFormer | DiNAT-L | 1280×1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | config | model |
| OneFormer (COCO-Pretrained) | DiNAT-L | 1280×1280 | 53.4 | 40.2 | 58.4 | 58.8 | 223M | config | model \| pretrained |
| OneFormer | ConvNeXt-XL | 640×640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | config | model |

Cityscapes

| Method | Backbone | PQ | AP | mIoU (s.s.) | mIoU (ms+flip) | #params | config | Checkpoint |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
| OneFormer | Swin-L | 67.2 | 45.6 | 83.0 | 84.4 | 219M | config | model |
| OneFormer | ConvNeXt-L | 68.5 | 46.5 | 83.0 | 84.0 | 220M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-L | 70.1 | 48.7 | 84.6 | 85.2 | 220M | config | model \| pretrained |
| OneFormer | DiNAT-L | 67.6 | 45.6 | 83.1 | 84.0 | 223M | config | model |
| OneFormer | ConvNeXt-XL | 68.4 | 46.7 | 83.6 | 84.6 | 372M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-XL | 69.7 | 48.9 | 84.5 | 85.8 | 372M | config | model \| pretrained |

COCO

| Method | Backbone | PQ | PQ<sup>Th</sup> | PQ<sup>St</sup> | AP | mIoU | #params | config | Checkpoint |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
| OneFormer | Swin-L | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | config | model |
| OneFormer | DiNAT-L | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | config | model |

Mapillary Vistas

| Method | Backbone | PQ | mIoU (s.s.) | mIoU (ms+flip) | #params | config | Checkpoint |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :--- |
| OneFormer | Swin-L | 46.7 | 62.9 | 64.1 | 219M | config | model |
| OneFormer | ConvNeXt-L | 47.9 | 63.2 | 63.8 | 220M | config | model |
| OneFormer | DiNAT-L | 47.8 | 64.0 | 64.9 | 223M | config | model |

Citation

If you find OneFormer useful in your research, please consider starring ⭐ us on GitHub and citing 📚 our paper!

@inproceedings{jain2023oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  booktitle={CVPR},
  year={2023}
}

Acknowledgement

We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.