
<div align='center'> <!-- Paper Title --> <h1><strong> (ECCV 2024) ProLab</strong>: <strong>Pro</strong>perty-level <strong>Lab</strong>el Space</h1> <h3><a href="https://arxiv.org/abs/2312.13764">A Semantic Space is Worth 256 Language Descriptions: <br>Make Stronger Segmentation Models with Descriptive Properties</a></h3> <!-- Authors --> <p> <a href="https://lambert-x.github.io/">Junfei Xiao</a><sup>1</sup>, <a href="https://zzzqzhou.github.io/">Ziqi Zhou</a><sup>2</sup>, <a href="https://scholar.google.com/citations?user=tpNZM2YAAAAJ&hl=en">Wenxuan Li</a><sup>1</sup>, <a href="https://voidrank.github.io/">Shiyi Lan</a><sup>3</sup>, <a href="https://meijieru.com/">Jieru Mei</a><sup>1</sup>, <a href="https://chrisding.github.io/">Zhiding Yu</a><sup>3</sup>, <br> <a href="https://bzhao.me/">Bingchen Zhao</a><sup>4</sup>, <a href="https://www.cs.jhu.edu/~ayuille/">Alan Yuille</a><sup>1</sup>, <a href="https://yuyinzhou.github.io/">Yuyin Zhou</a><sup>2</sup>, <a href="https://cihangxie.github.io/">Cihang Xie</a><sup>2</sup> </p> <!-- Institutions --> <p> <sup>1</sup><a href="https://www.jhu.edu/">Johns Hopkins University</a>, <sup>2</sup><a href="https://www.ucsc.edu/">UCSC</a>, <sup>3</sup><a href="https://www.nvidia.com/">NVIDIA</a>, <sup>4</sup><a href="https://www.ed.ac.uk/">University of Edinburgh</a> </p> <!-- Teaser Image --> <img src="images/github_teaser.png" alt="Teaser Image"> </div> <div align="center">

Paper | Property-level Label Space | Model Zoo | Training & Evaluation

</div>

News

Method


Emergent Generalization Ability

ProLab models show emergent generalization to out-of-domain categories and even to unknown categories.

Contents

Getting Started

Our segmentation code is developed on top of MMSegmentation and ViT-Adapter.

Setup

We have two tested environments: torch 1.9 + CUDA 11.1 + MMSegmentation v0.20.2, and torch 1.13.1 + CUDA 11.7 + MMSegmentation v0.27.0.

Environment 1 (torch 1.9+cuda 11.1+MMSegmentation v0.20.2)

conda create -n prolab python=3.8
conda activate prolab
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0 # for Mask2Former
pip install mmsegmentation==0.20.2
pip install -r requirements.txt
cd ops && sh make.sh # compile deformable attention

Environment 2 (torch 1.13.1+cuda 11.7+MMSegmentation v0.27.0)

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0 # the mmcv version cap in mmdet may need to be relaxed for this mmcv-full version
pip install mmsegmentation==0.27.0
pip install -r requirements.txt
cd ops && sh make.sh # compile deformable attention
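
If the build is in doubt, a quick check like the sketch below can confirm the extension is importable. This assumes the ops are the Deformable-DETR-style MultiScaleDeformableAttention CUDA extension used by ViT-Adapter, which is what make.sh compiles in that codebase.

```python
# Hedged sanity check: assumes make.sh built the Deformable-DETR-style
# MultiScaleDeformableAttention CUDA extension used by ViT-Adapter.
import torch
import MultiScaleDeformableAttention  # noqa: F401 -- import fails if the build did not succeed

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```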

Data Preparation

ADE20K/Cityscapes/COCO Stuff/Pascal Context

Please follow the guidelines in MMSegmentation to download ADE20K, Cityscapes, COCO Stuff and Pascal Context.
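
After downloading, the layout can be sanity-checked with a small script like the sketch below. It uses ADE20K as an example with MMSegmentation's default data root (data/ade/ADEChallengeData2016); adjust the path if your setup differs.

```python
# Minimal sanity check for the ADE20K layout expected by MMSegmentation.
# The root below is MMSegmentation's default; adjust it to your data path.
import os

root = "data/ade/ADEChallengeData2016"
for sub in ("images/training", "images/validation",
            "annotations/training", "annotations/validation"):
    path = os.path.join(root, sub)
    count = len(os.listdir(path)) if os.path.isdir(path) else "missing"
    print(f"{path}: {count}")
```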

BDD

Please visit the official website to download the BDD dataset.

Property-level Label Space

Descriptive Properties and Clustered Embeddings (Ready-to-use)

We provide the descriptive properties retrieved with GPT-3.5 and the ready-to-use property-level labels (clustered language embeddings).
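
For reference, a released embedding file can be inspected with a sketch like the one below; the file name is hypothetical, so substitute the actual file from the release.

```python
# Hedged sketch of inspecting a released property-level embedding file.
# The path below is hypothetical; use the actual file provided in the release.
import torch

embeddings = torch.load("ade20k_property_embeddings.pth", map_location="cpu")
print(type(embeddings))
# Expect one property-level target per dataset category; these targets replace
# one-hot class indices as the supervision signal during training.
```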

Descriptive Properties Retrieval (Optional)

We provide generate_descrtiptions.ipynb, which retrieves descriptive properties with GPT-3.5 (via the API) or Llama-2 (deployed locally).
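
For readers who prefer a script over the notebook, a minimal retrieval loop with the OpenAI API might look like the sketch below. The prompt is illustrative only and is not the exact prompt used in the notebook.

```python
# Illustrative sketch of retrieving descriptive properties with GPT-3.5.
# Requires OPENAI_API_KEY in the environment; the prompt is not the notebook's exact prompt.
from typing import List
from openai import OpenAI

client = OpenAI()

def retrieve_properties(category: str) -> List[str]:
    prompt = (
        f"List short visual properties that describe the category '{category}' "
        "in a semantic segmentation dataset, one property per line."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip("-* ").strip() for line in lines if line.strip()]

print(retrieve_properties("traffic light"))
```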

Encode Descriptions into Embeddings (Optional)

We also provide generate_embeddings.ipynb, which encodes and clusters the descriptive properties into embeddings step by step with Sentence Transformer (huggingface, paper) and BAAI-BGE (huggingface, paper) models.
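
The core steps are: embed every description, cluster the embeddings (256 clusters, as suggested by the paper title), and represent each class by the clusters its descriptions fall into. Below is a minimal sketch with the BGE base model and k-means; the toy descriptions and the multi-hot construction are an illustration of the idea, not the exact notebook code.

```python
# Minimal sketch: encode property descriptions with BAAI-BGE and cluster them
# into property-level labels. Toy data for illustration; the full description
# set is clustered into 256 clusters.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

descriptions = [
    "has four legs", "covered in fur", "has a long tail",
    "made of wood or metal", "used for seating", "has a flat back",
]
class_to_desc = {"cat": [0, 1, 2], "chair": [3, 4, 5]}  # indices into `descriptions`

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
emb = model.encode(descriptions, normalize_embeddings=True)  # (N, 768) for bge-base

n_clusters = min(256, len(descriptions))  # 256 when clustering the full description set
kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit(emb)

# Each class becomes a multi-hot vector over the property clusters.
labels = {}
for cls, idxs in class_to_desc.items():
    vec = np.zeros(n_clusters, dtype=np.float32)
    vec[np.unique(kmeans.labels_[idxs])] = 1.0
    labels[cls] = vec
print({cls: vec.nonzero()[0].tolist() for cls, vec in labels.items()})
```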

Model Zoo

ADE20K

| Framework | Backbone | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-Adapter-B | DeiT-B | 320k | 512 | 49.0 | config | Google Drive |
| UperNet | ViT-Adapter-L | BEiT-L | 160k | 640 | 58.2 | config | Google Drive |
| UperNet | ViT-Adapter-L | BEiTv2-L | 80k | 896 | 58.7 | config | Google Drive |

COCO-Stuff-164K

| Framework | Backbone | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-Adapter-B | DeiT-B | 160k | 512 | 45.4 | config | Google Drive |

Pascal Context

| Framework | Backbone | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-Adapter-B | DeiT-B | 160k | 512 | 58.2 | config | Google Drive |

Cityscapes

| Framework | Backbone | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-Adapter-B | DeiT-B | 160k | 768 | 81.4 | config | Google Drive |

BDD

| Framework | Backbone | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-Adapter-B | DeiT-B | 160k | 768 | 65.7 | config | Google Drive |

Training & Evaluation

Training

The following example trains ViT-Adapter-B + UperNet on ADE20K on a single node with 8 GPUs:

sh dist_train.sh configs/ADE20K/upernet_deit_adapter_base_512_320k_ade20k_bge_base.py 8

Evaluation

The following example evaluates ViT-Adapter-B + UperNet on the COCO-Stuff validation set on a single node with 8 GPUs:

sh dist_test.sh configs/COCO_Stuff/upernet_deit_adapter_base_512_160k_coco_stuff_bge_base.py 8 --eval mIoU

Citation

If this paper is useful to your work, please cite:

@article{xiao2023semantic,
  author    = {Xiao, Junfei and Zhou, Ziqi and Li, Wenxuan and Lan, Shiyi and Mei, Jieru and Yu, Zhiding and Yuille, Alan and Zhou, Yuyin and Xie, Cihang},
  title     = {A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties},
  journal   = {arXiv preprint arXiv:2312.13764},
  year      = {2023},
}

Acknowledgement

GPT-3.5 and Llama-2 are used for retrieving descriptive properties.

Sentence Transformer and BAAI-BGE are used as description embedding models.

MMSegmentation and ViT-Adapter are used as the segmentation codebase.

Many thanks to all these great projects.