FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
This repository contains the PyTorch code and trained models described in the CVPR 2023 paper "FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation". The algorithm is proposed by the ByteDance Intelligent Creation AutoML team (字节跳动-智能创作-AutoML团队).
Authors: Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Yuxi Ren, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang
Overview
Installation
Environment
- python>=3.7
- torch>=1.10.0
- torchvision>=0.11.1
- timm>=0.6.12
- detectron2>=0.6.0 (follow the Detectron2 installation instructions)
- the remaining dependencies listed in requirement.txt
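For reference, a minimal environment setup might look like the following sketch; the environment name freeseg, the Python version, and the pinned package versions are illustrative assumptions, not requirements:
conda create -n freeseg python=3.8 -y
conda activate freeseg
pip install torch==1.10.0 torchvision==0.11.1 timm==0.6.12
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirement.txt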
Other dependencies
Install the modified CLIP package:
cd third_party/CLIP
python -m pip install -Ue .
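Assuming the modified package keeps the standard CLIP API (worth verifying against this repo's third_party/CLIP), a quick import check confirms the install:
python -c "import clip; print(clip.available_models())"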
CUDA kernel for MSDeformAttn
cd mask2former/modeling/heads/ops
bash make.sh
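Assuming the ops directory follows the usual Deformable-DETR layout (an ops/modules package exporting MSDeformAttn), a quick import confirms the kernel compiled, since loading MSDeformAttn pulls in the compiled extension:
python -c "from mask2former.modeling.heads.ops.modules import MSDeformAttn; print('MSDeformAttn OK')"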
Dataset Preparation
We follow Mask2Former to build the datasets used in our experiments. The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. Under this directory, detectron2 will look for the datasets in the structure described below.
$DETECTRON2_DATASETS/
ADEChallengeData2016/
coco/
VOC2012/
You need to set the location for the builtin datasets with export DETECTRON2_DATASETS=/path/to/datasets.
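To sanity-check that detectron2 can see the datasets after the export, you can list the registered dataset names; DatasetCatalog is standard detectron2, but the exact names this repo registers may differ:
python -c "from detectron2.data import DatasetCatalog; print([n for n in DatasetCatalog.list() if 'coco' in n])"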
Expected dataset structure for COCO:
coco/
annotations/
instances_{train,val}2017.json
panoptic_{train,val}2017.json
{train,val}2017/
# image files that are mentioned in the corresponding json
panoptic_{train,val}2017/ # png annotations
stuffthingmaps/
Then convert the data to detectron2 format and split it into the Seen (Base) and Unseen (Novel) subsets:
python datasets/prepare_coco_alldata.py datasets/coco
python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.json
python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.json
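As a quick sanity check you can inspect the generated label-count files; the sketch below assumes each *_label_count.json maps a class label to its occurrence count in the split:
import json
# assumption: the JSON produced by mask_cls_collect.py maps each class
# label to how often it appears in the split
with open("datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.json") as f:
    counts = json.load(f)
print(len(counts), "labels in the base (seen) split")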
Expected dataset structure for VOC2012:
VOC2012/
JPEGImages/
SegmentationClassAug/
{train,val}.txt
Then convert the data to detectron2 format and split it into the Seen (Base) and Unseen (Novel) subsets:
python datasets/prepare_voc_sem_seg.py datasets/VOC2012
python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train_base datasets/VOC2012/annotations_detectron2/train_base_label_count.json
python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
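You can likewise spot-check an individual converted mask; the filename below is a hypothetical example VOC image id, and 255 is conventionally the ignore label:
import numpy as np
from PIL import Image
# the filename is illustrative; any mask produced by prepare_voc_sem_seg.py will do
mask = np.array(Image.open("datasets/VOC2012/annotations_detectron2/val/2007_000033.png"))
print(np.unique(mask))  # class ids present in this mask; 255 means "ignore"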
Getting Started
Training
To train a model with "train_net.py", first make sure the dataset preparation above is complete. The following takes training on COCO as an example.
Training prompts
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_learn_prompt_bs32_16k.yaml --num-gpus 8
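As the commands here show, train_net.py accepts detectron2-style config overrides after the named arguments. For example, to train on fewer GPUs you would also scale the batch size and learning rate; the values below are illustrative, not the paper's settings:
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_learn_prompt_bs32_16k.yaml --num-gpus 4 SOLVER.IMS_PER_BATCH 16 SOLVER.BASE_LR 0.00005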
Training model
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_R101c_alltask_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPT_MODEL}
Evaluation
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_R101c_alltask_bs32_60k.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS ${TRAINED_MODEL}
Testing for Demo
The model weights for the demo can be downloaded from the provided model link.
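Assuming the repository ships the standard detectron2-style demo script (a demo/demo.py taking --input, --output, and --opts; this is an assumption, check the repo), single-image inference would look like:
python demo/demo.py --config-file configs/coco-stuff-164k-156/mask2former_R101c_alltask_bs32_60k.yaml --input input.jpg --output output.jpg --opts MODEL.WEIGHTS ${TRAINED_MODEL}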
Citation
If you find this work useful in your research, please cite the paper as below:
@inproceedings{qin2023freeseg,
title={FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation},
author={Qin, Jie and Wu, Jie and Yan, Pengxiang and Li, Ming and Ren, Yuxi and Xiao, Xuefeng and Wang, Yitong and Wang, Rui and Wen, Shilei and Pan, Xin and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={19446--19455},
year={2023}
}