<div align="center"> <img src="assets/banner.gif"> <br> <br> Tianheng Cheng, <a href="https://xwcv.github.io/">Xinggang Wang</a><sup><span>†</span></sup>, Shaoyu Chen, Wenqiang Zhang, <a href="https://scholar.google.com/citations?user=pCY-bikAAAAJ&hl=zh-CN">Qian Zhang</a>, <a href="https://scholar.google.com/citations?user=IyyEKyIAAAAJ&hl=zh-CN">Chang Huang</a>, <a href="https://zhaoxiangzhang.net/">Zhaoxiang Zhang</a>, <a href="http://eic.hust.edu.cn/professor/liuwenyu/"> Wenyu Liu</a> <br> (<span>†</span>: corresponding author) <!-- <div><a href="">[Project Page]</a>(comming soon)</div> --> <div> <a href="https://arxiv.org/abs/2203.12827">[arXiv paper]</a> <a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Cheng_Sparse_Instance_Activation_for_Real-Time_Instance_Segmentation_CVPR_2022_paper.pdf">[CVPR paper]</a> <a href="https://drive.google.com/file/d/1xhqQvQ0YVCHd8XQxnCVqef75Hey7kI-d/view?usp=sharing">[slides]</a> </div> </div>

Highlights
<div align="center"> <img src="assets/animate.gif"> <br> <br> <div> </div> </div>

- SparseInst presents a new object representation method, i.e., Instance Activation Maps (IAM), to adaptively highlight informative regions of objects for recognition.
- SparseInst is a simple, efficient, and fully convolutional framework without non-maximum suppression (NMS) or sorting, and it is easy to deploy!
- SparseInst achieves a good trade-off between speed and accuracy, e.g., 37.9 AP and 40 FPS with 608x input.
Updates
This project is under active development; please stay tuned! ☕

- [2022-10-31]: We release the models & weights for the `CSP-DarkNet53` backbone, which is a strong baseline with highly competitive inference speed and accuracy.
- [2022-10-19]: We provide the implementation and inference code based on MindSpore, a nice and efficient deep learning framework. Thanks to Ruiqi Wang for this kind contribution!
- [2022-8-9]: We provide the FLOPs counter `get_flops.py` to obtain the FLOPs/parameters of SparseInst. This update also includes some bug fixes.
- [2022-7-17]: Faster 🚀: SparseInst now supports training and inference with FP16. Inference with FP16 improves the speed by 30%. Robust: we replace `Sigmoid + Norm` with `Softmax` for numerical stability, especially for ONNX. Easy-to-Use: we provide the script for exporting SparseInst to ONNX models.
- [2022-4-29]: We fix the common issue with the visualization script `demo.py`, e.g., `ValueError: GenericMask cannot handle ...`.
- [2022-4-7]: We provide the `demo` code for visualization and inference on images. Besides, we have added more backbones for SparseInst, including ResNet-101, CSPDarkNet, and PvTv2. We are still adding support for more backbones.
- [2022-3-25]: We have released the code and models for SparseInst!
Overview
SparseInst is a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. In contrast to region boxes or anchors (centers), SparseInst adopts a sparse set of instance activation maps as the object representation, to highlight informative regions for each foreground object. It then obtains the instance-level features by aggregating features according to the highlighted regions for recognition and segmentation. The bipartite matching compels the instance activation maps to predict objects in a one-to-one style, thus avoiding non-maximum suppression (NMS) in post-processing. Owing to the simple yet effective designs with instance activation maps, SparseInst has extremely fast inference speed and achieves 40 FPS and 37.9 AP on COCO (NVIDIA 2080Ti), significantly outperforming its counterparts in terms of speed and accuracy.
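
To make the aggregation step more concrete, here is a minimal, dependency-free sketch. It is not the repository's implementation. It assumes each instance activation map is a list of per-location logits that is normalized with a softmax over spatial locations, and that an instance feature is the resulting weighted sum of per-location features. The function names and the toy shapes are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_instance_features(
    features: List[List[float]],    # HW locations, each a D-dim feature vector
    iam_logits: List[List[float]],  # N activation maps, each with HW logits
) -> List[List[float]]:
    """Return one D-dim feature per activation map (weighted sum over locations)."""
    dim = len(features[0])
    instance_features = []
    for logits in iam_logits:
        weights = softmax(logits)  # assumption: normalize each map over locations
        pooled = [
            sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)
        ]
        instance_features.append(pooled)
    return instance_features


# Toy example: 3 spatial locations with 2-dim features, 2 activation maps.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
iam_logits = [
    [10.0, -10.0, -10.0],  # map 0 focuses almost entirely on location 0
    [0.0, 0.0, 0.0],       # map 1 attends to all locations uniformly
]
instance_features = aggregate_instance_features(features, iam_logits)
```

In this toy setup, the first map recovers the feature at the location it highlights, while the uniform map returns the mean feature. Each map yields exactly one instance feature, which is what allows one-to-one prediction without NMS.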
<center> <img src="./assets/sparseinst.png"> </center>

Models
We provide two versions of SparseInst, i.e., the basic IAM (3x3 convolution) and the Group IAM (G-IAM for short), with different backbones. All models are trained on MS-COCO train2017.
Fast models
model | backbone | input | aug | AP<sup>val</sup> | AP | FPS | weights |
---|---|---|---|---|---|---|---|
SparseInst | R-50 | 640 | ✘ | 32.8 | 33.2 | 44.3 | model |
SparseInst | R-50-vd | 640 | ✘ | 34.1 | 34.5 | 42.6 | model |
SparseInst (G-IAM) | R-50 | 608 | ✘ | 33.4 | 34.0 | 44.6 | model |
SparseInst (G-IAM, Softmax) | R-50 | 608 | ✘ | 33.6 | - | 44.6 | model |
SparseInst (G-IAM) | R-50 | 608 | ✓ | 34.2 | 34.7 | 44.6 | model |
SparseInst (G-IAM) | R-50-DCN | 608 | ✓ | 36.4 | 36.8 | 41.6 | model |
SparseInst (G-IAM) | R-50-vd | 608 | ✓ | 35.6 | 36.1 | 42.8 | model |
SparseInst (G-IAM) | R-50-vd-DCN | 608 | ✓ | 37.4 | 37.9 | 40.0 | model |
SparseInst (G-IAM) | R-50-vd-DCN | 640 | ✓ | 37.7 | 38.1 | 39.3 | model |
SparseInst with other backbones
model | backbone | input | AP<sup>val</sup> | AP | FPS | weights |
---|---|---|---|---|---|---|
SparseInst (G-IAM) | CSPDarkNet | 640 | 35.1 | - | - | model |
Larger models
model | backbone | input | aug | AP<sup>val</sup> | AP | FPS | weights |
---|---|---|---|---|---|---|---|
SparseInst (G-IAM) | R-101 | 640 | ✘ | 34.9 | 35.5 | - | model |
SparseInst (G-IAM) | R-101-DCN | 640 | ✘ | 36.4 | 36.9 | - | model |
SparseInst with Vision Transformers
model | backbone | input | aug | AP<sup>val</sup> | AP | FPS | weights |
---|---|---|---|---|---|---|---|
SparseInst (G-IAM) | PVTv2-B1 | 640 | ✘ | 35.3 | 36.0 | 33.5 (48.9<sup>↡</sup>) | model |
SparseInst (G-IAM) | PVTv2-B2-li | 640 | ✘ | 37.2 | 38.2 | 26.5 | model |
<sup>↡</sup>: measured on RTX 3090.
Note:
- We will continue adding more models, including more efficient convolutional networks, vision transformers, and larger models for high performance and high speed. Please stay tuned 😁!
- Inference speeds are measured on one NVIDIA 2080Ti unless specified.
- We haven't adopted TensorRT or other tools to accelerate the inference of SparseInst. However, we are working on it now and will provide support for ONNX, TensorRT, MindSpore, Blade, and other frameworks as soon as possible!
- AP denotes AP evaluated on MS-COCO test-dev2017.
- input denotes the shorter side of the input, e.g., 512x864 and 608x864. We keep the aspect ratio of the input, and the longer side is no more than 864.
- The inference speed might change slightly on different machines (2080 Ti) and different versions of detectron2 (we mainly use v0.3). If the change is sharp, e.g., > 5ms, please feel free to contact us.
- For `aug` (augmentation), we only adopt the simple random crop (crop size: [384, 600]) provided by detectron2.
- We adopt `weight decay=5e-2` as the default setting, which is slightly different from the original paper.
- [Weights on BaiduPan]: we also provide trained models on BaiduPan: ShareLink (password: lkdo).
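
The input-size convention in the notes above (the shorter side is set to the `input` value, the aspect ratio is kept, and the longer side is capped at 864) can be sketched as a small helper. This is an illustration under assumptions: the function name `resized_shape` is hypothetical, and the round-to-nearest behavior is a guess at how a shortest-edge resize typically rounds, not something confirmed by this README.

```python
from typing import Tuple


def resized_shape(height: int, width: int,
                  short_side: int, max_long_side: int = 864) -> Tuple[int, int]:
    """Resize so the shorter side equals `short_side`, keeping the aspect ratio.

    If the longer side would exceed `max_long_side`, shrink both sides so the
    longer side equals the cap instead.
    """
    scale = short_side / min(height, width)
    new_h, new_w = height * scale, width * scale
    if max(new_h, new_w) > max_long_side:
        shrink = max_long_side / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    # Assumption: round to the nearest integer pixel.
    return int(new_h + 0.5), int(new_w + 0.5)


# A 480x640 image fits under the cap; a 427x640 image hits it.
fits = resized_shape(480, 640, short_side=608)
capped = resized_shape(427, 640, short_side=608)
```

For wide images the cap takes over, so the effective shorter side ends up below the nominal `input` value.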
Installation and Prerequisites
This project is built upon the excellent framework detectron2, so you should install detectron2 first. Please check the official installation guide for more details.
Updates: SparseInst works well on detectron2-v0.6.
Note: previously, we mainly used v0.3 of detectron2 for experiments and evaluations. Besides, we have also tested our code on the newest version, v0.6. If you find bugs or incompatibility problems with newer versions of detectron2, please feel free to raise an issue!
Install detectron2:
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
# optional: switch to a specific version, e.g., v0.3 (recommended) or v0.6
git checkout tags/v0.6
# build detectron2
python setup.py build develop
Getting Started
🔥 SparseInst with FP16
SparseInst with FP16 achieves 30% faster inference speed and saves much training memory. We provide some comparisons of the memory, inference speed, and training speed in the table below.
FP16 | train mem. (log) | train mem. (`nvidia-smi`) | train speed | infer. speed |
---|---|---|---|---|
✘ | 6.0G | 10.5G | 0.8690s/iter | 52.17 FPS |
✓ | 3.9G | 6.8G | 0.6949s/iter | 67.57 FPS |
Note: statistics are measured on NVIDIA 3090. With FP16, we have faster training speed and can also increase the batch size for better performance.
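
As a quick sanity check, the relative gains can be recomputed directly from the table above. The snippet below only does arithmetic on the reported numbers; the variable names are illustrative.

```python
# Values copied from the FP16 comparison table (measured on NVIDIA 3090).
fp32 = {"train_s_per_iter": 0.8690, "infer_fps": 52.17, "train_mem_gb": 6.0}
fp16 = {"train_s_per_iter": 0.6949, "infer_fps": 67.57, "train_mem_gb": 3.9}

# Relative inference speed-up: ratio of frames per second.
infer_speedup = fp16["infer_fps"] / fp32["infer_fps"] - 1.0
# Relative training speed-up: ratio of seconds per iteration (lower is faster).
train_speedup = fp32["train_s_per_iter"] / fp16["train_s_per_iter"] - 1.0
# Fraction of logged training memory saved.
mem_saving = 1.0 - fp16["train_mem_gb"] / fp32["train_mem_gb"]
# Per-image latency in milliseconds.
latency_ms = {"fp32": 1000.0 / fp32["infer_fps"], "fp16": 1000.0 / fp16["infer_fps"]}

print(f"inference speed-up: {infer_speedup:.1%}")
print(f"training speed-up:  {train_speedup:.1%}")
print(f"logged memory saved: {mem_saving:.1%}")
```

The inference speed-up works out to roughly 29.5%, which matches the quoted figure of about 30%. Training is about 25% faster per iteration, and the logged training memory drops by 35%.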
- Training with FP16: enabling FP16 is simple. You only need to set `SOLVER.AMP.ENABLED=True` on the command line, or add this setting to the config file.
python tools/train_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --num-gpus 8 SOLVER.AMP.ENABLED True
- Testing with FP16: enable FP16 for inference by adding `--fp16`.
python tools/test_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --fp16 MODEL.WEIGHTS model_final.pth
Testing SparseInst
Before testing, you should specify the config file `<CONFIG>` and the model weights `<MODEL-PATH>`. In addition, you can change the input size by setting `INPUT.MIN_SIZE_TEST` either in the config file or on the command line.
- [Performance Evaluation] To obtain the evaluation results, e.g., mask AP on COCO, you can run:
python tools/train_net.py --config-file <CONFIG> --num-gpus <GPUS> --eval MODEL.WEIGHTS <MODEL-PATH>
# example:
python tools/train_net.py --config-file configs/sparse_inst_r50_giam.yaml --num-gpus 8 --eval MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth
- [Inference Speed] To obtain the inference speed (FPS) on one GPU device, you can run:
python tools/test_net.py --config-file <CONFIG> MODEL.WEIGHTS <MODEL-PATH> INPUT.MIN_SIZE_TEST 512
# example:
python tools/test_net.py --config-file configs/sparse_inst_r50_giam.yaml MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512
Note:
- `tools/test_net.py` only supports 1 GPU and 1 image per batch for measuring inference speed.
- The inference time consists of the pure forward time and the post-processing time. The evaluation processing, data loading, and pre-processing for wrappers (e.g., ImageList) are not included.
- `COCOMaskEvaluator` is modified from `COCOEvaluator` for evaluating mask-only results.
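
The timing convention described in the notes above (count only the forward pass and post-processing, one image at a time, and leave data loading out) can be sketched generically. This is not the code in `tools/test_net.py`. The helper `measure_fps`, its `warmup` parameter, and the `synchronize` hook are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Iterable, Optional


def measure_fps(run_inference: Callable[[object], object],
                inputs: Iterable[object],
                warmup: int = 2,
                synchronize: Optional[Callable[[], None]] = None) -> float:
    """Average frames per second, timing only the `run_inference` call."""
    total_time = 0.0
    timed = 0
    # Iterating over `inputs` (data loading) happens outside the timed region.
    for index, item in enumerate(inputs):
        if synchronize is not None:
            synchronize()          # make sure earlier async work has finished
        start = time.perf_counter()
        run_inference(item)        # forward pass + post-processing
        if synchronize is not None:
            synchronize()          # wait for async (e.g. GPU) work to complete
        elapsed = time.perf_counter() - start
        if index >= warmup:        # skip the first iterations as warm-up
            total_time += elapsed
            timed += 1
    if timed == 0:
        raise ValueError("need more inputs than warm-up iterations")
    return timed / total_time


# Stand-in workload: each "inference" takes about 10 ms.
fps = measure_fps(lambda image: time.sleep(0.01), range(6))
```

With a PyTorch model on a GPU, one would typically pass `torch.cuda.synchronize` as the `synchronize` hook, since CUDA kernels run asynchronously and an unsynchronized timer can under-report latency.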
FLOPs and Parameters
`get_flops.py` is built on `detectron2` and `fvcore`.
python tools/get_flops.py --config-file <CONFIG> --tasks parameter flop
Visualizing Images with SparseInst
To run inference or visualize the segmentation results on your images, you can run:
python demo.py --config-file <CONFIG> --input <IMAGE-PATH> --output results --opts MODEL.WEIGHTS <MODEL-PATH>
# example
python demo.py --config-file configs/sparse_inst_r50_giam.yaml --input datasets/coco/val2017/* --output results --opts MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512
- Besides, `demo.py` also supports inference on video (`--video-input`) and camera (`--webcam`). For inference on video, you might refer to issue #9 to avoid some errors.
- `--opts` supports modifications to the config file, e.g., `INPUT.MIN_SIZE_TEST 512`.
- `--input` can be a single image or a folder of images, e.g., `xxx/*`.
- If `--output` is not specified, a popup window will show the visualization results for each image.
- Lowering the `confidence-threshold` will show more instances but with more false positives.
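
The effect of the confidence threshold can be illustrated with a toy filter. The prediction records and the helper below are hypothetical and are not taken from `demo.py`; whether the real script uses a strict or non-strict comparison is not specified here.

```python
from typing import Dict, List


def filter_by_confidence(instances: List[Dict], threshold: float) -> List[Dict]:
    """Keep only the instances whose score reaches the threshold."""
    return [inst for inst in instances if inst["score"] >= threshold]


# Made-up predictions for a single image.
predictions = [
    {"label": "person", "score": 0.92},
    {"label": "dog", "score": 0.55},
    {"label": "kite", "score": 0.21},
]

strict = filter_by_confidence(predictions, threshold=0.5)  # fewer, more reliable
loose = filter_by_confidence(predictions, threshold=0.2)   # more, riskier
```

With the lower threshold, the low-scoring `kite` prediction is also drawn. This is the kind of instance that is more likely to be a false positive.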
Training SparseInst
The default setup trains SparseInst on the COCO dataset with 8 GPUs. If you have fewer GPUs (e.g., 4) or limited GPU memory, you can reduce the batch size through `SOLVER.IMS_PER_BATCH` or reduce the input size. If you adjust the batch size, the learning schedule should be adjusted according to the linear scaling rule.
python tools/train_net.py --config-file <CONFIG> --num-gpus 8
# example
python tools/train_net.py --config-file configs/sparse_inst_r50vd_dcn_giam_aug.yaml --num-gpus 8
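
When changing the batch size, the linear scaling rule can be applied with a small helper like the one below. This is a sketch: the reference values are placeholders chosen for illustration, not the repository's defaults, so read the actual `SOLVER.BASE_LR`, `SOLVER.MAX_ITER`, `SOLVER.STEPS`, and `SOLVER.IMS_PER_BATCH` from your config file. The function name `scale_schedule` is hypothetical.

```python
from typing import Dict, List


def scale_schedule(base_lr: float, base_batch: int, base_max_iter: int,
                   base_steps: List[int], new_batch: int) -> Dict[str, object]:
    """Scale the learning rate with the batch size and stretch the schedule
    so the total number of images seen stays roughly the same."""
    factor = new_batch / base_batch
    return {
        "lr": base_lr * factor,
        "max_iter": int(round(base_max_iter / factor)),
        "steps": [int(round(step / factor)) for step in base_steps],
    }


# Placeholder reference schedule (hypothetical values, not from the configs).
reference = dict(base_lr=1e-4, base_batch=64,
                 base_max_iter=90000, base_steps=[60000, 80000])

# Halving the batch size halves the learning rate and doubles the iterations.
halved = scale_schedule(new_batch=32, **reference)
```

The resulting values could then be passed as command-line overrides, e.g. `SOLVER.IMS_PER_BATCH 32 SOLVER.BASE_LR <lr> SOLVER.MAX_ITER <max_iter>`. The rule is a heuristic, so it is worth confirming that accuracy holds after rescaling.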
<!-- ### ONNX Export -->
Custom Training of SparseInst
- We suggest you convert your custom datasets into the `COCO` format, which enables the usage of the default dataset mappers and loaders. You may find more details in the official guide of detectron2.
- You need to check whether `NUM_CLASSES` and `NUM_MASKS` should be changed according to your scenarios or tasks.
- Change the configurations accordingly.
- After finishing the above procedures, you can easily train SparseInst with `train_net.py`.
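
Before editing the config, it can help to summarize a COCO-format annotation file. The sketch below uses only the standard library and a toy in-memory dataset. The helper `summarize_coco` is hypothetical. It assumes that `NUM_CLASSES` should match the number of categories and that `NUM_MASKS` limits how many instances can be predicted per image, so the largest per-image instance count is only a rough reference point.

```python
from collections import Counter
from typing import Dict, List


def summarize_coco(dataset: Dict[str, List[dict]]) -> Dict[str, int]:
    """Check basic consistency of a COCO-style dict and return summary counts."""
    for key in ("images", "annotations", "categories"):
        if key not in dataset:
            raise KeyError(f"missing COCO section: {key}")
    image_ids = {img["id"] for img in dataset["images"]}
    category_ids = {cat["id"] for cat in dataset["categories"]}
    without_masks = 0
    for ann in dataset["annotations"]:
        if ann["image_id"] not in image_ids:
            raise ValueError(f"annotation {ann['id']} refers to an unknown image")
        if ann["category_id"] not in category_ids:
            raise ValueError(f"annotation {ann['id']} refers to an unknown category")
        if not ann.get("segmentation"):
            without_masks += 1  # instance segmentation needs a mask per object
    per_image = Counter(ann["image_id"] for ann in dataset["annotations"])
    return {
        "num_classes": len(category_ids),
        "num_images": len(image_ids),
        "max_instances_per_image": max(per_image.values(), default=0),
        "annotations_without_masks": without_masks,
    }


# Toy dataset; in practice, load the dict with json.load() from your file.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "segmentation": [[0, 0, 4, 0, 4, 4]]},
        {"id": 11, "image_id": 1, "category_id": 2,
         "segmentation": [[1, 1, 3, 1, 3, 3]]},
        {"id": 12, "image_id": 2, "category_id": 2, "segmentation": []},
    ],
}
summary = summarize_coco(toy)
```

Here `summary["num_classes"]` is the value to compare against `NUM_CLASSES`, and a non-zero `annotations_without_masks` flags objects that cannot supervise the mask branch.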
Acknowledgements
SparseInst is based on detectron2, OneNet, DETR, and timm, and we sincerely thank the authors for their code and contributions to the community!
Citing SparseInst
If you find SparseInst useful in your research or applications, please consider giving us a star 🌟 and citing SparseInst with the following BibTeX entry.
@inproceedings{Cheng2022SparseInst,
title = {Sparse Instance Activation for Real-Time Instance Segmentation},
author = {Cheng, Tianheng and Wang, Xinggang and Chen, Shaoyu and Zhang, Wenqiang and Zhang, Qian and Huang, Chang and Zhang, Zhaoxiang and Liu, Wenyu},
booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}
License
SparseInst is released under the MIT License.