<img src="CIoU.png" width="800px"/>


Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation.

Our paper has been accepted by IEEE Transactions on Cybernetics (TCYB).

This repo is based on YOLACT++.

This is the code for our papers:

@Inproceedings{zheng2020diou,
  author    = {Zheng, Zhaohui and Wang, Ping and Liu, Wei and Li, Jinze and Ye, Rongguang and Ren, Dongwei},
  title     = {Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  pages     = {12993--13000},
  year      = {2020}
}

@Article{zheng2021ciou,
  author    = {Zheng, Zhaohui and Wang, Ping and Ren, Dongwei and Liu, Wei and Ye, Rongguang and Hu, Qinghua and Zuo, Wangmeng},
  title     = {Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation},
  journal   = {IEEE Transactions on Cybernetics},
  volume    = {52},
  number    = {8},
  pages     = {8574--8586},
  year      = {2021},
  publisher = {IEEE}
}

Description of Cluster-NMS and Its Usage

An example diagram of our Cluster-NMS, where X denotes the IoU matrix computed by X=jaccard(boxes,boxes).triu_(diagonal=1) > nms_thresh after the boxes have been sorted by score in descending order. (Here 0 and 1 are used for visualization; a minimal code sketch of the iteration is given after the diagrams.)

<img src="cluster-nms01.png" width="1150px"/> <img src="cluster-nms02.png" width="1150px"/>

The inputs of NMS are boxes of size [n,4] and scores of size [80,n] (taking COCO as an example).

There are two ways to perform NMS. In the first, all classes share the same number of boxes: we use top k=200 to select the top 200 detections for every class, so boxes becomes [80,200,4]; we then run Cluster-NMS, keep the boxes with scores > 0.01, and finally return the top 100 boxes across all classes.
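
A shape-level sketch of this first pipeline, assuming a hypothetical cluster_nms_batched helper that stands in for the repo's batched Cluster-NMS:

```python
# boxes: [n, 4], scores: [80, n] (COCO: 80 classes)
scores, idx = scores.sort(dim=1, descending=True)      # sort every class by score
scores, idx = scores[:, :200], idx[:, :200]             # top k = 200 detections per class
boxes = boxes[idx.reshape(-1)].reshape(80, 200, 4)      # gather boxes -> [80, 200, 4]

keep = cluster_nms_batched(boxes, scores)               # hypothetical batched Cluster-NMS -> [80, 200] bool
keep &= scores > 0.01                                    # drop low-score detections

# return the top 100 detections across all classes by score
final_scores = (scores * keep.float()).reshape(-1)
top100 = final_scores.topk(100).indices
```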

The other approach allows different classes to have different numbers of boxes. First, we use a score threshold (e.g. 0.01) to filter out most low-score detection boxes, so the number of remaining boxes may differ across classes. Then we put all the boxes together and sort them by score in descending order. (Note that the same box may appear more than once, because its scores for multiple classes can exceed the threshold 0.01.) Next, we add an offset to every box according to its class label (using torch.arange(0,80)). For example, since the coordinates (x1,y1,x2,y2) of all boxes lie in the interval (0,1), adding the offset moves a box of class 61 onto the interval (60,61). After that, the IoU between boxes of different classes is 0, because they are treated as different clusters. Finally, we run Cluster-NMS and return the top 100 boxes across all classes. (For this method, please refer to our other repository https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/detection/detection.py)
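
A minimal sketch of the offset trick, assuming boxes with coordinates normalized to (0,1) and a labels tensor holding each box's class index (both names are illustrative):

```python
import torch

# boxes: [m, 4] with coordinates in (0, 1); labels: [m] class index of each box
offsets = labels.to(boxes.dtype)                 # the class index itself serves as the offset (torch.arange(0,80) values)
boxes_for_nms = boxes + offsets.unsqueeze(1)     # a box of class c now lies on the interval (c, c+1)
# boxes of different classes no longer overlap (IoU = 0), so a single Cluster-NMS
# pass over all boxes behaves like per-class NMS
```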

Getting Started

1) Newly released! CIoU and Cluster-NMS

  1. YOLACT (See YOLACT)

  2. YOLOv3-pytorch https://github.com/Zzh-tju/ultralytics-YOLOv3-Cluster-NMS

  3. YOLOv5 (Supports batch-mode Cluster-NMS, which speeds up NMS when test-time augmentation such as multi-scale testing is enabled.) https://github.com/Zzh-tju/yolov5

  4. SSD-pytorch https://github.com/Zzh-tju/DIoU-SSD-pytorch

2) DIoU and CIoU Losses in Detection Algorithms

DIoU and CIoU losses are incorporated into state-of-the-art detection algorithms, including YOLO v3, SSD and Faster R-CNN. The details of the implementations and comparisons can be found in the corresponding links below.

  1. YOLO v3 https://github.com/Zzh-tju/DIoU-darknet

  2. SSD https://github.com/Zzh-tju/DIoU-SSD-pytorch

  3. Faster R-CNN https://github.com/Zzh-tju/DIoU-pytorch-detectron

  4. Simulation Experiment https://github.com/Zzh-tju/DIoU

YOLACT

Code location and options

Please take a look at the ciou function in layers/modules/multibox_loss.py for our CIoU loss implementation in PyTorch.
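
For reference, here is a minimal sketch of the CIoU loss, 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*v, for boxes in (x1, y1, x2, y2) format. It is an illustrative re-implementation under those assumptions, not the repo's ciou function:

```python
import math
import torch

def ciou_loss_sketch(pred, target, eps=1e-7):
    # pred, target: [n, 4] boxes in (x1, y1, x2, y2) format
    # intersection and union -> IoU
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared distance between box centers (the rho^2 term)
    rho2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 + \
           ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4

    # squared diagonal of the smallest enclosing box (the c^2 term)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency term v and its trade-off weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)   # detaching alpha is a common implementation choice

    return 1 - iou + rho2 / c2 + alpha * v  # per-box CIoU loss, shape [n]
```

See the repo's ciou function for the exact variant used in training.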

Currently, NMS supports two modes (see eval.py):

  1. Cross-class mode, which ignores class labels. (cross_class_nms=True; faster than per-class mode but with a slight performance drop.)

  2. Per-class mode. (cross_class_nms=False)

Currently, NMS supports fast_nms, cluster_nms, cluster_diounms, spm, spm_dist, spm_dist_weighted.

See layers/functions/detection.py for our Cluster-NMS implementation in PyTorch.

Installation

In order to use YOLACT++, make sure you compile the DCNv2 code.

Evaluation

Here are our YOLACT models (released on May 5th, 2020) along with their FPS on a GTX 1080 Ti and mAP on COCO 2017 val:

Training is carried out on two GTX 1080 Ti GPUs with the command: python train.py --config=yolact_base_config --batch_size=8

| Image Size | Backbone | Loss | NMS | FPS | box AP | mask AP | Weights |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 550 | Resnet101-FPN | SL1 | Fast NMS | 30.6 | 31.5 | 29.1 | SL1.pth |
| 550 | Resnet101-FPN | CIoU | Fast NMS | 30.6 | 32.1 | 29.6 | CIoU.pth |

To evaluate the model, put the corresponding weights file in the ./weights directory and run one of the following commands. The name of each config is everything before the numbers in the file name (e.g., yolact_base for yolact_base_54_800000.pth).

Quantitative Results on COCO

# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.

# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json

# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py

# To output a COCO json file for test-dev, make sure you have test-dev downloaded as above and run:
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset

Qualitative Results on COCO

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display

Benchmarking Cluster-NMS on COCO

python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark

Hardware: GTX 1080 Ti

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 550 | Resnet101-FPN | CIoU | Fast NMS | 30.6 | 32.1 | 33.9 | 43.0 | 29.6 | 30.9 | 40.3 |
| 550 | Resnet101-FPN | CIoU | Original NMS | 11.5 | 32.5 | 34.1 | 45.1 | 29.7 | 31.0 | 41.7 |
| 550 | Resnet101-FPN | CIoU | Cluster-NMS | 28.8 | 32.5 | 34.1 | 45.2 | 29.7 | 31.0 | 41.7 |
| 550 | Resnet101-FPN | CIoU | SPM Cluster-NMS | 28.6 | 33.1 | 35.2 | 48.8 | 30.3 | 31.7 | 43.6 |
| 550 | Resnet101-FPN | CIoU | SPM + Distance Cluster-NMS | 27.1 | 33.2 | 35.2 | 49.2 | 30.2 | 31.7 | 43.8 |
| 550 | Resnet101-FPN | CIoU | SPM + Distance + Weighted Cluster-NMS | 26.5 | 33.4 | 35.5 | 49.1 | 30.3 | 31.6 | 43.8 |

The following table is evaluated using the authors' pretrained YOLACT weights (yolact_resnet50_54_800000.pth).

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 550 | Resnet50-FPN | SL1 | Fast NMS | 41.6 | 30.2 | 31.9 | 42.0 | 28.0 | 29.1 | 39.4 |
| 550 | Resnet50-FPN | SL1 | Original NMS | 12.8 | 30.7 | 32.0 | 44.1 | 28.1 | 29.2 | 40.7 |
| 550 | Resnet50-FPN | SL1 | Cluster-NMS | 38.2 | 30.7 | 32.0 | 44.1 | 28.1 | 29.2 | 40.7 |
| 550 | Resnet50-FPN | SL1 | SPM Cluster-NMS | 37.7 | 31.3 | 33.2 | 48.0 | 28.8 | 29.9 | 42.8 |
| 550 | Resnet50-FPN | SL1 | SPM + Distance Cluster-NMS | 35.2 | 31.3 | 33.3 | 48.2 | 28.7 | 29.9 | 42.9 |
| 550 | Resnet50-FPN | SL1 | SPM + Distance + Weighted Cluster-NMS | 34.2 | 31.8 | 33.9 | 48.3 | 28.8 | 29.9 | 43.0 |

The following table is evaluated using the authors' pretrained YOLACT weights (yolact_base_54_800000.pth).

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 550 | Resnet101-FPN | SL1 | Fast NMS | 30.6 | 32.5 | 34.6 | 43.9 | 29.8 | 31.3 | 40.8 |
| 550 | Resnet101-FPN | SL1 | Original NMS | 11.9 | 32.9 | 34.8 | 45.8 | 29.9 | 31.4 | 42.1 |
| 550 | Resnet101-FPN | SL1 | Cluster-NMS | 29.2 | 32.9 | 34.8 | 45.9 | 29.9 | 31.4 | 42.1 |
| 550 | Resnet101-FPN | SL1 | SPM Cluster-NMS | 28.8 | 33.5 | 35.9 | 49.7 | 30.5 | 32.1 | 44.1 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance Cluster-NMS | 27.5 | 33.5 | 35.9 | 50.2 | 30.4 | 32.0 | 44.3 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance + Weighted Cluster-NMS | 26.7 | 34.0 | 36.6 | 49.9 | 30.5 | 32.0 | 44.3 |

The following table is evaluated using the authors' pretrained YOLACT++ weights (yolact_plus_base_54_800000.pth).

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 550 | Resnet101-FPN | SL1 | Fast NMS | 25.1 | 35.8 | 38.7 | 45.5 | 34.4 | 36.8 | 42.6 |
| 550 | Resnet101-FPN | SL1 | Original NMS | 10.9 | 36.4 | 39.1 | 48.0 | 34.7 | 37.1 | 44.1 |
| 550 | Resnet101-FPN | SL1 | Cluster-NMS | 23.7 | 36.4 | 39.1 | 48.0 | 34.7 | 37.1 | 44.1 |
| 550 | Resnet101-FPN | SL1 | SPM Cluster-NMS | 23.2 | 36.9 | 40.1 | 52.8 | 35.0 | 37.5 | 46.3 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance Cluster-NMS | 22.0 | 36.9 | 40.2 | 53.0 | 34.9 | 37.5 | 46.3 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance + Weighted Cluster-NMS | 21.7 | 37.4 | 40.6 | 52.5 | 35.0 | 37.6 | 46.3 |

Note:

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder

Video

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4

As you can tell, eval.py can do a ton of stuff. Run the --help command to see everything it can do.

python eval.py --help

Training

By default, we train on COCO. Make sure to download the entire dataset using the commands above.

# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config

# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5

# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help

Multi-GPU Support

YOLACT now supports multiple GPUs seamlessly during training.

Acknowledgments

Thank you to Daniel Bolya for his work on YOLACT & YOLACT++, an excellent framework for real-time instance segmentation.