Mapillary Vistas Dataset (MVD) Evaluation Scripts

This repository contains scripts to run evaluations on MVD predictions (e.g. on training and validation data); these scripts also form the basis of our evaluation server. Consequently, successfully running them on machine-generated results helps to validate that the predictor's output conforms to the format required by the evaluation server.

Usage

The main entry point is mapillary_vistas/tools/evaluation.py.

The command line options -l and -p are mandatory; both assume that all generated files are located within the same, specified folder. For the optional parameters and full command line usage, please run python mapillary_vistas/tools/evaluation.py --help

Metrics

Semantic image segmentation

The main metric used for semantic image segmentation is the mean intersection over union (IoU) score [1], also known as averaged Jaccard index. IoU is defined as TP / (TP + FP + FN), where TP, FP and FN are true positives, false positives and false negatives, respectively.
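As a minimal sketch (independent of this repository's implementation), per-class IoU can be computed from an accumulated confusion matrix, with mean IoU obtained by averaging over classes; the function and variable names below are illustrative, not the repository's API:

```python
import numpy as np

def per_class_iou(confusion):
    """Per-class IoU = TP / (TP + FP + FN) from a CxC confusion matrix
    whose rows are ground-truth classes and columns are predicted classes."""
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)                 # correctly classified pixels
    fp = confusion.sum(axis=0) - tp         # predicted as class c, but wrong
    fn = confusion.sum(axis=1) - tp         # ground truth c, missed
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)

# toy 2-class example
conf = np.array([[3, 1],
                 [2, 4]])
iou = per_class_iou(conf)   # per-class IoU scores
mean_iou = iou.mean()       # the main metric: mean IoU
```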

For categories where we have instance-specific annotations available, we also provide instance-specific intersection over union (iIoU) [2] scores, which re-scale TP and FN during IoU calculation. The scaling factor for each ground truth instance is the ratio of the class's average instance size to the size of that instance. Consequently, this metric reduces the bias towards instances with dominant segmentation sizes. This metric is not to be confused with the metric used below for instance-specific segmentation assessment, as it does not require information about individual object instances in the predictions.
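A sketch of this instance weighting, following [2]: each ground-truth instance contributes to TP and FN with weight equal to the class's average instance size divided by its own size, while FP remains unweighted. The helper below is hypothetical and illustrative only; it operates on 1-D boolean masks for brevity:

```python
import numpy as np

def instance_weighted_counts(gt_instances, pred_mask, avg_size):
    """iIoU-style counts for one class (hypothetical helper, not the repo's API).

    gt_instances: list of boolean masks, one per ground-truth instance
    pred_mask:    boolean mask of pixels predicted as this class
    avg_size:     average instance size of the class, in pixels
    Returns weighted TP (iTP), weighted FN (iFN), and unweighted FP.
    """
    itp = ifn = 0.0
    for inst in gt_instances:
        w = avg_size / inst.sum()                       # down-weights large instances
        itp += w * np.logical_and(inst, pred_mask).sum()
        ifn += w * np.logical_and(inst, ~pred_mask).sum()
    gt_union = np.logical_or.reduce(gt_instances)
    fp = np.logical_and(pred_mask, ~gt_union).sum()
    return itp, ifn, fp

# toy example: two instances of sizes 4 and 2, average size 3
inst_a = np.array([1, 1, 1, 1, 0, 0, 0], dtype=bool)
inst_b = np.array([0, 0, 0, 0, 1, 1, 0], dtype=bool)
pred   = np.array([1, 1, 0, 0, 1, 1, 1], dtype=bool)
itp, ifn, fp = instance_weighted_counts([inst_a, inst_b], pred, avg_size=3.0)
iiou = itp / (itp + fp + ifn)
```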

Instance-specific semantic image segmentation

For instance-specific semantic image segmentation we calculate average precision (AP) scores, following [3]. For each available ground truth instance, we analyze every overlapping prediction. The prediction with the highest confidence that reaches a given minimal overlap is assigned to the ground truth instance. Unassigned ground truth instances count as false negatives (FN), and unassigned predictions as false positives (FP). Based on these assignments we generate a precision-recall curve for each label; the average precision is the area under this curve. Similar to [2,4], we calculate the main metric, AP, using overlap thresholds from 50% to 95% in 5% steps and finally average the individual AP values.
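The threshold-averaged AP can be sketched as follows. This is an illustrative simplification, not the repository's matching logic: for each overlap threshold we assume the prediction-to-ground-truth assignments are already decided, expressed as a boolean match flag per prediction sorted by descending confidence; the toy example further assumes the same matches at every threshold:

```python
import numpy as np

def average_precision(matched, num_gt):
    """Area under the precision-recall curve for one label at one overlap
    threshold. `matched` is a boolean array over predictions sorted by
    descending confidence: True if assigned to a ground-truth instance."""
    tp = np.cumsum(matched)
    fp = np.cumsum(~matched)
    precision = tp / (tp + fp)
    recall = tp / num_gt
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):   # rectangle rule over recall steps
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

# toy example: three sorted predictions, two ground-truth instances
ap_per_threshold = []
for thr in np.arange(0.50, 0.96, 0.05):          # 50% .. 95% in 5% steps
    matched = np.array([True, False, True])      # assumed matches at this threshold
    ap_per_threshold.append(average_precision(matched, num_gt=2))
ap = np.mean(ap_per_threshold)                   # the main metric, AP
```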

License

By contributing to mapillary_vistas, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.

References

[1] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. In International Journal of Computer Vision (IJCV), 2010.

[2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[3] B. Hariharan, P. Arbeláez, R. B. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision (ECCV), 2014.

[4] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV), 2014.