Simple Concept DataBase

SCDB is a synthetic dataset for concept localization, inspired by the challenges of skin lesion classification from dermatoscopic images. It mimics the complex composition of diagnostic criteria in skin lesions, such as spatial overlap, and provides concept annotations and concept segmentation masks.

If you use this dataset, please consider citing our associated paper:

    @InProceedings{lucieri2020explaining,
    author="Lucieri, Adriano
    and Bajwa, Muhammad Naseer
    and Dengel, Andreas
    and Ahmed, Sheraz",
    title="Explaining AI-Based Decision Support Systems Using Concept Localization Maps",
    booktitle="Neural Information Processing",
    year="2020",
    publisher="Springer International Publishing",
    address="Cham",
    pages="185--193",
    isbn="978-3-030-63820-7"
    }
<p align="center"> <img src="Fig/000044.png" width="200" /> <img src="Fig/000158.png" width="200" /> <img src="Fig/000233.png" width="200" /> <img src="Fig/000335.png" width="200" /> </p>

Dataset Description

Skin lesions are represented as large geometric base shapes filled with concepts. Each concept is a smaller geometric shape that is randomly coloured, shaped and oriented. Ten shapes, each representing a single concept, are used: Cross, Ellipse, Hexagon, Line, Pentagon, Rectangle, Star, Starmarker, Triangle and Tripod.

Concepts relevant to the target classification task occur only within the area of the base shape. Eight of the ten concept classes are relevant for classification; the remaining two (Cross, Line) are uncorrelated with the target classes. Target classes are indicated by the following concept combinations:

| Target Class | Indicative Concept Combinations |
|---|---|
| C1 | Hexagon & Star, <br>Ellipse & Star, <br>Triangle & Ellipse & Starmarker |
| C2 | Pentagon & Tripod, <br>Star & Tripod, <br>Rectangle & Star & Starmarker |
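
The combinations listed for C1 and C2 can be read as a rule lookup over the set of concepts present in an image. The sketch below is an illustration only, not the generator used to build SCDB. The names `CLASS_RULES` and `matching_classes` are hypothetical, and it assumes that a class is indicated whenever every concept of at least one of its combinations is present.

```python
# Concept combinations per target class, transcribed from the class table.
CLASS_RULES = {
    "C1": [
        {"Hexagon", "Star"},
        {"Ellipse", "Star"},
        {"Triangle", "Ellipse", "Starmarker"},
    ],
    "C2": [
        {"Pentagon", "Tripod"},
        {"Star", "Tripod"},
        {"Rectangle", "Star", "Starmarker"},
    ],
}


def matching_classes(present):
    """Return the classes with at least one combination fully contained in `present`."""
    present = set(present)
    return [
        cls
        for cls, combos in CLASS_RULES.items()
        if any(combo <= present for combo in combos)
    ]


if __name__ == "__main__":
    # Cross and Line never contribute to a match because no rule mentions them.
    print(matching_classes({"Hexagon", "Star", "Cross"}))
    print(matching_classes({"Pentagon", "Tripod", "Line"}))
    print(matching_classes({"Cross", "Line"}))
```

Under this reading a concept set such as Ellipse, Star and Tripod matches combinations of both classes, so the function returns a list. How SCDB handles such cases is not stated here.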

Dataset Files

For each dataset split (train, val, test), label annotations (.csv) and concept annotations (.npy) are available. A separate concept split can be used for training concept activation vectors (CAVs).

Label Files

The .csv files are provided in the form "filepath|label".
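
A minimal sketch of reading such a file with Python's standard `csv` module, using `|` as the delimiter. The helper name `read_labels` is hypothetical, the sample rows are made up for illustration, and the sketch assumes the files have no header row.

```python
import csv
import io


def read_labels(handle):
    """Parse "filepath|label" rows into a list of (filepath, label) tuples."""
    rows = []
    for record in csv.reader(handle, delimiter="|"):
        if not record:  # skip blank lines
            continue
        filepath, label = record
        rows.append((filepath, label))
    return rows


if __name__ == "__main__":
    # Hypothetical rows; real paths and label values come from the dataset files.
    sample = io.StringIO("images/000044.png|C1\nimages/000158.png|C2\n")
    for filepath, label in read_labels(sample):
        print(filepath, label)
```

For a file on disk, the same function can be called as `read_labels(open("train.csv", newline=""))`, ideally inside a `with` block.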

Concept Files

Concept annotations are provided as binary multi-label vectors, stored as an array of shape [N x 10], where N is the number of samples.
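
The following sketch shows one way to sanity-check a concept array after loading it with `numpy.load`. The function name `check_concepts` is hypothetical and the toy array stands in for a real file such as `train.npy`. Which column corresponds to which concept is not stated here, so the sketch only reports per-column counts.

```python
import numpy as np

NUM_CONCEPTS = 10  # from the [N x 10] shape of the concept annotations


def check_concepts(annotations):
    """Validate an [N x 10] binary concept matrix and return per-concept counts."""
    annotations = np.asarray(annotations)
    if annotations.ndim != 2 or annotations.shape[1] != NUM_CONCEPTS:
        raise ValueError(f"expected shape (N, {NUM_CONCEPTS}), got {annotations.shape}")
    if not np.isin(annotations, (0, 1)).all():
        raise ValueError("concept annotations must contain only 0 and 1")
    # Column sums give the number of samples in which each concept occurs.
    return annotations.sum(axis=0).astype(int)


if __name__ == "__main__":
    # Toy stand-in for np.load("train.npy"); the values are made up.
    toy = np.array([
        [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    ])
    print(check_concepts(toy))
```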

Segmentation Files

Each split folder contains a Segmentation folder with up to 10 concept-specific segmentation maps per sample. Each concept is segmented by a circle that covers the complete outline of its shape.
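
To illustrate what a circular segmentation covering a whole shape looks like, the sketch below builds a disc mask around an arbitrary binary shape mask. It is not the procedure used to create the SCDB maps. The function name `covering_circle_mask` is hypothetical, the input is assumed to be a 2D boolean array, and the disc is centred on the bounding box rather than being the smallest enclosing circle.

```python
import numpy as np


def covering_circle_mask(shape_mask):
    """Return a boolean disc mask that contains every True pixel of `shape_mask`."""
    shape_mask = np.asarray(shape_mask, dtype=bool)
    ys, xs = np.nonzero(shape_mask)
    if ys.size == 0:
        # No shape pixels: return an empty mask of the same size.
        return np.zeros_like(shape_mask)
    # Centre the disc on the bounding-box centre of the shape.
    cy = (ys.min() + ys.max()) / 2.0
    cx = (xs.min() + xs.max()) / 2.0
    # The radius reaches the farthest shape pixel, so the whole shape is covered.
    radius_sq = ((ys - cy) ** 2 + (xs - cx) ** 2).max()
    grid_y, grid_x = np.ogrid[: shape_mask.shape[0], : shape_mask.shape[1]]
    return (grid_y - cy) ** 2 + (grid_x - cx) ** 2 <= radius_sq


if __name__ == "__main__":
    # A small plus-shaped toy concept on a 9 x 9 canvas.
    toy = np.zeros((9, 9), dtype=bool)
    toy[2:7, 4] = True
    toy[4, 2:7] = True
    print(covering_circle_mask(toy).astype(int))
```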

Dataset Distribution

| Split | Datafile | Annotations | Samples |
|---|---|---|---|
| Train | train.csv | train.npy | 4800 |
| Validation | val.csv | val.npy | 1200 |
| Test | test.csv | test.npy | 1500 |
| Concept | concept.csv | concept.npy | 6000 |