SIIM-ACR Pneumothorax Segmentation

If you use this code in research, please cite the following paper:

@misc{Aimoldin2019,
  author = {Aimoldin Anuar},
  title = {{SIIM–ACR} {P}neumothorax {S}egmentation},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/sneddy/pneumothorax-segmentation}},
}

First place solution

Video with short explanation: https://youtu.be/Wuf0wE3Mrxg

Presentation with short explanation: https://yadi.sk/i/oDYnpvMhqi8a7w

Competition: https://kaggle.com/c/siim-acr-pneumothorax-segmentation

Model Zoo

Main Features

Triplet scheme of inference and validation

Suppose the segmentation model outputs a mask of per-pixel pneumothorax probabilities. I call this the basic sigmoid mask. I used a triplet of thresholds: (top_score_threshold, min_contour_area, bottom_score_threshold).

The decision rule is based on the doublet (top_score_threshold, min_contour_area). I used it instead of a separate pneumothorax/non-pneumothorax classification.

Images that did not pass this doublet of thresholds, that is, images with fewer than min_contour_area pixels above top_score_threshold, were counted as non-pneumothorax images.

For the remaining pneumothorax images, we binarize the basic sigmoid mask using bottom_score_threshold (another binarization threshold, lower than top_score_threshold). You may notice that most participants used the same scheme under the assumption that bottom_score_threshold = top_score_threshold.

The simplified version of this scheme:

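import numpy as np

def apply_triplet(predicted, top_score_threshold, min_contour_area, bottom_score_threshold):
    # predicted: sigmoid masks of shape (N, C, H, W)
    classification_mask = predicted > top_score_threshold
    mask = predicted.copy()
    # Zero out images with too few confident pixels (treated as non-pneumothorax)
    mask[classification_mask.sum(axis=(1, 2, 3)) < min_contour_area] = 0
    return mask > bottom_score_threshold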

Search best triplet thresholds during validation

For my final submissions I chose something between these triplets.
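
A minimal sketch of how such a search could look, assuming a plain grid search over candidate values and a competition-style mean Dice in which an empty prediction against an empty ground truth scores 1. The function names, the grids and the (N, H, W) mask layout are illustrative assumptions, not the repository's actual code:

```python
import itertools
import numpy as np

def apply_triplet(probs, top_thr, min_area, bottom_thr):
    """Binarize sigmoid masks of shape (N, H, W) with the triplet rule."""
    confident_pixels = (probs > top_thr).reshape(len(probs), -1).sum(axis=1)
    masks = probs > bottom_thr
    masks[confident_pixels < min_area] = False  # predicted non-pneumothorax
    return masks

def mean_dice(pred, target):
    """Per-image Dice; empty prediction vs. empty target counts as 1."""
    pred = pred.reshape(len(pred), -1)
    target = target.reshape(len(target), -1).astype(bool)
    intersection = (pred & target).sum(axis=1)
    total = pred.sum(axis=1) + target.sum(axis=1)
    dice = np.where(total == 0, 1.0, 2.0 * intersection / np.maximum(total, 1))
    return float(dice.mean())

def search_triplet(probs, target, top_grid, area_grid, bottom_grid):
    """Exhaustive grid search; returns (best_score, best_triplet)."""
    best_score, best_triplet = -1.0, None
    for top, area, bottom in itertools.product(top_grid, area_grid, bottom_grid):
        if bottom > top:  # the bottom threshold should not exceed the top one
            continue
        score = mean_dice(apply_triplet(probs, top, area, bottom), target)
        if score > best_score:
            best_score, best_triplet = score, (top, area, bottom)
    return best_score, best_triplet
```

On a real validation set, `probs` would be the out-of-fold sigmoid masks and `target` the ground-truth masks. Choosing a triplet "between" several good candidates, as described above, is a manual step this sketch does not cover.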

Combo loss

Used [combo loss], a weighted combination of BCE, dice and focal losses. In the best experiments, the (BCE, dice, focal) weights I used were:

Why exactly these weights?

In the beginning, I trained using only the 1-1-1 scheme, and this is how I got my best public score.

I noticed that in later epochs the Dice loss was about 10 times higher than the other two.

To balance them, I decided to use a 3-1-4 scheme, and it gave me the best validation score.

As a compromise, I chose a 2-1-2 scheme for resnet50.
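
To make the weighting concrete, here is a NumPy sketch of a weighted BCE + dice + focal sum. It only illustrates the formula: the exact loss definitions, the focal `gamma=2.0` and the `eps` smoothing are assumptions, and actual training would need a differentiable (e.g. PyTorch) version of the same idea.

```python
import numpy as np

def combo_loss(probs, target, weights=(3.0, 1.0, 4.0), gamma=2.0, eps=1e-7):
    """Weighted sum of BCE, soft-Dice and focal losses on sigmoid outputs."""
    w_bce, w_dice, w_focal = weights
    p = np.clip(probs, eps, 1.0 - eps)
    t = target.astype(np.float64)

    # Binary cross-entropy per pixel, then averaged.
    bce_map = -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    bce = bce_map.mean()

    # Soft Dice loss: 1 - Dice coefficient computed on probabilities.
    dice = 1.0 - (2.0 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)

    # Focal loss: BCE down-weighted for pixels that are already well classified.
    p_t = t * p + (1.0 - t) * (1.0 - p)
    focal = ((1.0 - p_t) ** gamma * bce_map).mean()

    return w_bce * bce + w_dice * dice + w_focal * focal
```

For example, `combo_loss(probs, target, weights=(2, 1, 2))` would correspond to the 2-1-2 scheme.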

Sliding sample rate

Let's call the portion of pneumothorax images in an epoch the sample rate.

The main idea is to control this portion using the sampler of the torch dataset.

On each epoch, my sampler takes all pneumothorax images from the dataset and samples some non-pneumothorax images according to this sample rate. During training, we reduce this parameter from 0.8 at the start to 0.4 at the end.

A large sample rate at the beginning provides a quick start to the learning process, whereas a small sample rate at the end provides better convergence of the neural network weights to the initial distribution of pneumothorax/non-pneumothorax images.
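
The index selection behind this idea can be sketched in plain Python. The linear schedule and the function names are assumptions (the text above only gives the start and end values), and in practice the resulting indices would be served by a custom torch sampler:

```python
import random

def sample_rate_at(epoch, n_epochs, start=0.8, end=0.4):
    """Linearly interpolate the sample rate from `start` to `end`."""
    if n_epochs <= 1:
        return end
    return start + (end - start) * epoch / (n_epochs - 1)

def epoch_indices(positive_idx, negative_idx, sample_rate, rng=random):
    """All positives plus enough random negatives to reach `sample_rate`."""
    # sample_rate = n_pos / (n_pos + n_neg)  =>  n_neg = n_pos * (1 - r) / r
    n_negative = round(len(positive_idx) * (1.0 - sample_rate) / sample_rate)
    n_negative = min(n_negative, len(negative_idx))
    indices = list(positive_idx) + rng.sample(list(negative_idx), n_negative)
    rng.shuffle(indices)
    return indices
```

With 8 pneumothorax images, a sample rate of 0.8 adds 2 non-pneumothorax images to the epoch, while a rate of 0.4 adds 12.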

Learning Process recipes

I can't provide a fully reproducible solution because I uptrained my models A LOT during the learning process. But looking back to formalize my experiments, I can highlight 4 different parts:

All these parts are presented in the corresponding experiment folder.

Augmentations

Used the following transforms from [albumentations]:

import albumentations as albu
# img_size: target side length in pixels, e.g. 512 or 1024
albu.Compose([
    albu.HorizontalFlip(),
    albu.OneOf([
        albu.RandomContrast(),
        albu.RandomGamma(),
        albu.RandomBrightness(),
        ], p=0.3),
    albu.OneOf([
        albu.ElasticTransform(alpha=120, sigma=120 * 0.05, alpha_affine=120 * 0.03),
        albu.GridDistortion(),
        albu.OpticalDistortion(distort_limit=2, shift_limit=0.5),
        ], p=0.3),
    albu.ShiftScaleRotate(),
    albu.Resize(img_size,img_size,always_apply=True),
])

Uptrain from lower resolution

All experiments (except resnet50) were uptrained at 1024x1024 after training at 512x512, with the encoder frozen during the early epochs.

Second stage uptrain

All chosen experiments were uptrained on the second stage data.

Checkpoints averaging

At inference, the top-3 checkpoints from each fold of each pipeline are averaged.
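
The text does not say whether weights or predictions are averaged. The sketch below assumes the latter: each checkpoint yields a dict mapping file names to probability masks, and the masks are averaged per file. The function name and the dict layout are illustrative assumptions.

```python
import numpy as np

def average_predictions(mask_dicts):
    """Average {file_name: probability_mask} dicts from several checkpoints."""
    averaged = {}
    for name in mask_dicts[0]:
        # Stack this file's masks from all checkpoints and take the mean.
        averaged[name] = np.mean([d[name] for d in mask_dicts], axis=0)
    return averaged
```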

Small batchsize without accumulation

A batch size of 2-4 pictures is enough and all my experiments were run on one (sometimes two) 1080-Ti.

Horizontal flip TTA
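
A minimal sketch of horizontal flip test-time augmentation, assuming a hypothetical `predict_fn` that maps a batch of images of shape (N, H, W) to probability masks of the same shape:

```python
import numpy as np

def predict_with_hflip_tta(predict_fn, images):
    """Average predictions on a batch and on its horizontal mirror."""
    direct = predict_fn(images)
    flipped = predict_fn(images[..., ::-1])
    # Flip the mirrored prediction back before averaging.
    return (direct + flipped[..., ::-1]) / 2.0
```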

File structure

├── unet_pipeline
│   ├── experiments
│   │   ├── some_experiment
│   │   │   ├── train_config.yaml
│   │   │   ├── inference_config.yaml
│   │   │   ├── submit_config.yaml
│   │   │   ├── checkpoints
│   │   │   │   ├── fold_i
│   │   │   │   │   ├── topk_checkpoint_from_fold_i_epoch_k.pth
│   │   │   │   │   ├── summary.csv
│   │   │   │   ├── best_checkpoint_from_fold_i.pth
│   │   │   ├── log
├── input                
│   ├── dicom_train
│   │   ├── some_folder
│   │   │   ├── some_folder
│   │   │   │   ├── some_train_file.dcm
│   ├── dicom_test   
│   │   ├── some_folder
│   │   │   ├── some_folder
│   │   │   │   ├── some_test_file.dcm
│   ├── new_sample_submission.csv
│   └── new_train_rle.csv
└── requirements.txt

Install

pip install -r requirements.txt

Data Preparation

Substitute your own input data folder names and RLE file path:

cd unet_pipeline/utils
python prepare_png.py -img_size 1024 -train_path ../../input/dicom_train -test_path ../../input/dicom_test -out_path ../../input/dataset1024 -rle_path ../../input/new_train_rle.csv -n_threads 8

Pipeline launch example

Training:

cd unet_pipeline
python Train.py experiments/albunet_valid/train_config_part0.yaml
python Train.py experiments/albunet_valid/train_config_part1.yaml
python Train.py experiments/albunet_valid/train_config_part2.yaml
python Train.py experiments/albunet_valid/train_config_2nd_stage.yaml

As output, we get checkpoints in the corresponding folder.

Inference:

cd unet_pipeline
python Inference.py experiments/albunet_valid/2nd_stage_inference.yaml

As output, we get a pickle file that maps each file name to a mask of pneumothorax probabilities.

Submit:

cd unet_pipeline
python TripletSubmit.py experiments/albunet_valid/2nd_stage_submit.yaml

As output, we get a submission file with RLE-encoded masks.

Best experiments:

Final Submission

My best model on the Public Leaderboard was albunet_public (PL: 0.8871), and all ensembles scored worse. But I suspected this model of overfitting, so both final submissions were ensembles.

Private Leaderboard:

I suspect that the best solution would have been an ensemble that relied more on the validation scores but used "weaker" triplet thresholds.