The BioMassters

This code is one of three benchmarks for the BioMassters dataset.

Competition Page, Leaderboard, and Paper

Team: Just4Fun

Contact: quqixun@gmail.com

Source Code: https://github.com/quqixun/BioMassters

1. Method

2. Environment

# create environment
conda create --name biomassters python=3.9
conda activate biomassters

# install dependencies
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# requirements.txt is in the repository root, so clone the code (see below) before this step
pip install -r requirements.txt

Clone code:

git clone git@github.com:quqixun/BioMassters.git
# working dir
cd BioMassters
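
Before moving on, it can help to confirm that the active interpreter matches the environment created above. The following is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper that only reports the Python version and whether the named packages can be imported. It does not verify the CUDA build.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision")):
    """Report the Python version and whether each package is importable.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the `biomassters` environment active, the report should show Python 3.9 and `True` for both packages.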

3. Dataset Preparation

Place the competition metadata files in ./data/information with the following structure:

./data/information
├── biomassters-download-instructions.txt  # Instructions to download satellite images and AGBM data
├── features_metadata_FzP19JI.csv          # Metadata for satellite images
└── train_agbm_metadata.csv                # Metadata for training set AGBM tifs

Download the satellite images and AGBM data with download.py:

s3_node=as  # options: as, us, eu
split=all   # download specific dataset, options: train, test, all
            # set split to test for predicting only
            # set split to train for training only
            # set split to all otherwise
download_root=./data/source
features_metadata=./data/information/features_metadata_FzP19JI.csv
training_labels_metadata=./data/information/train_agbm_metadata.csv

python download.py \
    --download_root            $download_root            \
    --features_metadata        $features_metadata        \
    --training_labels_metadata $training_labels_metadata \
    --s3_node                  $s3_node                  \
    --split                    $split

Data will be saved in ./data/source in the following arrangement. Alternatively, you can reorganize an existing copy of the dataset into the same structure.

./data/source
├── test
│   ├── aa5e092e
│   │   ├── S1
│   │   │   ├── aa5e092e_S1_00.tif
│   │   │   ├── ...
│   │   │   └── aa5e092e_S1_11.tif
│   │   └── S2
│   │       ├── aa5e092e_S2_00.tif
│   │       ├── ...
│   │       └── aa5e092e_S2_11.tif
│   ├── ...
│   └── fff812c0
└── train
    ├── aa018d7b
    │   ├── S1
    │   │   └── ...
    │   ├── S2
    │   │   └── ...
    │   └── aa018d7b_agbm.tif
    ├── ...
    └── fff05995
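
If you reorganize an existing copy of the dataset yourself, a quick check of the layout can catch naming mistakes early. The sketch below assumes each chip has 12 files per sensor, numbered 00 to 11, as the tree above suggests. The helper names `expected_images` and `missing_images` are hypothetical, and the script only reports gaps rather than failing, since some files may legitimately be absent from the source data.

```python
from pathlib import Path


def expected_images(chip_id, months=12):
    """List the relative image paths one chip is expected to have."""
    return [
        Path(sensor) / f"{chip_id}_{sensor}_{month:02d}.tif"
        for sensor in ("S1", "S2")
        for month in range(months)
    ]


def missing_images(chip_dir):
    """Return the expected image paths that do not exist under chip_dir."""
    chip_dir = Path(chip_dir)
    return [p for p in expected_images(chip_dir.name) if not (chip_dir / p).is_file()]


if __name__ == "__main__":
    # report instead of failing: gaps may be legitimate in the source data
    for chip_dir in sorted(Path("./data/source/test").glob("*")):
        if chip_dir.is_dir():
            gaps = missing_images(chip_dir)
            if gaps:
                print(chip_dir.name, "missing", len(gaps), "files")
```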
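
Compute the dataset statistics and split the training set into 5 folds with process.py and split.py:

source_root=./data/source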
split_seed=42
split_folds=5

python process.py \
    --source_root    $source_root \
    --process_method plain

python split.py \
    --data_root   $source_root \
    --split_seed  $split_seed  \
    --split_folds $split_folds

The outputs in ./data/source should match the following structure:

./data/source
├── plot              # plot of data distribution
├── splits.pkl        # 5 folds for cross validation
├── stats_log2.pkl    # statistics of log2 transformed dataset
├── stats_plain.pkl   # statistics of original dataset
├── test
└── train

This step needs about 80 GB of RAM. You do not have to run the scripts above again, since all outputs can be found in ./data/source.
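
To illustrate what a seeded 5-fold split means here, the sketch below deals shuffled chip IDs into folds using the same seed (42) and fold count (5) as the arguments above. It is an assumption-laden illustration only: the actual logic of split.py and the layout of splits.pkl are not shown in this document, and `split_folds` and `train_val_ids` are hypothetical names.

```python
import random


def split_folds(chip_ids, n_folds=5, seed=42):
    """Shuffle chip IDs with a fixed seed and deal them into n_folds validation sets."""
    ids = sorted(chip_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def train_val_ids(folds, val_fold):
    """Use one fold for validation and the remaining folds for training."""
    val = folds[val_fold]
    train = [cid for i, fold in enumerate(folds) if i != val_fold for cid in fold]
    return train, val
```

Because the seed is fixed, rerunning the split on the same chip IDs gives the same folds, which keeps validation scores comparable across experiments.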

4. Training

Train model with arguments (see ./scripts/train.sh):

device=0
process=plain
folds=0,1,2,3,4
data_root=./data/source
config_file=./configs/swin_unetr/exp1.yaml

CUDA_VISIBLE_DEVICES=$device \
python train.py              \
    --data_root      $data_root             \
    --exp_root       ./experiments/$process \
    --config_file    $config_file           \
    --process_method $process               \
    --folds          $folds

Run ./scripts/train.sh for training; models and logs will be saved in ./experiments/plain/swin_unetr/exp1.

Training all 5 folds takes about 1 week on a single GPU. If you have 5 GPUs, you can train one fold per GPU, which takes less than 2 days. You can also download the trained models from BaiduDisc (code: jarp), MEGA, or Google Drive, and then unzip them in the following arrangement:

./experiments/plain/swin_unetr/exp1
├── fold0
│   ├── logs.csv
│   └── model.pth
├── fold1
│   ├── logs.csv
│   └── model.pth
├── fold2
│   ├── logs.csv
│   └── model.pth
├── fold3
│   ├── logs.csv
│   └── model.pth
└── fold4
    ├── logs.csv
    └── model.pth
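
The one-fold-per-GPU setup mentioned above could be scripted roughly as follows. This is a sketch under assumptions, not the repository's launcher: it reuses the train.py arguments shown earlier, assumes that `--folds` also accepts a single fold index, and maps fold i to GPU i. `fold_commands` and `launch` are hypothetical helpers.

```python
import os
import subprocess


def fold_commands(folds, process="plain", data_root="./data/source",
                  config_file="./configs/swin_unetr/exp1.yaml"):
    """Build one (gpu_id, argv) pair per fold, mapping fold i to GPU i."""
    commands = []
    for gpu_id, fold in enumerate(folds):
        argv = [
            "python", "train.py",
            "--data_root", data_root,
            "--exp_root", f"./experiments/{process}",
            "--config_file", config_file,
            "--process_method", process,
            "--folds", str(fold),  # assumption: a single fold index is accepted
        ]
        commands.append((gpu_id, argv))
    return commands


def launch(commands):
    """Start every command with its own CUDA_VISIBLE_DEVICES and wait for all."""
    procs = []
    for gpu_id, argv in commands:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        procs.append(subprocess.Popen(argv, env=env))
    return [p.wait() for p in procs]
```

Calling `launch(fold_commands(range(5)))` from the repository root would start five training processes at once, so it only makes sense on a machine with five GPUs.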

5. Predicting

Make predictions with almost the same arguments as training (see ./scripts/predict.sh):

device=0
process=plain
folds=0,1,2,3,4
apply_tta=false
data_root=./data/source
config_file=./configs/swin_unetr/exp1.yaml

CUDA_VISIBLE_DEVICES=$device \
python predict.py            \
    --data_root      $data_root             \
    --exp_root       ./experiments/$process \
    --output_root    ./predictions/$process \
    --config_file    $config_file           \
    --process_method $process               \
    --folds          $folds                 \
    --apply_tta      $apply_tta

Run ./scripts/predict.sh to make predictions; they will be saved in ./predictions/plain/swin_unetr/exp1/folds_0-1-2-3-4.
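
The experiment and prediction directories named in this document follow a visible pattern: the model and experiment names come from the config file path, and the prediction folder ends with the fold list joined by dashes. The sketch below reproduces that pattern for locating outputs. How train.py and predict.py build these paths internally is not shown here, so `experiment_dir` and `prediction_dir` are hypothetical helpers inferred from the documented paths.

```python
from pathlib import Path


def experiment_dir(exp_root, config_file):
    """Mirror the config location: configs/<model>/<exp>.yaml -> <exp_root>/<model>/<exp>."""
    config = Path(config_file)
    return Path(exp_root) / config.parent.name / config.stem


def prediction_dir(output_root, config_file, folds):
    """Append a folds_<a>-<b>-... suffix built from a comma-separated fold string."""
    suffix = "folds_" + "-".join(folds.split(","))
    return experiment_dir(output_root, config_file) / suffix
```

For example, `prediction_dir("./predictions/plain", "./configs/swin_unetr/exp1.yaml", "0,1,2,3,4")` yields the same location as the path quoted above.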

Predicting the public test samples with all 5 folds and averaging the results takes about 30 minutes. You can download the submission for the public test dataset from BaiduDisc (code: w61j) or MEGA.
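
Averaging over folds means taking the per-pixel mean of the five models' outputs for each test sample. The sketch below shows the idea with plain nested lists standing in for image arrays; predict.py presumably works on tensors or arrays, and `average_predictions` is a hypothetical name used only for illustration.

```python
def average_predictions(fold_predictions):
    """Average per-fold predictions pixel by pixel.

    fold_predictions: list of 2D lists (rows of floats), one per fold,
    all with the same shape.
    """
    if not fold_predictions:
        raise ValueError("need at least one fold prediction")
    n = len(fold_predictions)
    first = fold_predictions[0]
    return [
        [sum(pred[r][c] for pred in fold_predictions) / n for c in range(len(first[r]))]
        for r in range(len(first))
    ]
```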

6. Metrics

Metrics of the submitted models and predictions on the validation and test datasets.

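| Metrics | Val Fold 0 | Val Fold 1 | Val Fold 2 | Val Fold 3 | Val Fold 4 | Val Average | Test Public | Test Private |
|---------|------------|------------|------------|------------|------------|-------------|-------------|--------------|
| L_rec   | 0.03562    | 0.03516    | 0.03527    | 0.03522    | 0.03626    |             | -           | -            |
| L_ssim  | 0.04758    | 0.04684    | 0.04713    | 0.04691    | 0.04834    |             | -           | -            |
| RMSE    | 27.9676    | 27.4368    | 27.5011    | 27.8954    | 28.0946    | 27.7781     | 27.3891     | 27.6779      |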
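
For reference, RMSE is the square root of the mean squared difference between predicted and true values. The sketch below computes it over flat sequences of numbers. Whether the reported scores aggregate per image or over all pixels is not stated here, so treat `rmse` as a generic illustration rather than the competition's exact scoring code.

```python
import math


def rmse(pred, target):
    """Root mean squared error over two equal-length sequences of numbers."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))
```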

7. Reference

8. License