<div align="center"> <h1>Large-Scale 3D Medical Image Pre-training</h1><a href="https://arxiv.org/abs/2410.09890"><img src='https://img.shields.io/badge/arXiv-Preprint-red' alt='Paper PDF'></a> <a href="https://openaccess.thecvf.com/content/CVPR2024/html/Wu_VoCo_A_Simple-yet-Effective_Volume_Contrastive_Learning_Framework_for_3D_Medical_CVPR_2024_paper.html"><img src='https://img.shields.io/badge/CVPR-Conference-red' alt='Paper PDF'></a> <a href='https://huggingface.co/Luffy503/VoCo/tree/main'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a> <a href='https://huggingface.co/datasets/Luffy503/PreCT-160K'><img src='https://img.shields.io/badge/Dataset-PreCT--160K-green' alt='Dataset'></a> <a href='https://huggingface.co/datasets/Luffy503/VoComni'><img src='https://img.shields.io/badge/Dataset-VoComni-green' alt='Dataset'></a> <a href='https://huggingface.co/datasets/Luffy503/VoCovid'><img src='https://img.shields.io/badge/Dataset-VoCovid-green' alt='Dataset'></a>
</div>

This work presents VoCo, a new method for large-scale 3D medical image pre-training. We release a new benchmark, including 160K volumes (42M slices) for pre-training, pre-trained models ranging from 31M to 1.2B parameters, various pre-training recipes, and implementations of 50+ downstream tasks.
Linshan Wu, Jiaxin Zhuang, and <a href="https://scholar.google.com/citations?hl=en&user=Z_t5DjwAAAAJ">Hao Chen</a>. "Large-Scale 3D Medical Image Pre-training with Geometric Context Priors". CVPR 2024 Extension.
## Quick Start
- Models: pre-trained models ranging from 31M to 1.2B parameters.
- Downstream: implementations of 50+ tasks (segmentation, classification, registration, vision-language).
- Datasets:
  - PreCT-160K: the largest dataset in this field to date, with 160K CT volumes (42M slices).
  - VoComni: 20K volumes with pseudo labels (20 organ and tumor classes).
  - VoCovid: semi-supervised COVID-19 segmentation.
- Pre-training:
  - Fully-supervised: pre-training with labeled data.
  - Self-supervised: pre-training with unlabeled data.
  - Semi-supervised: pre-training with labeled and unlabeled data.
  - Omni-supervised: pre-training with labeled and unlabeled data.
- CVPR version
- Chinese explanation (中文解读)
- WeChat official account (公众号)
## Pre-trained Models
We provide various models for downstream tasks. For nnUNet, please refer to the nnUNet trainer.
- 'SSL_head' denotes models trained with self-supervised pre-training.
- 'Omni' denotes models trained with omni-supervised pre-training.
Model | Params | Checkpoint |
---|---|---|
VoComni_nnunet | 31M | Download |
VoCo_B_SSL_head | 53M | Download |
VoCo_L_SSL_head | 206M | Download |
VoCo_H_SSL_head | 818M | Download |
VoComni_B | 72M | Download |
VoComni_L | 290M | Download |
VoComni_H | 1.2B | Download |
We downloaded the checkpoints of previous methods from SuPreM for comparison (thanks for their great efforts!).
Summary: We spent over 10,000 GPU hours evaluating 50+ downstream tasks. Among previous methods, SuPreM appears to be the strongest. You can try these models in Downstream.
The path of pre-trained models should be organized as follows:

```
├── YOUR/DIRECTORY/OF/PRETRAINED/MODELS
├── VoComni_nnunet.pt
├── VoCo_B_SSL_head.pt
├── VoCo_L_SSL_head.pt
├── VoCo_H_SSL_head.pt
├── VoComni_B.pt
├── VoComni_L.pt
├── VoComni_H.pt
├── supervised_dodnet_unet_920.pth
├── supervised_clip_driven_universal_swin_unetr_2100.pth
├── self_supervised_unimiss_nnunet_small_5022.pth
├── self_supervised_nv_swin_unetr_5050.pt
├── self_supervised_models_genesis_unet_620.pt
└── supervised_suprem_swinunetr_2100.pth
```
### Load Pre-trained models
```python
import argparse

import torch
from monai.networks.nets import SwinUNETR


def load(model, model_dict):
    # Make sure you load our checkpoints
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    else:
        state_dict = model_dict
    current_model_dict = model.state_dict()
    # Print the keys that match the checkpoint in both name and shape
    for k in current_model_dict.keys():
        if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()):
            print(k)
    # Copy matching weights from the checkpoint; keep the model's own (randomly
    # initialized) weights for keys that are missing or have a different shape
    new_state_dict = {
        k: state_dict[k] if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size())
        else current_model_dict[k]
        for k in current_model_dict.keys()}
    model.load_state_dict(new_state_dict, strict=True)
    return model


parser = argparse.ArgumentParser(description="VoCo models")
parser.add_argument("--feature_size", default=48, type=int,
                    help="feature size: 48 Base (B), 96 Large (L), 192 Huge (H)")
parser.add_argument("--in_channels", default=1, type=int, help="number of input channels")
parser.add_argument("--out_channels", default=21, type=int, help="number of output channels")
parser.add_argument("--roi_x", default=96, type=int, help="roi size in x direction")
parser.add_argument("--roi_y", default=96, type=int, help="roi size in y direction")
parser.add_argument("--roi_z", default=96, type=int, help="roi size in z direction")
args = parser.parse_args()

model = SwinUNETR(img_size=(args.roi_x, args.roi_y, args.roi_z),
                  in_channels=args.in_channels,
                  out_channels=args.out_channels,
                  feature_size=args.feature_size,
                  use_v2=True)

# YOUR PATH OF PRETRAINED MODELS. MODIFY IT
pretrained_path = './pretrained/VoComni_B.pt'
model_dict = torch.load(pretrained_path, map_location=torch.device('cpu'))
model = load(model, model_dict)
```
NOTE: the "roi" size is flexible and can be set according to your own configuration. You need to adjust "in_channels" and "out_channels" for your specific dataset. If "in_channels != 1" or "out_channels != 21", only the first or last layer will not be loaded from the checkpoint (it keeps its random initialization).
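As an illustration of the note above, here is a minimal sketch of loading a checkpoint into a downstream segmentation model with a different class count (the 14-class setting and the checkpoint path are hypothetical placeholders):

```python
import torch
from monai.networks.nets import SwinUNETR

# Hypothetical downstream setting: 14 output classes instead of the 21 used for VoComni.
downstream_model = SwinUNETR(img_size=(96, 96, 96),
                             in_channels=1,
                             out_channels=14,  # differs from the pre-trained head
                             feature_size=48,  # 48 = Base (B)
                             use_v2=True)

# Placeholder path; point it to your downloaded checkpoint.
ckpt = torch.load('./pretrained/VoCo_B_SSL_head.pt', map_location=torch.device('cpu'))
# load() is the helper defined above: keys that match in name and shape are copied from
# the checkpoint, while the mismatched segmentation head keeps its random initialization.
downstream_model = load(downstream_model, ckpt)
```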
## Fine-tuning

### Installation

```bash
git clone https://github.com/Luffy03/Large-Scale-Medical
cd Large-Scale-Medical
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
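A quick sanity check after installation (a minimal sketch; CUDA availability depends on your local driver setup):

```python
# Verify that the pinned PyTorch build and MONAI import correctly and that a GPU is visible.
import torch
import monai

print(torch.__version__)          # expected: 2.1.1+cu118
print(monai.__version__)
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine
```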
### Download Downstream Datasets
Please refer to Acknowledgement and download our pre-processed datasets for the downstream tasks.
### Implementations
Please refer to Downstream: 50+ downstream tasks implementations.
We are uploading our fine-tuning checkpoints to BaiduYun to ensure fair comparisons.
## Pre-training <a name="Pre-training"></a>
### Download Pre-training Dataset
Please refer to Acknowledgment. Download our PreCT-160K for pre-training.
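If you prefer to script the download, here is a minimal sketch with the huggingface_hub client (the local directory is a placeholder; the dataset repository ID follows the badge at the top of this README):

```python
from huggingface_hub import snapshot_download

# Download the PreCT-160K pre-training dataset from Hugging Face.
# local_dir is a placeholder; make sure the target disk has enough space (see the warning below).
snapshot_download(repo_id="Luffy503/PreCT-160K",
                  repo_type="dataset",
                  local_dir="/data/PreCT-160K")
```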
WARNING:
- It requires 22.6 TB of space to store the original datasets. Pre-training requires an extra 30 TB of space to cache the data, otherwise pre-training will be very slow. Please store the data on SSDs (see the caching sketch after this list).
- If you do not have enough space for PreCT-160K, you can try our VoComni dataset, which requires less than 10 TB.
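The extra cache space comes from materializing pre-processed volumes on disk so they are not recomputed every epoch. Below is a minimal sketch of this idea using MONAI's PersistentDataset; the transforms, file list, and cache directory are illustrative assumptions, not the repository's actual pre-training pipeline:

```python
from monai.data import DataLoader, PersistentDataset
from monai.transforms import Compose, LoadImaged, Orientationd, ScaleIntensityRanged, Spacingd

# Illustrative pre-processing; the real recipes define their own transform pipelines.
transforms = Compose([
    LoadImaged(keys=["image"]),
    Orientationd(keys=["image"], axcodes="RAS"),
    Spacingd(keys=["image"], pixdim=(1.5, 1.5, 1.5)),
    ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250, b_min=0.0, b_max=1.0, clip=True),
])

# Hypothetical file list; point it at your local copy of the pre-training volumes.
data = [{"image": "/path/to/PreCT-160K/case_0001.nii.gz"}]

# cache_dir is where the extra ~30 TB goes: each pre-processed volume is written once
# and re-read in later epochs, which is why fast SSD storage is recommended.
dataset = PersistentDataset(data=data, transform=transforms, cache_dir="/ssd/voco_cache")
loader = DataLoader(dataset, batch_size=1, num_workers=4)
```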
### Various Pre-training recipes
Please refer to:
- Fully-supervised pre-training.
- Self-supervised pre-training.
- Semi-supervised pre-training.
- Omni-supervised pre-training.
## VoComni
To facilitate future research, we use VoCo to generate pseudo labels on 20K volumes, covering 20 organ and tumor classes. Please refer to VoComni.
## VoCovid
Please refer to VoCovid for semi-supervised COVID-19 segmentation. The dataset can be downloaded from Hugging Face.
## Acknowledgement <a name="Acknowledgment"></a>
NOTE THAT we are not the authors of these datasets. Although all of these datasets are publicly available for academic research, you need to cite the original works as listed in our paper. Certain datasets (e.g., WORD) require approval from their authors, so you need to download them from the original links.
## Citation
If you find this repo useful for your research, please consider citing the paper as follows:
```
@article{wu2024large,
  title={Large-Scale 3D Medical Image Pre-training with Geometric Context Priors},
  author={Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
  journal={arXiv preprint arXiv:2410.09890},
  year={2024}
}

@InProceedings{voco-v1,
  author    = {Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
  title     = {VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis},
  booktitle = {CVPR},
  month     = {June},
  year      = {2024},
  pages     = {22873-22882}
}
```