<p align="center"> <img src="https://raw.githubusercontent.com/Project-MONAI/MONAI/dev/docs/images/MONAI-logo-color.png" width="30%"/> </p>

MONAI Vision Language Models

The repository provides a collection of vision language models, benchmarks, and related applications, released as part of Project MONAI (Medical Open Network for Artificial Intelligence).

💡 News

VILA-M3

VILA-M3 is a vision language model designed specifically for medical applications. It focuses on addressing the unique challenges faced by general-purpose vision-language models when applied to the medical domain and integrated with existing expert segmentation and classification models.

<p align="center"> <img src="m3/docs/images/VILA-M3_overview_v2.png" width="95%"/> </p>

For details, see here.

Online Demo

Please visit the VILA-M3 Demo to try out a preview version of the model.

<p align="center"> <img src="m3/docs/images/gradio_app_ct.png" width="70%"/> </p>

Local Demo

Prerequisites

Recommended: Build Docker Container

  1. To run the demo, we recommend building a Docker container with all the requirements. We use a base image with CUDA preinstalled.
    docker build --network=host --progress=plain -t monai-m3:latest -f m3/demo/Dockerfile .
    
  2. Run the container
    docker run -it --rm --ipc host --gpus all --net host monai-m3:latest bash
    

    Note: If you want to load your own VILA checkpoint in the demo, you need to mount a folder using -v <your_ckpts_dir>:/data/checkpoints in your docker run command (see the example after this list).

  3. Next, follow the steps to start the Gradio Demo.
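
For example, to run the container with a local checkpoints folder mounted, you can combine the docker run command from step 2 with the -v option from the note (<your_ckpts_dir> is a placeholder for your host directory):

docker run -it --rm --ipc host --gpus all --net host -v <your_ckpts_dir>:/data/checkpoints monai-m3:latest bash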

Alternative: Manual installation

  1. Linux Operating System

  2. CUDA Toolkit 12.2 (with nvcc) for VILA.

    To verify CUDA installation, run:

    nvcc --version
    

    If CUDA is not installed, use one of the following methods:

    • Recommended: Use the Docker image nvidia/cuda:12.2.2-devel-ubuntu22.04
      docker run -it --rm --ipc host --gpus all --net host nvidia/cuda:12.2.2-devel-ubuntu22.04 bash
      
    • Manual installation (not recommended): Download the appropriate package from the official NVIDIA page.
  3. Python 3.10, Git, Wget, and Unzip:

    To install these, run:

    sudo apt-get update
    sudo apt-get install -y wget python3.10 python3.10-venv python3.10-dev git unzip
    

    NOTE: The commands are tailored for the Docker image nvidia/cuda:12.2.2-devel-ubuntu22.04. If using a different setup, adjust the commands accordingly.

  4. GPU Memory: Ensure that the GPU has sufficient memory to run the models (a quick way to check is shown after this list):

    • VILA-M3: ~18GB for the 8B model, ~30GB for the 13B model
    • CXR: This expert dynamically loads various TorchXRayVision models and performs ensemble predictions. The memory requirement is roughly 1.5GB in total.
    • VISTA3D: This expert model dynamically loads the VISTA3D model to segment a 3D-CT volume. The memory requirement is roughly 12GB, and peak memory usage can be higher, depending on the input size of the 3D volume.
    • BRATS: (TBD)
  5. Setup Environment: Clone the repository, set up the environment, and download the experts' checkpoints:

    git clone https://github.com/Project-MONAI/VLM --recursive
    cd VLM
    python3.10 -m venv .venv
    source .venv/bin/activate
    make demo_m3
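
Before launching the demo, you can compare the available GPU memory against the requirements listed in step 4; a minimal check, assuming the NVIDIA driver and nvidia-smi are installed:

nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv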
    

Running the Gradio Demo

  1. Navigate to the demo directory:

    cd m3/demo
    
  2. Start the Gradio demo:

    This will automatically download the default VILA-M3 checkpoint from Hugging Face.

    python gradio_m3.py
    
  3. Alternative: Start the Gradio demo with a local checkpoint, e.g.:

    python gradio_m3.py  \
    --source local \
    --modelpath /data/checkpoints/<8B-checkpoint-name> \
    --convmode llama_3
    

For details, see the available command-line arguments.
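
If you are unsure which options your checkout supports, the script's built-in help lists them (this assumes gradio_m3.py uses Python's standard argument parser):

python gradio_m3.py --help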

Adding your own expert model

Contributing

To lint the code, please install these packages:

pip install -r requirements-ci.txt

Then run the following command:

isort --check-only --diff .  # using the configuration in pyproject.toml
black . --check  # using the configuration in pyproject.toml
ruff check .  # using the configuration in ruff.toml

To auto-format the code, run the following command:

isort . && black . && ruff format .

References & Citation

If you find this work useful in your research, please consider citing:

@article{nath2024vila,
  title={VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge},
  author={Nath, Vishwesh and Li, Wenqi and Yang, Dong and Myronenko, Andriy and Zheng, Mingxin and Lu, Yao and Liu, Zhijian and Yin, Hongxu and Law, Yee Man and Tang, Yucheng and others},
  journal={arXiv preprint arXiv:2411.12915},
  year={2024}
}