
Our paper: Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models (https://arxiv.org/abs/2404.10237)

<img width="600" alt="model" src="https://github.com/jiangsongtao/TinyMed/assets/43131870/956aa268-1c75-44b3-938f-40fbbd8a53b7">

Med-MoE is a novel and lightweight framework designed to handle both discriminative and generative multimodal medical tasks. It employs a three-step learning process: aligning multimodal medical images with LLM tokens, instruction tuning with a trainable router for expert selection, and domain-specific MoE tuning. Our model stands out by incorporating highly specialized domain-specific experts, significantly reducing the required model parameters by 30%-50% while achieving superior or on-par performance compared to state-of-the-art models. This expert specialization and efficiency make Med-MoE highly suitable for resource-constrained clinical settings.

<img width="800" alt="model" src="https://github.com/jiangsongtao/TinyMed/assets/43131870/21a9246d-698f-492f-ab6f-351cf97b055c">

Environment Setup

Prepare the Environment

  1. Clone and navigate to the TinyMed project directory:

    git clone https://github.com/jiangsongtao/TinyMed.git
    cd TinyMed
    
  2. Set up your environment:

    conda create -n tinymed python=3.10 -y
    conda activate tinymed
    pip install --upgrade pip
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
    
  3. Replace the default MoE implementation with the version provided in this repository.

  4. Download the domain-specific router (provided by us, or trained by yourself) and update its path in the moellava/model/language_model/llava_stablelm_moe.py file; a sketch of one possible file layout follows this list.
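
The block below is only an illustration of steps 3 and 4, assuming the provided MoE files are a drop-in replacement for the default language-model module and the router ships as a single checkpoint file; all paths are placeholders.

# Illustrative only: overwrite the default MoE implementation with the provided one
cp -r /path/to/provided_moe/. moellava/model/language_model/

# Keep the domain-specific router checkpoint in a stable location
mkdir -p checkpoints
cp /path/to/domain_router.pth checkpoints/domain_router.pth

# Then edit moellava/model/language_model/llava_stablelm_moe.py so that the router
# path it loads points to checkpoints/domain_router.pth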

Training

Prepare the Datasets

Utilize the LLaVA-Med Datasets for training:
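
The exact training commands are not listed on this page. The block below is a minimal sketch of a MoE-LLaVA-style launch, assuming a moellava/train/train_mem.py entry point and LLaVA-Med data converted to the LLaVA instruction format; the script name, flags, and paths are assumptions rather than the official recipe.

# Sketch only -- entry point, flags, and paths are assumptions, not the official recipe
deepspeed moellava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path your_base_model_path \
    --data_path ./llava_med_instruct.json \
    --image_folder ./llava_med_images \
    --output_dir ./checkpoints/med-moe \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8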

Web Launch

Launch the Web Interface

Use DeepSpeed to start the Gradio web server:
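
(The exact command is not included on this page; the sketch below assumes Med-MoE keeps MoE-LLaVA's moellava/serve/gradio_web_server.py entry point, and the model path is a placeholder.)

# Assumed entry point inherited from MoE-LLaVA; point --model-path at your checkpoint
deepspeed --include localhost:0 moellava/serve/gradio_web_server.py \
    --model-path your_model_path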

CLI Inference

Command Line Inference

Execute models from the command line:
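
(Likewise a sketch, assuming the MoE-LLaVA-style moellava/serve/cli.py script; the model path and image file are placeholders.)

# Assumed entry point inherited from MoE-LLaVA; replace the paths with your own
deepspeed --include localhost:0 moellava/serve/cli.py \
    --model-path your_model_path \
    --image-file ./example_image.jpg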

Model Zoo

Available Models

Evaluation

The evaluation process involves running the model on multiple GPUs and combining the results. Below are the detailed steps and commands:

# Set the number of chunks and GPUs
CHUNKS=2
GPUS=(0 1)

# Run inference on each GPU (one chunk per GPU, launched in the background)
for IDX in $(seq 0 $((CHUNKS - 1))); do
    GPU_IDX=${GPUS[$IDX]}
    PORT=$((GPU_IDX + 29500))
    # Use --conv-mode stablelm or phi2, depending on the model backbone
    deepspeed --include localhost:$GPU_IDX --master_port $PORT model_vqa_med.py \
        --model-path your_model_path \
        --question-file ./test_rad.json \
        --image-folder ./3vqa/images \
        --answers-file ./test_llava-13b-chunk${CHUNKS}_${IDX}.jsonl \
        --temperature 0 \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --conv-mode stablelm &
done

# Wait for all background inference jobs to finish
wait

# Combine the per-chunk JSONL results into one file
cat ./test_llava-13b-chunk${CHUNKS}_*.jsonl > ./radvqa.jsonl

# Run evaluation
python run_eval.py \
    --gt ./3vqa/test_rad.json \
    --pred ./radvqa.jsonl \
    --output ./data_RAD/wrong_answers.json

Acknowledgements

Special thanks to these foundational works: LLaVA-Med (datasets) and MoE-LLaVA (the codebase this project builds on).

Citation

If you find Med-MoE useful for your research, please cite:

@misc{jiang2024medmoemixturedomainspecificexperts,
      title={Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models}, 
      author={Songtao Jiang and Tuo Zheng and Yan Zhang and Yeying Jin and Li Yuan and Zuozhu Liu},
      year={2024},
      eprint={2404.10237},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2404.10237}, 
}