
LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3

<p align="center"> <img src="https://i.imgur.com/waxVImv.png" alt="Oryx Models"> </p>

Hanoona Rasheed*, Muhammad Maaz*, Salman Khan, and Fahad Khan

* Equal contributions

Mohamed bin Zayed University of AI (MBZUAI)



<p align="center"> <img src="images/logos/face.png" width="300"> </p>

💬 Introduction

This repository extends the visual capabilities of the LLaVA 1.5 model by incorporating the latest LLMs released this week 🔥: Phi-3 Mini Instruct (3.8B) and LLaMA-3 Instruct (8B).

🏆 Results: Phi-3-V and LLaVA-3-V

<p align="center"> <img src="images/lava++_radar_plot.png" width="500"> </p>

Comparison on benchmarks for instruction-following LMMs and academic-task-oriented datasets:

<p align="center"> <img src="images/LLaVA-pp-results.png"> </p>

🤖 Model-Zoo

The following tables provide an overview of the available models in our zoo. For each model, you can find a link to its Hugging Face page.

| Model Name | Hugging Face Link | Summary |
| --- | --- | --- |
| LLaVA-Phi-3-mini-4k-instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
| LLaVA-Phi-3-mini-4k-instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
| LLaVA-Phi-3-mini-4k-instruct | Hugging Face | Merged LoRA weights in HuggingFace format. |
| LLaVA-Phi-3-mini-4k-instruct-FT | Hugging Face | Fully fine-tuned model weights in HuggingFace format. |

| Model Name | Hugging Face Link | Summary |
| --- | --- | --- |
| LLaVA-Meta-Llama-3-8B-Instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
| LLaVA-Meta-Llama-3-8B-Instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
| LLaVA-Meta-Llama-3-8B-Instruct | Hugging Face | Merged weights in HuggingFace format. |
| LLaVA-Meta-Llama-3-8B-Instruct-FT | Hugging Face | Fully fine-tuned model weights in HuggingFace format. |
| LLaVA-Meta-Llama-3-8B-Instruct-FT-S2 | Hugging Face | Fully fine-tuned S2 model weights in HuggingFace format. |
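The merged checkpoints can also be fetched directly by repo ID. A minimal sketch, assuming the models are hosted under the `MBZUAI` organization on the Hugging Face Hub (the organization prefix is an assumption, not stated in the tables above):

```shell
# Print a download URL for each merged checkpoint.
# NOTE: the "MBZUAI" Hub organization is an assumption.
ORG="MBZUAI"
urls=""
for model in \
    LLaVA-Phi-3-mini-4k-instruct \
    LLaVA-Phi-3-mini-4k-instruct-FT \
    LLaVA-Meta-Llama-3-8B-Instruct \
    LLaVA-Meta-Llama-3-8B-Instruct-FT; do
  url="https://huggingface.co/${ORG}/${model}"
  urls="${urls}${url} "
  echo "${url}"
done
```

Each printed URL can be passed to `git clone`, or the bare repo ID (`MBZUAI/<model>`) to `huggingface-cli download`, to fetch the weights.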

Installation

```shell
git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive
```

In addition to the standard LLaVA dependencies, upgrade `transformers` to the pinned revision:

```shell
pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
```
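Because the install pins `transformers` to an exact commit, it can help to keep the expected hash on hand for sanity checks. A small sketch that extracts it from the requirement string above:

```shell
# Extract the pinned commit hash (everything after the last '@')
# from the requirement string used in the pip install command.
REQ="git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3"
PINNED_SHA="${REQ##*@}"
echo "pinned transformers commit: ${PINNED_SHA}"
```

For VCS installs, `pip freeze` should list the installed commit in the same `git+...@<sha>` form, so the two hashes can be compared directly.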

🚀 Phi-3-V

To integrate Phi-3-V with LLaVA, follow these steps to update the codebase:

```shell
# Copy necessary files
cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py

# Training scripts
cp scripts/Phi3-V_pretrain.sh LLaVA/Phi3-V_pretrain.sh
cp scripts/Phi3-V_finetune_lora.sh LLaVA/Phi3-V_finetune_lora.sh
```
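The copy steps above can be made defensive, so that a missing submodule checkout fails loudly instead of silently skipping files. A sketch using the same source → destination pairs, wrapped in a hypothetical `copy_overrides` helper:

```shell
# Copy each override file into the LLaVA submodule, aborting on a missing source.
# Pairs are "src:dst", relative to the repository root.
copy_overrides() {
  for pair in \
      "Phi-3-V/train.py:LLaVA/llava/train/train.py" \
      "Phi-3-V/llava_phi3.py:LLaVA/llava/model/language_model/llava_phi3.py" \
      "Phi-3-V/builder.py:LLaVA/llava/model/builder.py" \
      "Phi-3-V/model__init__.py:LLaVA/llava/model/__init__.py" \
      "Phi-3-V/main__init__.py:LLaVA/llava/__init__.py" \
      "Phi-3-V/conversation.py:LLaVA/llava/conversation.py"; do
    src="${pair%%:*}"
    dst="${pair#*:}"
    if [ ! -f "${src}" ]; then
      echo "missing source: ${src} (did you run 'git submodule update --init'?)" >&2
      return 1
    fi
    cp "${src}" "${dst}" || return 1
  done
}
```

Run `copy_overrides` from the repository root after initializing the submodule.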

Train Phi-3-V

1. Pre-train:

   ```shell
   cd LLaVA
   bash Phi3-V_pretrain.sh
   ```

2. Fine-tune:

   ```shell
   cd LLaVA
   bash Phi3-V_finetune_lora.sh
   ```

🚀 LLaMA-3-V

To integrate LLaMA-3-V with LLaVA, follow these steps to update the codebase:

```shell
# Copy necessary files
cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py
cp LLaMA-3-V/builder.py LLaVA/llava/model/builder.py
cp LLaMA-3-V/llava_llama.py LLaVA/llava/model/language_model/llava_llama.py

# Training scripts
cp scripts/LLaMA3-V_pretrain.sh LLaVA/LLaMA3-V_pretrain.sh
cp scripts/LLaMA3-V_finetune_lora.sh LLaVA/LLaMA3-V_finetune_lora.sh
```

Train LLaMA-3-V

1. Pre-train:

   ```shell
   cd LLaVA
   bash LLaMA3-V_pretrain.sh
   ```

2. Fine-tune:

   ```shell
   cd LLaVA
   bash LLaMA3-V_finetune_lora.sh
   ```

🙏 Acknowledgement

We are thankful to LLaVA, lmms-eval and S<sup>2</sup>-Wrapper for releasing their models and code as open-source contributions.

If you face any issues or have questions, please feel free to open an issue or reach out at hanoona.bangalath@mbzuai.ac.ae or muhammad.maaz@mbzuai.ac.ae.

📜 Citation

```bibtex
@misc{hanoona2024LLaVA++,
      title={LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3},
      author={Rasheed, Hanoona and Maaz, Muhammad and Khan, Salman and Khan, Fahad S.},
      url={https://github.com/mbzuai-oryx/LLaVA-pp},
      year={2024}
}
```

<img src="images/logos/IVAL_logo.png" width="200" height="100"> <img src="images/logos/Oryx_logo.png" width="100" height="100"> <img src="images/logos/MBZUAI_logo.png" width="360" height="85">