<h2 align="center">Llama3-Med</h2>

## Contents
- Installation
- Get Started
- Model Zoo
- Launch Demo Locally
- Custom Finetune
- Customize Your Own Large Multimodal Models
## Installation and Requirements
Please note that our environment requirements differ from LLaVA's. We strongly recommend creating the environment from scratch as follows.
- Clone this repository and navigate to the folder

```bash
git clone https://github.com/standardmodelbio/llama3-med.git
cd llama3-med
```
- Create a conda environment, activate it, and install packages

```bash
conda create -n <env-name> python=3.10 -y
conda activate <env-name>
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install additional packages

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
### Upgrade to the latest code base

```bash
git pull
pip install -e .
```
## Get Started
### 1. Data Preparation
Please refer to the Data Preparation section in our Documentation.
### 2. Train
Here's an example of training an LMM using Phi-2.
- Replace the data paths with yours in `scripts/train/train_phi.sh`
- Replace `output_dir` with yours in `scripts/train/pretrain.sh`
- Replace `pretrained_model_path` and `output_dir` with yours in `scripts/train/finetune.sh`
- Adjust your GPU ids (localhost) and `per_device_train_batch_size` in `scripts/train/pretrain.sh` and `scripts/train/finetune.sh` (see the sketch after the command below)

```bash
bash scripts/train/train_phi.sh
```
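The exact contents of these scripts may change between releases, so the following is only a hedged sketch of the kind of lines you would edit in `scripts/train/pretrain.sh`. The launcher, entry-point path, and flag names (`--data_path`, `--image_folder`, `--conv_version`) are assumptions based on common LLaVA-style training scripts; check the actual `scripts/train/*.sh` files in this repository before editing.

```bash
# Hypothetical excerpt of scripts/train/pretrain.sh -- illustrative only;
# verify flag names against the real script in this repo.
DATA_PATH=/path/to/your/pretrain_annotations.json   # your data path
IMAGE_PATH=/path/to/your/images                     # your image folder
OUTPUT_DIR=/path/to/save/your/checkpoints           # your output_dir

deepspeed --include localhost:0,1,2,3 tinyllava/train/train.py \
    --data_path "$DATA_PATH" \
    --image_folder "$IMAGE_PATH" \
    --output_dir "$OUTPUT_DIR" \
    --conv_version pretrain \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 8 \
    --learning_rate 1e-3
```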
Important hyperparameters used in pretraining and finetuning are provided below.
| Training Stage | Global Batch Size | Learning Rate | conv_version |
|---|---|---|---|
| Pretraining | 256 | 1e-3 | `pretrain` |
| Finetuning | 128 | 2e-5 | `phi` |
Tips:

- Global Batch Size = number of GPUs * `per_device_train_batch_size` * `gradient_accumulation_steps`. We recommend you always keep the global batch size and learning rate as above, except when LoRA-tuning your model.
- `conv_version` is a hyperparameter used for choosing different chat templates for different LLMs. In the pretraining stage, `conv_version` is the same for all LLMs: `pretrain`. In the finetuning stage, we use
  - `phi` for Phi-2, StableLM, Qwen-1.5
  - `llama` for TinyLlama, OpenELM
  - `gemma` for Gemma
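As a quick sanity check of the formula above, here is a small worked example; the GPU count and per-device values are illustrative, not the repository's defaults.

```bash
# 8 GPUs, per_device_train_batch_size=4, gradient_accumulation_steps=8
NUM_GPUS=8
PER_DEVICE_TRAIN_BATCH_SIZE=4
GRADIENT_ACCUMULATION_STEPS=8
echo $(( NUM_GPUS * PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS ))  # 256, matching the pretraining setting
```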
### 3. Evaluation
Please refer to the Evaluation section in our Documentation.
## Launch Demo Locally
If you want to locally run a model trained by yourself or by us, here's an example.
<details>
<summary>Run inference with a model trained by yourself</summary>

```python
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "/absolute/path/to/your/model/"
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"
conv_mode = "phi"  # or llama, gemma, etc.

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "query": prompt,
    "conv_mode": conv_mode,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)

"""
Output:
XXXXXXXXXXXXXXXXX
"""
```
</details>
<details>
<summary>Run inference with the model trained by us using Hugging Face Transformers</summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = 'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B'
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()
config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,
    padding_side=config.tokenizer_padding_side,
)

prompt = "What are these?"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
output_text, generation_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)

print('model output:', output_text)
print('running time:', generation_time)
```
</details>