BlockMerge Gradient (Tensors Edition)

Credit to TekVenom for the original concept!

This script merges two finetuned Llama 1/2 language models by blending their layers. This can be useful for creating ensembles of models or for combining the strengths of two different models into a single model. The merge follows a specified gradient between the two models.

Word of warning: Do not attempt to merge Llama 1 models with Llama 2 models. The merge will complete, but the result will be a garbled mess.

Unless you have 128 GB of RAM, this process will use a lot of virtual memory. Spread your swapfile over multiple drives for optimal performance.

Usage

You can run the script using the command:

python BlockMerge_Gradient_Tensors.py --model_path1 /path/to/model1 --model_path2 /path/to/model2 --output_model_path /path/to/output --gradient_values '[1.0, 0.5, 0.0]' --max_shard_size '2000MiB' [--layer_only] [--no_layers]
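The --gradient_values argument is passed as a quoted string that looks like a Python list. As a minimal sketch of how such a string could be turned into a list of floats, the helper below uses ast.literal_eval from the standard library. The name parse_gradient_values and the requirement of at least two values are assumptions made for illustration; the script's own parsing may differ.

```python
import ast


def parse_gradient_values(text):
    """Parse a string such as '[1.0, 0.5, 0.0]' into a list of floats.

    Hypothetical helper, not taken from the script itself.
    """
    # literal_eval only accepts Python literals, so arbitrary code is not executed.
    values = ast.literal_eval(text)
    # Assumption: at least two values are needed to define one section.
    if not isinstance(values, (list, tuple)) or len(values) < 2:
        raise ValueError("expected a list of at least two numbers")
    return [float(v) for v in values]


print(parse_gradient_values("[1.0, 0.5, 0.0]"))
```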

Parameters:

Required:

Optional:


Gradient Values (gradient_values)

Definition:
The gradient_values parameter is a list of floats that sets the blend ratio used when merging the tensors of the two models. The values typically range between 0.0 and 1.0, where 1.0 means a tensor takes only model2's values and 0.0 means it takes only model1's values.

Any value in between (e.g., 0.5) means a blend of both model1 and model2 for that particular tensor.
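The blend ratio can be read as a weighted average of the two models' values. The sketch below illustrates this on plain Python lists standing in for tensors. The function names blend and blend_tensor are hypothetical, and the formula is an assumption consistent with the example in this document (1.0 gives model2, 0.0 gives model1), not a copy of the script's code.

```python
def blend(value1, value2, ratio):
    """Weighted average: ratio 0.0 keeps value1 (model1), 1.0 keeps value2 (model2)."""
    return (1.0 - ratio) * value1 + ratio * value2


def blend_tensor(tensor1, tensor2, ratio):
    """Blend two equally sized lists of numbers element by element."""
    return [blend(a, b, ratio) for a, b in zip(tensor1, tensor2)]


# Toy stand-ins for the same tensor in model1 and model2.
model1_tensor = [2.0, 4.0]
model2_tensor = [6.0, 8.0]

print(blend_tensor(model1_tensor, model2_tensor, 0.0))  # [2.0, 4.0], model1 only
print(blend_tensor(model1_tensor, model2_tensor, 0.5))  # [4.0, 6.0], even blend
print(blend_tensor(model1_tensor, model2_tensor, 1.0))  # [6.0, 8.0], model2 only
```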

How It Works:
The list acts as a guide for how the blend ratio changes across the model's tensors. The script uses linear interpolation between the provided gradient values to generate a smooth gradient of blend ratios for all tensors in the model.

Example:
Suppose you provide the gradient values as [1.0, 0.5, 0.0]. This tells the script to start by blending tensors with 100% of model2's values, gradually transition to a 50-50 blend between the two models, and finally to use only model1's values.

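Given this list, the script divides the tensors into sections, one for each pair of adjacent gradient values. In this case, there are 3 - 1 = 2 sections. If there are, say, 24 tensors in the model, each section covers 24 / 2 = 12 tensors: the first 12 interpolate from 1.0 to 0.5, and the last 12 from 0.5 to 0.0.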

So, the first tensor might be blended with 100% of model2's value, the sixth tensor might be blended with around 75% of model2's value (and 25% of model1), the twelfth tensor might be blended with 50% of each model, and so on.
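The worked example above can be reproduced with a short sketch that expands a list of gradient values into one blend ratio per tensor. This is an illustration under stated assumptions, not the script's actual code: the names linspace and expand_gradient are hypothetical, each section is assumed to include both of its endpoints, and any leftover tensors are assumed to receive the last gradient value.

```python
def linspace(start, stop, count):
    """Return `count` evenly spaced values from start to stop, endpoints included."""
    if count <= 0:
        return []
    if count == 1:
        return [float(start)]
    step = (stop - start) / (count - 1)
    values = [start + step * i for i in range(count)]
    values[-1] = float(stop)  # pin the endpoint to avoid floating-point drift
    return values


def expand_gradient(gradient_values, total_tensors):
    """Expand gradient values into one blend ratio per tensor (hypothetical helper)."""
    sections = len(gradient_values) - 1
    per_section = total_tensors // sections
    ratios = []
    for i in range(sections):
        ratios += linspace(gradient_values[i], gradient_values[i + 1], per_section)
    # Assumption: tensors left over after the even split use the last value.
    ratios += [float(gradient_values[-1])] * (total_tensors - len(ratios))
    return ratios


ratios = expand_gradient([1.0, 0.5, 0.0], 24)
for position in (1, 6, 12, 24):
    print(f"tensor {position}: {ratios[position - 1]:.3f} of model2")
```

Under these assumptions the sixth tensor comes out at roughly 0.77 rather than exactly 0.75, because each 12-tensor section includes both of its endpoints.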

Important Note:
The number of sections is one less than the length of the list, and the tensors are divided evenly among the sections. If the total number of tensors does not divide evenly, the remaining tensors use the last gradient value.
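To check how a given list length splits a model before running a merge, the arithmetic can be sketched as follows. The function name section_layout is hypothetical, and the integer division is an assumption about how the even split is computed.

```python
def section_layout(num_gradient_values, total_tensors):
    """Return (sections, tensors_per_section, leftover_tensors).

    Hypothetical helper: assumes the split uses integer division.
    """
    sections = num_gradient_values - 1
    per_section = total_tensors // sections
    leftover = total_tensors - sections * per_section
    return sections, per_section, leftover


print(section_layout(3, 24))  # (2, 12, 0): two sections of 12, nothing left over
print(section_layout(3, 25))  # (2, 12, 1): one tensor falls back to the last value
print(section_layout(4, 24))  # (3, 8, 0): three sections of 8
```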


Notes:


Example

python BlockMerge_Gradient_Tensors.py --model_path1 "stabilityai/StableBeluga-7B" --model_path2 "NousResearch/Nous-Hermes-Llama2-7b" --output_model_path "mythologic-mini-7b" --gradient_values "[0.9,0.0,0.0,0.0]" --layer_only