Home

Awesome

πŸŽ‰ Toolkit for Effective LLM Alignment

✨ Description and Features

This is a super customizable, concise, user-friendly, and efficient toolkit for training and aligning LLMs. To get started, simply define a YAML configuration with parameters from HF TrainingArguments and some specific parameters for each method.

πŸ› οΈ Toolkit Foundations

Key Libraries:

πŸ“š Supported Methods

πŸ“Œ Additional Features

SMPO - Simple Margin Preference Optimization

Our own alignment method designed for PO stability. The method is inspired by such methods as IPO, SimPO, C-RLFT, as well as introducing its own loss function of separating chosen and rejected pairs.

The main idea of the method is the desire to smoothly achieve the desired margin level without forcing the model to retrain by adding a balancing SFT loss on chosen and rejected at the same time.

The implementation of the method is here, and the config is here.

πŸš€ How to Use

πŸ“¦ Installation

πŸ› οΈ Project Installation

Run the following commands inside the project folder:

  1. Install Poetry:

    pip install poetry
    
  2. Install project dependencies:

    poetry install
    

    Verify with:

    poetry show
    
  3. (Optional) Set the environment variable HF_HOME to your desired folder:

    export HF_HOME=/mnt/hf/
    
  4. (Optional) Log in to Hugging Face CLI:

    poetry run huggingface-cli login
    
  5. (Optional) Log in to Weights & Biases:

    poetry run wandb login
    
  6. Check the configuration settings in the accelerate/ folder (number of GPUs, etc.).

βš™οΈ Prerequisites and Troubleshooting

First, make sure you have all the necessary developer Linux libraries installed, including GCC and G++ version 8 or higher. You can check this by running:

gcc --version

Next, ensure that CUDA is version 11.8 or higher (preferably 12.1) and that all your GPUs are detected. Use the command:

nvidia-smi

After completing the first step of installation, check that you have Poetry version 1.8+ installed, and it’s best to use Python 3.10. If not, update Poetry with:

poetry self update

and run:

poetry env use 3.10

After the second installation step, make sure that running poetry run ds_report returns meaningful text. Additionally, verify the version of Torch and the presence of NVIDIA packages.

If you encounter an error related to DeepSpeed and fused_adam during training, you need to remove DeepSpeed from your environment and install it with:

DS_BUILD_FUSED_ADAM=1 poetry run pip install deepspeed==0.14.5

πŸƒβ€β™‚οΈ Example of Running a Script from Config

You need to select a DeepSpeed config + a training config + the script itself. Here’s an example command to start SFT training using YAML config:


PYTHONPATH="${PYTHONPATH}:src/" poetry run accelerate launch --config_file accelerate/stage2_config.yaml scripts/sft.py training_configs/sft-llama-3.1-8b-it-lora-GrandmasterRAG-v1.yaml

πŸ“ YAML config examples

<details> <summary>SFT Training</summary>

Config for training SFT Llama 3.1, using Liger kernel, only assistant answers, modified chat template, LoRA, generating examples on eval.

model_name_or_path: "unsloth/Meta-Llama-3.1-8B-Instruct"
dataset:
  - "Vikhrmodels/GrandMaster-PRO-MAX"
  - "Vikhrmodels/Grounded-RAG-RU-v2"
train_only_on_completions: True
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
num_train_epochs: 1
save_strategy: "steps"
save_steps: 400
save_total_limit: 6
learning_rate: 0.00004
gradient_accumulation_steps: 8
gradient_checkpointing: True
logging_steps: 1
remove_unused_columns: False
dataloader_num_workers: 2
save_only_model: True
generate_eval_examples: True
use_liger: True
max_seq_length: 16000
evaluation_strategy: "steps"
eval_steps: 400
run_name: "sft-grndmrag-llama-3.1-unsloth-lora-256-qkvogudlm-v1"
output_dir: "/home/models/sft-grndmrag-llama-3.1-unsloth-lora-256-qkvogudlm-v1"
warmup_steps: 20
report_to: "wandb"
conversation_field: "conversation"
bf16: True
seed: 42
logging_first_step: True
use_peft: True
lora_target_modules:
  - "k_proj"
  - "v_proj"
  - "q_proj"
  - "o_proj"
  - "gate_proj"
  - "up_proj"
  - "down_proj"
  - "lm_head"
lora_r: 256
lora_alpha: 256
assistant_message_template: "<|start_header_id|>assistant<|end_header_id|>\n\n"
pad_token: "<|reserved_special_token_0|>"
eos_token: "<|eot_id|>"
chat_template: "{{ bos_token }}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
force_chat_template: True
</details>

Feel free to modify any part of this translation! 😊