SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)

<p align="center"> <br> <img src="resources/banner.png"/> <br> <p> <p align="center"> <a href="https://modelscope.cn/home">ModelScope Community Website</a> <br> <a href="README_CN.md">中文</a> &nbsp | &nbsp English &nbsp </p> <p align="center"> <img src="https://img.shields.io/badge/python-%E2%89%A53.8-5be.svg"> <img src="https://img.shields.io/badge/pytorch-%E2%89%A51.12%20%7C%20%E2%89%A52.0-orange.svg"> <a href="https://github.com/modelscope/modelscope/"><img src="https://img.shields.io/badge/modelscope-%E2%89%A51.17-5D91D4.svg"></a> <a href="https://pypi.org/project/ms-swift/"><img src="https://badge.fury.io/py/ms-swift.svg"></a> <a href="https://github.com/modelscope/swift/blob/main/LICENSE"><img src="https://img.shields.io/github/license/modelscope/swift"></a> <a href="https://pepy.tech/project/ms-swift"><img src="https://pepy.tech/badge/ms-swift"></a> <a href="https://github.com/modelscope/swift/pulls"><img src="https://img.shields.io/badge/PR-welcome-55EB99.svg"></a> </p> <p align="center"> <a href="https://trendshift.io/repositories/6427" target="_blank"><img src="https://trendshift.io/api/badge/repositories/6427" alt="modelscope%2Fswift | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> </p>


📝 Introduction

SWIFT supports training (pre-training, fine-tuning, RLHF), inference, evaluation, and deployment of 350+ LLMs and 100+ MLLMs (multimodal large models). Developers can apply the framework directly to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by PEFT, we provide a complete adapters library supporting the latest training techniques such as NEFTune, LoRA+, and LLaMA-PRO. This adapters library can be used directly in your own custom workflow without our training scripts.
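
For instance, a tuner from this library can be attached to any causal LM and trained with a custom loop. A minimal sketch, assuming the ms-swift 2.x Python API; the model id and target modules below are illustrative:

from modelscope import AutoModelForCausalLM
from swift import Swift, LoRAConfig

# Load any causal LM from the ModelScope hub (illustrative model id).
model = AutoModelForCausalLM.from_pretrained('qwen/Qwen1.5-7B-Chat')
# Attach a LoRA tuner; the wrapped model can then be trained in any custom loop.
lora_config = LoRAConfig(r=8, lora_alpha=32, target_modules=['q_proj', 'k_proj', 'v_proj'])
model = Swift.prepare_model(model, lora_config)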

To make SWIFT accessible to users unfamiliar with deep learning, we provide a Gradio web UI for controlling training and inference, along with accompanying deep-learning courses and best practices for beginners. The SWIFT web UI is available on both Hugging Face Spaces and ModelScope Studio; feel free to try it!

SWIFT has rich documentation for users; feel free to check our documentation website:

<p align="center"> <a href="https://arxiv.org/abs/2408.05517">Paper</a> &nbsp | <a href="https://swift.readthedocs.io/en/latest/">English Documentation</a> &nbsp | &nbsp <a href="https://swift.readthedocs.io/zh-cn/latest/">中文文档</a> &nbsp </p>

☎ Groups

You can contact us and communicate with the community by joining our groups:

| Discord Group | WeChat Group |
|:-:|:-:|
| <img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200"> |


🛠️ Installation

SWIFT runs in a Python environment. Please ensure your Python version is 3.8 or higher.

# Full capabilities
pip install 'ms-swift[all]' -U
# LLM only
pip install 'ms-swift[llm]' -U
# AIGC only
pip install 'ms-swift[aigc]' -U
# Adapters only
pip install ms-swift -U
# Install from source
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

SWIFT depends on torch>=1.13; torch>=2.0.0 is recommended.

🚀 Getting Started

This section introduces basic usage; see the documentation for more ways to use SWIFT.

Web-UI

The web UI is a Gradio-based interface for training and deployment that requires no coding. It is easy to use and fully supports multi-GPU training and deployment:

SWIFT_UI_LANG=en swift web-ui


Training

Training Scripts

You can refer to the following scripts to customize your own training scripts. Training can also be launched directly from Python, as sketched below.
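
A sketch, assuming the ms-swift 2.x interface, where SftArguments mirrors the swift sft CLI flags used in the scripts that follow:

from swift.llm import SftArguments, sft_main

# Arguments mirror the `swift sft` CLI flags below.
result = sft_main(SftArguments(
    model_type='qwen1half-7b-chat',
    dataset=['blossom-math-zh'],
    num_train_epochs=5,
    sft_type='lora',
    output_dir='output'))
print(result['best_model_checkpoint'])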

Supported Training Processes

| Training Process | Training Method |
|------------------|-----------------|
| Pretraining | Text Generation |
| Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech |
| Human Alignment | DPO<br>ORPO<br>SimPO<br>CPO<br>KTO |
| Text-to-Image | DreamBooth, etc. |
| Text-to-Video | - |

Single GPU Training

Start single GPU fine-tuning with the following command:

LoRA:

# Experimental Environment: A100
# GPU Memory Requirement: 20GB
# Runtime: 3.1 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --eval_steps 200

Full-parameter:

# Experimental Environment: A100
# GPU Memory Requirement: 80GB
# Runtime: 2.5 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type full \
    --output_dir output \
    --eval_steps 500

Model Parallel Training

# Experimental Environment: 2 * A100
# GPU Memory Requirement: 10GB + 13GB
# Runtime: 3.4 hours
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

Data Parallel Training

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 30GB
# Runtime: 0.8 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

Combining Model Parallelism and Data Parallelism:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 2*14GB + 2*18GB
# Runtime: 1.7 hours
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

DeepSpeed Training

DeepSpeed also supports training of quantized GPTQ and AWQ models.

ZeRO2:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 21GB
# Runtime: 0.9 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero2

ZeRO3:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 19GB
# Runtime: 3.2 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero3

ZeRO3-Offload:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 12GB
# Runtime: 60 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_id_or_path AI-ModelScope/WizardLM-2-8x22B \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed zero3-offload

Multi-node Multi-GPU

# If the disk is not shared, please additionally specify `--save_on_each_node true` in the shell scripts on each machine.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=2 \
NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=8 \
swift sft \
    --model_type qwen1half-32b-chat \
    --sft_type full \
    --dataset blossom-math-zh \
    --output_dir output \
    --deepspeed default-zero3

# node1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=2 \
NODE_RANK=1 \
MASTER_ADDR=xxx.xxx.xxx.xxx \
NPROC_PER_NODE=8 \
swift sft \
    --model_type qwen1half-32b-chat \
    --sft_type full \
    --dataset blossom-math-zh \
    --output_dir output \
    --deepspeed default-zero3

AliYun-DLC Multi-node Training

In the DLC product, WORLD_SIZE is the number of nodes and RANK is the node index; this differs from the definitions used by torchrun.

NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
swift sft \
    --model_type qwen1half-32b-chat \
    --sft_type full \
    --dataset blossom-math-zh \
    --output_dir output \
    --deepspeed default-zero3

Pretraining

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 30GB
# Runtime: 0.8 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
    --model_type qwen1half-7b \
    --dataset chinese-c4#100000 \
    --num_train_epochs 1 \
    --sft_type full \
    --deepspeed default-zero3 \
    --output_dir output \
    --lazy_tokenize true

RLHF

# We support rlhf_type dpo/cpo/simpo/orpo/kto
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type dpo \
    --model_type qwen1half-7b-chat \
    --dataset shareai-llama3-dpo-zh-en-emoji \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

Inference

Original model:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
# Use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat \
    --infer_backend vllm --max_model_len 8192

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
# Use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true \
    --merge_lora true --infer_backend vllm --max_model_len 8192
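
Inference is also available through the Python API. A sketch, assuming the ms-swift 2.x interface; the query is illustrative:

from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)

model_type = ModelType.qwen1half_7b_chat
template_type = get_default_template_type(model_type)
# device_map='auto' places the model on the available GPU(s).
model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
template = get_template(template_type, tokenizer)
response, history = inference(model, template, 'Where is the capital of Zhejiang?')
print(response)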

Evaluation

Original model:

CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen1half-7b-chat \
    --eval_dataset ARC_c --infer_backend vllm

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift eval --ckpt_dir xxx/checkpoint-xxx \
    --eval_dataset ARC_c --infer_backend vllm \
    --merge_lora true

Quantization

Original model:

CUDA_VISIBLE_DEVICES=0 swift export --model_type qwen1half-7b-chat \
    --quant_bits 4 --quant_method awq

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true \
    --quant_method awq --quant_bits 4 \
    --merge_lora true

Deployment

Clients invoke the deployed service through the OpenAI API; for details, refer to the LLM deployment documentation. A Python client sketch follows the commands below.

Original model:

CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen1half-7b-chat
# Use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen1half-7b-chat \
    --infer_backend vllm --max_model_len 8192

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir xxx/checkpoint-xxx
# Use vLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift deploy \
    --ckpt_dir xxx/checkpoint-xxx --merge_lora true \
    --infer_backend vllm --max_model_len 8192
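
Once the server is up, any OpenAI-compatible client can call it. A minimal sketch using the openai Python SDK, assuming the server's default address http://127.0.0.1:8000 and a served model name equal to the deployed model_type; adjust both for your deployment:

from openai import OpenAI

# `swift deploy` exposes an OpenAI-compatible API.
client = OpenAI(api_key='EMPTY', base_url='http://127.0.0.1:8000/v1')
resp = client.chat.completions.create(
    model='qwen1half-7b-chat',
    messages=[{'role': 'user', 'content': 'What is 1 + 1?'}],
    temperature=0)
print(resp.choices[0].message.content)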

Supported Models

The complete list of supported models and datasets can be found at Supported Models and Datasets List.
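
The registry behind this list can also be inspected from Python. A sketch, assuming ms-swift 2.x, which exposes the registry as MODEL_MAPPING:

from swift.llm import MODEL_MAPPING

# Each key is a `model_type` value accepted by the CLI.
print(len(MODEL_MAPPING))
print(sorted(MODEL_MAPPING)[:5])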

LLMs

| Model Series | Model Introduction | Language | Model Size | Model Type |
|--------------|--------------------|----------|------------|------------|
| Qwen<br>Qwen1.5<br>Qwen2<br>Qwen2.5 | Tongyi Qwen series models | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4<br>Codegeex4 | Zhipu ChatGLM series models | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| Baichuan<br>Baichuan2 | Baichuan 1 and Baichuan 2 | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 | Langchao Yuan series models | Chinese<br>English | 2B-102B | instruct model |
| XVerse | XVerse series models | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 | LLaMA2 series models | English | 7B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3<br>LLaMA3.1<br>Llama3.2 | LLaMA3 series models | English | 1B-70B<br>including quantized versions | base model<br>chat model |
| Mistral<br>Mixtral | Mistral series models | English | 7B-22B | base model<br>instruct model<br>MoE model |
| Yi<br>Yi1.5<br>Yi-Coder | 01AI's YI series models | Chinese<br>English | 1.5B-34B<br>including quantized versions | base model<br>chat model<br>long text model |
| InternLM<br>InternLM2<br>InternLM2-Math<br>InternLM2.5 | Pujiang AI Lab InternLM series models | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | DeepSeek series models | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA | MAMBA temporal convolution model | English | 130M-2.8B | base model |
| Gemma<br>Gemma2 | Google Gemma series models | English | 2B-27B | base model<br>instruct model |
| MiniCPM<br>MiniCPM3 | OpenBmB MiniCPM series models | Chinese<br>English | 2B-3B | chat model<br>MoE model |
| OpenBuddy | OpenBuddy series models | Chinese<br>English | 7B-70B | base model<br>chat model |
| Orion | OrionStar AI series models | Chinese<br>English | 14B | base model<br>chat model |
| BlueLM | VIVO BlueLM large model | Chinese<br>English | 7B | base model<br>chat model |
| Ziya2 | Fengshenbang series models | Chinese<br>English | 13B | base model<br>chat model |
| Skywork | Skywork series models | Chinese<br>English | 13B | base model<br>chat model |
| Zephyr | Zephyr series models based on Mistral | English | 7B | chat model |
| PolyLM | Tongyi Lab self-developed PolyLM series models | Multilingual | 13B | base model |
| SeqGPT | Tongyi Lab self-developed text understanding model for information extraction and text classification | Chinese | 560M | semantic understanding model |
| SUS | Southern University of Science and Technology model fine-tuned on YI | Chinese<br>English | 34B | chat model |
| Tongyi-Finance | Tongyi finance series models | Chinese<br>English | 14B | base model<br>chat model<br>financial model |
| CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | Ant CodeFuse series models | Chinese<br>English | 6B-34B | chat model<br>code model |
| phi2/phi3 | Microsoft's PHI series models | English | 3B/4B | base model<br>instruct model<br>code model |
| Grok | X-ai | English | 300B | base model |
| TeleChat | Tele-AI | Chinese<br>English | 7B-12B | chat model |
| dbrx | databricks | English | 132B | base model<br>chat model |
| mengzi3 | Langboat | Chinese<br>English | 13B | base model |
| c4ai-command-r | c4ai | Multilingual | 35B-104B | chat model |
| aya-expanse | aya | Multilingual | 8B-32B | chat model |
| WizardLM2 | WizardLM2 series models | English | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |
| Atom | Atom | Chinese | 7B | base model<br>chat model |
| Chinese-LLaMA-Alpaca-2 | Chinese-LLaMA-Alpaca-2 | Chinese | 1.3B-13B | base model<br>chat model<br>long text model |
| Chinese-LLaMA-Alpaca-3 | Chinese-LLaMA-Alpaca-3 | Chinese | 8B | base model<br>chat model |
| ModelScope-Agent | ModelScope Agent series models | Chinese | 7B-14B | agent model |
| Numina | AI-MO | English | 7B | math model |

MLLMs

| Model Series | Model Introduction | Language | Model Size | Model Type |
|--------------|--------------------|----------|------------|------------|
| Qwen-VL<br>Qwen2-VL | Tongyi Qwen vision model | Chinese<br>English | 2B-72B<br>including quantized versions | base model<br>chat model |
| Qwen-Audio<br>Qwen2-Audio | Tongyi Qwen speech model | Chinese<br>English | 7B | base model<br>chat model |
| Llama3.2-Vision | Llama3.2 | English | 11B-90B | base model<br>chat model |
| YI-VL | 01AI's YI series vision models | Chinese<br>English | 6B-34B | chat model |
| XComposer2<br>XComposer2.5 | Pujiang AI Lab InternLM vision model | Chinese<br>English | 7B | chat model |
| DeepSeek-VL<br>Deepseek-Janus | DeepSeek series vision models | Chinese<br>English | 1.3B-7B | chat model |
| MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2.5<br>MiniCPM-V-2.6 | OpenBmB MiniCPM vision model | Chinese<br>English | 3B-9B | chat model |
| CogVLM<br>CogAgent<br>CogVLM2<br>CogVLM2-Video<br>GLM4V | Zhipu ChatGLM visual QA and Agent model | Chinese<br>English | 9B-19B | chat model |
| Llava-HF | Llava-HF series models | English | 0.5B-110B | chat model |
| Llava1.5<br>Llava1.6 | Llava series models | English | 7B-34B | chat model |
| Llava-Next<br>Llava-Next-Video | Llava-Next series models | Chinese<br>English | 7B-110B | chat model |
| mPLUG-Owl2<br>mPLUG-Owl2.1<br>mPLUG-Owl3 | mPLUG-Owl series models | English | 1B-11B | chat model |
| InternVL<br>Mini-InternVL<br>InternVL2 | InternVL | Chinese<br>English | 1B-40B<br>including quantized versions | chat model |
| Llava-llama3 | xtuner | English | 8B | chat model |
| Phi3-Vision | Microsoft | English | 4B | chat model |
| PaliGemma | Google | English | 3B | chat model |
| Florence | Microsoft | English | 0.23B-0.77B | chat model |
| Idefics3 | HuggingFaceM4 | English | 8B | chat model |
| Pixtral | mistralai | English | 12B | chat model |
| Llama3.1-Omni | LLaMA-Omni | English | 8B | chat model |
| Ovis | Ovis | English | 9B | chat model |
| Molmo | Molmo series models | English | 1B-72B | chat model |
| Emu3-Chat | Emu3-Chat | English | 8B | chat model |

Diffusion Models

| Model Series | Model Introduction | Language | Model Type |
|--------------|--------------------|----------|------------|
| AnimateDiff | AnimateDiff animation model | English | text-to-video |
| SD1.5/SD2.0/SDXL | StabilityAI series diffusion models | English | text-to-image |

Supported Open Source Datasets

| Dataset Type | Training Task | Dataset |
|--------------|---------------|---------|
| General | Fine-tuning | 🔥ruozhiba, 🔥ms-bench, 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca, instinwild, cot-en, cot-zh, firefly-zh, instruct-en, gpt4all-en, sharegpt, tulu-v2-sft-mixture, wikipedia-zh, open-orca, sharegpt-gpt4, deepctrl-sft, coig-cqia. |
| Agent | Fine-tuning | 🔥ms-agent, 🔥ms-agent-for-agentfabric, ms-agent-multirole, 🔥toolbench-for-alpha-umi, damo-agent-zh, damo-agent-zh-mini, agent-instruct-all-en. |
| General | Human Alignment | hh-rlhf, 🔥hh-rlhf-cn, stack-exchange-paired. |
| Code | Fine-tuning | code-alpaca-en, 🔥leetcode-python-en, 🔥codefuse-python-en, 🔥codefuse-evol-instruction-zh. |
| Medical | Fine-tuning | medical-en, medical-zh, 🔥disc-med-sft-zh. |
| Legal | Fine-tuning | lawyer-llama-zh, tigerbot-law-zh, 🔥disc-law-sft-zh. |
| Math | Fine-tuning | 🔥blossom-math-zh, school-math-zh, open-platypus-en. |
| SQL | Fine-tuning | text2sql-en, 🔥sql-create-context-en. |
| Text Generation | Fine-tuning | 🔥advertise-gen-zh, 🔥dureader-robust-zh. |
| Classification | Fine-tuning | cmnli-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en. |
| Quantization Assist | Quantization | pileval. |
| Other | Fine-tuning | finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh. |
| Vision | Fine-tuning | coco-en, 🔥coco-en-mini, coco-en-2, coco-en-2-mini, capcha-images. |
| Audio | Fine-tuning | aishell1-zh, 🔥aishell1-zh-mini. |

Supported Technologies

| Technology Name |
|-----------------|
| 🔥LoRA: LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS |
| 🔥LoRA+: LoRA+: Efficient Low Rank Adaptation of Large Models |
| 🔥GaLore: GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection |
| 🔥LISA: LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning |
| 🔥UnSloth: https://github.com/unslothai/unsloth |
| 🔥LLaMA PRO: LLAMA PRO: Progressive LLaMA with Block Expansion |
| 🔥SCEdit: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing |
| 🔥NEFTune: Noisy Embeddings Improve Instruction Finetuning |
| LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models |
| Adapter: Parameter-Efficient Transfer Learning for NLP |
| Vision Prompt Tuning: Visual Prompt Tuning |
| Side: Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks |
| Res-Tuning: Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone |
| Tuners provided by PEFT, such as IA3 and AdaLoRA |

Supported Hardware

| Hardware Environment | Notes |
|----------------------|-------|
| CPU | |
| RTX 20/30/40 series, etc. | BF16 and FlashAttn are available on the 30 series and later |
| Computing cards T4/V100, etc. | BF16 and FlashAttn are not supported |
| Computing cards A10/A100, etc. | BF16 and FlashAttn are supported |
| Huawei Ascend NPU | |

Environment Variables

Other variables, such as CUDA_VISIBLE_DEVICES, are also supported; they are not listed here.

📚 Classroom

| Tutorial Name |
|---------------|
| Introduction to Deep Learning |
| Large Model Basics |
| Prompt Engineering |
| Transformer Architecture Introduction |
| Training Technique Selection |
| Data Preprocessing |
| Quantization |
| Training |
| Inference |
| Deployment |
| Evaluation |

🏛 License

This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to their original resource pages and comply with the corresponding licenses.

📎 Citation

@misc{zhao2024swiftascalablelightweightinfrastructure,
      title={SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning},
      author={Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen},
      year={2024},
      eprint={2408.05517},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.05517},
}

Star History
