<h1 align="center"> <p><img src="assert/logo.jpg" alt="RWKV-PEFT" width="60px" style="vertical-align: middle; margin-right: 10px;"/>RWKV-PEFT</p> </h1>

[ English | 中文 ]

RWKV-PEFT is the official implementation of parameter-efficient fine-tuning (PEFT) for RWKV5/6 models, supporting a variety of advanced fine-tuning methods across multiple hardware platforms.

## Recent updates

### RWKV7 support

Add the following flag to fine-tune RWKV7 ("x070") models:

```bash
--my_testing "x070"
```
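A minimal sketch of where the flag goes (assuming the `train.py` entry point invoked by the scripts under `scripts/`; every other argument is a placeholder for your usual setup):

```bash
# Sketch: enable the RWKV7 ("x070") architecture; add this alongside your existing training flags.
python train.py --my_testing "x070"   # plus your model, data, and PEFT arguments
```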

### SFT

Relevant parameters (see `scripts/run_sft.sh` for detailed usage):

```bash
--data_type sft --sft_field query response --sft_split "train"
```
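For reference, `--sft_field query response` names the dataset columns that hold the prompt and the answer. A minimal sketch of a matching local JSONL file (the file name is hypothetical; only the field names come from the flags above):

```bash
# Sketch: a tiny SFT dataset whose columns match --sft_field query response.
cat > my_sft_data.jsonl <<'EOF'
{"query": "What is RWKV?", "response": "RWKV is an RNN architecture with transformer-level performance."}
{"query": "What does PEFT stand for?", "response": "Parameter-Efficient Fine-Tuning."}
EOF
```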

Specific settings for SFT live in `RWKV-PEFT/src/rwkv_datasets/SFTdataset.py`:

```python
tokenizer_path = 'RWKV/rwkv-5-world-3b'  # Choose a tokenizer (use the official tokenizer)
IGNORE_INDEX = -100                      # Padding index (do not modify)
EOT_TOKEN = "<|EOT|>"                    # Set the stop token(s) you need

# Modify the prompt template according to your requirements
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)
```

> [!TIP]
> Downloading Hugging Face data may time out in China, so you may need to prefix the command with a mirror endpoint:
>
> ```bash
> HF_ENDPOINT="https://hf-mirror.com" sh scripts/run_sft.sh
> ```

### Bone: Block-Affine Adaptation of Large Language Models ([paper](https://arxiv.org/abs/2409.15371))

The paper has been updated. Bone is now a simple and efficient basic PEFT method that trains faster and uses less VRAM than LoRA, converges faster, and outperforms PiSSA. The previous version of Bone has been renamed to the Bat method.

Script configuration (the old config gains a `bone_mode` field):

```bash
# old
bone_config='{"bone_load":"","bone_r":64}'
# updated
bone_config='{"bone_mode":"bone","bone_load":"","bone_r":64}'
# or
bone_config='{"bone_mode":"bat","bone_load":"","bone_r":64}'
```
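A minimal sketch of passing this config to the trainer (assuming the `train.py` entry point used by the scripts under `scripts/`; all other arguments are placeholders for your usual setup):

```bash
# Sketch: select the Bone method and hand it the JSON config defined above.
bone_config='{"bone_mode":"bone","bone_load":"","bone_r":64}'
python train.py --peft bone --bone_config $bone_config   # plus your model and data arguments
```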

## Installation

> [!IMPORTANT]
> Installation is mandatory.

```bash
git clone https://github.com/JL-er/RWKV-PEFT.git
cd RWKV-PEFT
pip install -r requirements.txt
```

## Web Run

> [!TIP]
> If you are using a cloud server (such as Vast or AutoDL), you can start the Streamlit service by referring to the help documentation on the cloud server's official website.

```bash
streamlit run web/app.py
```
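If your provider only forwards a specific port, Streamlit's standard `--server.port` option can be added (the port number below is only an example):

```bash
# Sketch: bind the web UI to a port your cloud provider exposes (6006 is just an example).
streamlit run web/app.py --server.port 6006
```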

## Table of Contents

- [Installation](#installation)
- [Web Run](#web-run)
- [Hardware Requirements](#hardware-requirements)
- [Quick Start](#quick-start)
- [Main Features](#main-features)
- [Detailed Configuration](#detailed-configuration)
- [GPU Support](#gpu-support)
- [Citation](#citation)

## Hardware Requirements

The following shows memory usage when using an RTX 4090 (24GB VRAM) + 64GB RAM (with parameters: `--strategy deepspeed_stage_1 --ctx_len 1024 --micro_bsz 1 --lora_r 64`):

| Model Size | Full Finetuning | LoRA/PISSA | QLoRA/QPISSA | State Tuning |
|------------|-----------------|------------|--------------|--------------|
| RWKV6-1.6B | OOM             | 7.4GB      | 5.6GB        | 6.4GB        |
| RWKV6-3B   | OOM             | 12.1GB     | 8.2GB        | 9.4GB        |
| RWKV6-7B   | OOM             | 23.7GB*    | 14.9GB**     | 18.1GB       |

Note:

## Quick Start

1. Install dependencies:

```bash
pip install -r requirements.txt
```

2. Run an example script:

```bash
sh scripts/run_lora.sh
```

Note: Please refer to the RWKV official tutorial for detailed data preparation.

3. Start the web GUI:

> [!TIP]
> If you're using cloud services (such as Vast or AutoDL), you'll need to enable web port access according to your service provider's instructions.

```bash
streamlit run web/app.py
```

## Main Features

## Detailed Configuration

### 1. PEFT Method Selection

```bash
--peft bone --bone_config $bone_config
```
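For comparison, a hedged sketch of selecting LoRA instead (the JSON keys are illustrative and mirror the Bone config pattern; see `scripts/run_lora.sh` for the authoritative field names):

```bash
# Sketch: LoRA selection; config keys are assumptions, check scripts/run_lora.sh before use.
lora_config='{"lora_load":"","lora_r":64,"lora_alpha":128,"lora_dropout":0.01}'
python train.py --peft lora --lora_config $lora_config   # plus your model and data arguments
```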

### 2. Training Parts Selection

```bash
--train_parts ["time", "ln"]
```
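A hedged reading of the flag (based on its name and value; the full list of accepted part names is not given here): the listed parts are extra weight groups kept trainable alongside the PEFT weights, so a smaller list saves VRAM at some cost in quality.

```bash
# Sketch: keep the time-mix and layer-norm weights trainable in addition to the PEFT weights.
--train_parts ["time", "ln"]
# Assumption: an empty list trains only the PEFT weights themselves.
--train_parts []
```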

### 3. Quantized Training

```bash
--quant int8/nf4
```
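A hedged sketch of combining quantization with a PEFT method, which is how the QLoRA/QPISSA columns in the hardware table are obtained (all other arguments are placeholders):

```bash
# Sketch: NF4-quantized base weights with LoRA adapters (QLoRA-style run).
python train.py --peft lora --lora_config $lora_config --quant nf4   # plus your model and data arguments
```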

### 4. Infinite Length Training (infctx)

```bash
--train_type infctx --chunk_ctx 512 --ctx_len 2048
```
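A hedged reading of the two lengths (inferred from the flag names): `--ctx_len` is the total sample length, while `--chunk_ctx` is the slice processed per step with state carried between chunks, so `--chunk_ctx` should be smaller than `--ctx_len`.

```bash
# Sketch: train on 4096-token samples while only materializing 512-token chunks at a time.
--train_type infctx --chunk_ctx 512 --ctx_len 4096
```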

### 5. Data Loading Strategy

```bash
--dataload pad
```
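Only the `pad` mode is shown above; as a hedged sketch, the other mode names below are recalled from the project's scripts and should be verified against `scripts/` before use:

```bash
# Sketch: data loading modes (names other than "pad" are assumptions; verify against the repo scripts).
--dataload get    # assumed default: RWKV-LM-style random sampling
--dataload pad    # pad each batch to a fixed length (shown above)
--dataload only   # assumed: single-sample loading, typically with --micro_bsz 1
```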

### 6. DeepSpeed Strategy

```bash
--strategy deepspeed_stage_1
```

Available strategies:
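As a hedged sketch, the strategy names follow the standard PyTorch Lightning DeepSpeed naming, for example:

```bash
# Sketch: common Lightning DeepSpeed strategy names (availability may vary with your Lightning version).
--strategy deepspeed_stage_1            # ZeRO stage 1
--strategy deepspeed_stage_2            # ZeRO stage 2
--strategy deepspeed_stage_2_offload    # ZeRO stage 2 + CPU offload
--strategy deepspeed_stage_3            # ZeRO stage 3
--strategy deepspeed_stage_3_offload    # ZeRO stage 3 + CPU offload
```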

### 7. FLA Operator

By default, RWKV-PEFT uses custom CUDA kernels for wkv computation. However, you can use `--fla` to enable the Triton kernel instead:

```bash
--fla
```
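A hedged note on when this helps: the Triton path avoids the custom CUDA kernel, which can be convenient on setups where compiling that kernel is awkward; it is enabled by simply appending the flag:

```bash
# Sketch: switch the wkv computation to the Triton (FLA) kernel; other arguments are placeholders.
python train.py --fla   # plus your usual model, data, and PEFT arguments
```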

## GPU Support

## Citation

If you find this project helpful, please cite our work:

```bibtex
@misc{kang2024boneblockaffineadaptationlarge,
      title={Bone: Block-Affine Adaptation of Large Language Models},
      author={Jiale Kang},
      year={2024},
      eprint={2409.15371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.15371},
}
```