<h1 align="center"> <p><img src="assert/logo.jpg" alt="RWKV-PEFT" width="60px" style="vertical-align: middle; margin-right: 10px;"/>RWKV-PEFT</p> </h1>


RWKV-PEFT is the official implementation of parameter-efficient fine-tuning for RWKV5/6 models, supporting a variety of advanced fine-tuning methods across multiple hardware platforms.

## Installation

> [!IMPORTANT]
> Installation is mandatory.

```bash
git clone https://github.com/JL-er/RWKV-PEFT.git
cd RWKV-PEFT
pip install -r requirements.txt
```
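After installing, a quick sanity check (not part of the official steps) is to confirm that PyTorch can see your GPU:

```bash
# Optional check: verify PyTorch is installed and CUDA is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```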

## Web Run

> [!TIP]
> If you are using a cloud server (such as Vast or AutoDL), you can start the Streamlit service by referring to the help documentation on the cloud server's official website.

```bash
streamlit run web/app.py
```
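If the GUI needs to be reachable from outside the machine (e.g. through a cloud provider's port forwarding), Streamlit's standard server flags can bind an explicit address and port; the values below are only examples:

```bash
# Bind to all interfaces on an explicit port so the GUI is reachable remotely.
streamlit run web/app.py --server.address 0.0.0.0 --server.port 8501
```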


## Hardware Requirements

The following shows memory usage on an RTX 4090 (24GB VRAM) with 64GB RAM, using `--strategy deepspeed_stage_1 --ctx_len 1024 --micro_bsz 1 --lora_r 64`:

| Model Size | Full Finetuning | LoRA/PISSA | QLoRA/QPISSA | State Tuning |
| --- | --- | --- | --- | --- |
| RWKV6-1.6B | OOM | 7.4GB | 5.6GB | 6.4GB |
| RWKV6-3B | OOM | 12.1GB | 8.2GB | 9.4GB |
| RWKV6-7B | OOM | 23.7GB* | 14.9GB** | 18.1GB |

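For reference, the measurement settings above correspond to a command line roughly like the following (the `train.py` entry point and the `--peft lora` spelling are assumptions; the exact launch commands live in the scripts under `scripts/`):

```bash
# Sketch of the flags behind the memory measurements; entry point is assumed.
python train.py \
    --strategy deepspeed_stage_1 \
    --ctx_len 1024 \
    --micro_bsz 1 \
    --peft lora --lora_r 64
```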

## Quick Start

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Run the example script:

   ```bash
   sh scripts/run_lora.sh
   ```

   Note: Please refer to the official RWKV tutorial for detailed data preparation.

3. Start the web GUI:

   > [!TIP]
   > If you're using cloud services (such as Vast or AutoDL), you'll need to enable web port access according to your service provider's instructions.

   ```bash
   streamlit run web/app.py
   ```

## Main Features

## Detailed Configuration

### 1. PEFT Method Selection

```bash
--peft bone --bone_config $bone_config
```
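The config variable holds the method's hyper-parameters as a JSON string. A minimal sketch, assuming hypothetical key names (check the bundled scripts, e.g. `scripts/run_lora.sh`, for the authoritative format):

```bash
# Hypothetical JSON config; the key names are assumptions, not a documented API.
bone_config='{"bone_load":"","bone_r":64}'
--peft bone --bone_config $bone_config
```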

### 2. Training Parts Selection

```bash
--train_parts ["time", "ln"]
```

### 3. Quantized Training

```bash
--quant int8/nf4
```
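Quantized training is combined with a PEFT method (this is what the QLoRA/QPISSA column in the table above refers to). For example, a LoRA run over nf4-quantized base weights might pass:

```bash
# QLoRA-style setup: LoRA adapters on top of nf4-quantized base weights.
--peft lora --quant nf4
```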

### 4. Infinite Length Training (infctx)

In infctx mode the sequence is processed in chunks of `--chunk_ctx` tokens while the recurrent state is carried across chunks, so `--ctx_len` can be much larger than what fits in memory at once:

```bash
--train_type infctx --chunk_ctx 512 --ctx_len 2048
```

### 5. Data Loading Strategy

```bash
--dataload pad
```

### 6. DeepSpeed Strategy

```bash
--strategy deepspeed_stage_1
```

Available strategies:

- `deepspeed_stage_1`: ZeRO stage 1, shards optimizer states across GPUs
- `deepspeed_stage_2`: ZeRO stage 2, additionally shards gradients
- `deepspeed_stage_2_offload`: stage 2 with optimizer state and gradient offload to CPU
- `deepspeed_stage_3`: ZeRO stage 3, additionally shards model parameters
- `deepspeed_stage_3_offload`: stage 3 with parameter and optimizer offload to CPU

### 7. FLA Operator

By default, RWKV-PEFT uses custom CUDA kernels for the wkv computation. Pass `--fla` to switch to the Triton-based FLA kernel instead:

```bash
--fla
```
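Putting the options above together, a launch script ends up looking roughly like the sketch below. The `train.py` entry point, the `--load_model` flag, and the JSON key names are assumptions based on common RWKV training layouts, not verbatim from this repo; the remaining flags are the ones documented in this section:

```bash
#!/bin/bash
# Hypothetical combined launch; the entry point, --load_model, and the JSON
# keys are assumed. The PEFT-related flags are documented above.
bone_config='{"bone_load":"","bone_r":64}'

python train.py \
    --load_model /path/to/rwkv6-model.pth \
    --peft bone --bone_config "$bone_config" \
    --quant nf4 \
    --dataload pad \
    --strategy deepspeed_stage_1 \
    --ctx_len 1024 --micro_bsz 1 \
    --fla
```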

## GPU Support

## Citation

If you find this project helpful, please cite our work:

```bib
@misc{kang2024boneblockaffinetransformation,
      title={Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models},
      author={Jiale Kang},
      year={2024},
      eprint={2409.15371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.15371}
}
```