# Faster RWKV

## CUDA

### Convert Model
- Generate a ChatRWKV weight file with `v2/convert_model.py` (in the ChatRWKV repo) and strategy `cuda fp16`.
- Generate a faster-rwkv weight file with `tools/convert_weight.py`. For example:

  ```
  python3 tools/convert_weight.py RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096-converted-fp16.pth rwkv-4-1.5b-chntuned-fp16.fr
  ```
### Build

```
mkdir build
cd build
cmake -DFR_ENABLE_CUDA=ON -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
```
### Run

```
./chat tokenizer_file_path weight_file_path "cuda fp16"
```

For example: `./chat ../tokenizer_model ../rwkv-4-1.5b-chntuned-fp16.fr "cuda fp16"`
## Android

### Convert Model
- Generate a ChatRWKV weight file with `v2/convert_model.py` (in the ChatRWKV repo) and strategy `cuda fp32` or `cpu fp32`. Note that although fp32 is used here, the real dtype is determined in the following step.
- Generate a faster-rwkv weight file with `tools/convert_weight.py`.
- Export an ncnn model with `./export_ncnn <input_faster_rwkv_model_path> <output_path_prefix>`. You can download a pre-built `export_ncnn` from Releases if you are a Linux user, or build it yourself.
### Build

#### Android App Development

Download the pre-built Android AAR library from Releases, or run `aar/build_aar.sh` to build it yourself.

#### Android C++ Development

For the path of the Android NDK and the toolchain file, please refer to the Android NDK docs.
```
mkdir build
cd build
cmake -DFR_ENABLE_NCNN=ON -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DANDROID_NDK=xxxx -DCMAKE_TOOLCHAIN_FILE=xxxx -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja
```
### Run in Termux (ignore this if you are an app developer)

- Copy `chat` onto the Android phone (by using adb or Termux).
- Copy the tokenizer_model and the ncnn models (`.param`, `.bin` and `.config`) onto the Android phone (by using adb or Termux).
- Run `./chat tokenizer_model ncnn_models_basename "ncnn fp16"` in adb shell or Termux. For example, if the ncnn models are named `rwkv-4-chntuned-1.5b.param`, `rwkv-4-chntuned-1.5b.bin` and `rwkv-4-chntuned-1.5b.config`, the command should be `./chat tokenizer_model rwkv-4-chntuned-1.5b "ncnn fp16"`.
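The `ncnn_models_basename` argument is just the shared file-name stem of the three model files. A tiny helper (hypothetical, for illustration only — `chat` itself does this internally) shows how the three file names reduce to one argument:

```python
import os

def ncnn_basename(path: str) -> str:
    """Strip a .param/.bin/.config extension to get the basename ./chat expects."""
    base, ext = os.path.splitext(path)
    assert ext in (".param", ".bin", ".config"), f"unexpected extension: {ext}"
    return base

# All three files share the same basename, so any of them works.
cmd = ["./chat", "tokenizer_model",
       ncnn_basename("rwkv-4-chntuned-1.5b.param"), "ncnn fp16"]
print(cmd[2])  # rwkv-4-chntuned-1.5b
```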
### Requirements

- Android System >= 9.0
- RAM >= 4GB (for the 1.5B model)
- No hard requirement for the CPU; a more powerful CPU is simply faster.
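The 4 GB guideline is roughly what the raw weights imply. As a back-of-the-envelope sketch (the 1.5B parameter count is nominal; real files also include embeddings and metadata, and runtime state adds more):

```python
def weight_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# A 1.5B-parameter model at common precisions:
print(weight_size_gb(1.5e9, 2))    # fp16: 3.0 GB
print(weight_size_gb(1.5e9, 1))    # int8: 1.5 GB
print(weight_size_gb(1.5e9, 0.5))  # int4: 0.75 GB
```

This is why the int8 and int4 models in the demo below are attractive on phones: they cut the dominant memory cost by 2-4x.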
### Android Demo

Run one of the following commands in Termux to download prebuilt executables and models automatically. The download script supports resuming partially downloaded files, so feel free to Ctrl-C and restart it if the speed is too slow.
Executables, the 1.5B CHNtuned int8 model, the 1.5B CHNtuned int4 model and the 0.1B world int8 model:

```
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 3
```

Executables, the 1.5B CHNtuned int4 model and the 0.1B world int8 model:

```
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 2
```

Executables and the 0.1B world int8 model:

```
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 1
```

Executables only:

```
curl -L -s https://raw.githubusercontent.com/daquexian/faster-rwkv/master/download_binaries_and_models_termux.sh | bash -s 0
```
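The trailing number passed to `bash -s` selects how much to download. Restating the four commands above as data (nothing beyond what they already say), each level is a superset of the one below it:

```python
# What each download level fetches, per the four commands above.
DOWNLOAD_LEVELS = {
    0: ["executables"],
    1: ["executables", "0.1B world int8 model"],
    2: ["executables", "1.5B CHNtuned int4 model", "0.1B world int8 model"],
    3: ["executables", "1.5B CHNtuned int8 model",
        "1.5B CHNtuned int4 model", "0.1B world int8 model"],
}

# Each level strictly includes everything from the level below it.
for level in range(1, 4):
    assert set(DOWNLOAD_LEVELS[level]) > set(DOWNLOAD_LEVELS[level - 1])
```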
## Export ONNX

- Install the `rwkv2onnx` Python package by `pip install rwkv2onnx`.
- Run `rwkv2onnx <input path> <output path> <ChatRWKV path>`. For example: `rwkv2onnx ~/RWKV-5-World-0.1B-v1-20230803-ctx4096.pth ~/RWKV-5-0.1B.onnx ~/ChatRWKV`
## TODO

- JNI
- v5 models support (models are published at https://huggingface.co/daquexian/fr-models/tree/main)
- ABC music models support (models are published at https://huggingface.co/daquexian/fr-models/tree/main)
- CI
- ARM NEON int8 (~2x speedup compared to fp16)
- ARM NEON int4 (>2x speedup compared to fp16)
- MIDI music models support
- custom initial state support
- export ONNX
- seq mode
- CUDA
- Others
  - Raven models support
  - more backends
  - simplify model conversion