AI00 RWKV Server is an inference API server for the RWKV language model based upon the web-rwkv inference engine.

It supports Vulkan-based parallel and concurrent batched inference and can run on any GPU that supports Vulkan. No need for Nvidia cards! AMD cards and even integrated graphics can be accelerated!

No need for bulky PyTorch, CUDA, or other runtime environments: it's compact and ready to use out of the box!

Compatible with OpenAI's ChatGPT API.

100% open source and commercially usable, under the MIT license.

If you are looking for a fast, efficient, and easy-to-use LLM API server, then AI00 RWKV Server is your best choice. It can be used for various tasks, including chatbots, text generation, translation, and Q&A.

Join the AI00 RWKV Server community now and experience the charm of AI!

QQ Group for communication: 30920262




Installation, Compilation, and Usage

📦Download Pre-built Executables

  1. Download the latest version from the Release page

  2. Download a model and place it in the assets/models/ path, for example, assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st

  3. Optionally modify assets/Config.toml for model configurations like model path, quantization layers, etc.

  4. Run in the command line

    $ ./ai00_rwkv_server
  5. Open the browser and visit the WebUI at http://localhost:65530 (https://localhost:65530 if TLS is enabled)
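
Before opening the browser, it can be handy to confirm that the server is actually listening. The following is a minimal sketch using only the Python standard library. The helper names `webui_url` and `server_is_up` are made up for this illustration and are not part of Ai00; the port and the http/https switch come from the step above. With TLS and a self-signed certificate the probe may report failure even though the server is running, because certificate verification is left at its default.

```python
import urllib.error
import urllib.request


def webui_url(port: int = 65530, tls: bool = False, host: str = "localhost") -> str:
    """Build the WebUI address; the scheme switches to https when TLS is enabled."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}"


def server_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url`, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, TLS verification failure, ...
        return False
```

For example, `server_is_up(webui_url())` probes the default address, and `server_is_up(webui_url(tls=True))` probes the HTTPS one.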

📜(Optional) Build from Source

  1. Install Rust

  2. Clone this repository

    $ git clone https://github.com/cgisky1980/ai00_rwkv_server.git
    $ cd ai00_rwkv_server
  3. Download a model and place it in the assets/models/ path, for example, assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st

  4. Compile

    $ cargo build --release
  5. After compilation, run

    $ cargo run --release
  6. Open the browser and visit the WebUI at http://localhost:65530 (https://localhost:65530 if TLS is enabled)

📒Convert the Model

Currently, only Safetensors models with the .st extension are supported. Models saved with torch using the .pth extension need to be converted before use.

  1. Download the .pth model

  2. (Recommended) Run the Python script convert2ai00.py or convert_safetensors.py:

    $ python ./convert2ai00.py --input /path/to/model.pth --output /path/to/model.st

    Requirements: Python, with torch and safetensors installed.

  3. If you do not want to install Python, the Release also provides an executable called converter. Run

    $ ./converter --input /path/to/model.pth --output /path/to/model.st
  4. If you are building from source, run

    $ cargo run --release --package converter -- --input /path/to/model.pth --output /path/to/model.st
  5. As in the steps above, place the converted .st model in the assets/models/ path and update the model path in assets/Config.toml
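
When several .pth files need converting, the command from step 2 can be generated per file. The sketch below only builds the command lines and does not run them. `conversion_commands` is a hypothetical helper written for this illustration; it assumes convert2ai00.py sits in the current directory and accepts the `--input` and `--output` flags shown above.

```python
from pathlib import Path


def conversion_commands(model_dir, script="./convert2ai00.py"):
    """Return one argv list per .pth file in `model_dir` that has no .st sibling yet."""
    commands = []
    for pth in sorted(Path(model_dir).glob("*.pth")):
        st = pth.with_suffix(".st")
        if st.exists():
            continue  # already converted, skip it
        commands.append(
            ["python", script, "--input", str(pth), "--output", str(st)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.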

📝Supported Arguments

📙Currently Available APIs

The API service listens on port 65530, and its input and output formats follow the OpenAI API specification. Note that some APIs, such as chat and completions, have additional optional fields for advanced functionality. Visit http://localhost:65530/api-docs for the API schema.

The following is an out-of-the-box example of calling the Ai00 API from Python:

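import openai  # written for the pre-1.0 openai package (openai.Completion / openai.ChatCompletion)

class Ai00:
    def __init__(self, model="model", port=65530, api_key="JUSTSECRET_KEY"):
        # Point the client at the local Ai00 server.
        openai.api_base = f"http://localhost:{port}/api/oai"
        openai.api_key = api_key
        self.ctx = []  # chat history: a list of {"role", "content"} dicts
        self.params = {
            "system_name": "System",
            "user_name": "User",
            "assistant_name": "Assistant",
            "model": model,
            "max_tokens": 4096,
            "top_p": 0.6,
            "temperature": 1,
            "presence_penalty": 0.3,
            "frequency_penalty": 0.3,
            "half_life": 400,
            "stop": ['\x00', '\n\n']
        }

    def set_params(self, **kwargs):
        # Override any of the defaults above.
        self.params.update(kwargs)

    def clear_ctx(self):
        self.ctx = []

    def get_ctx(self):
        return self.ctx

    def continuation(self, message):
        # Plain text completion; the chat history is not used or modified.
        response = openai.Completion.create(
            model=self.params['model'],
            prompt=message,
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        result = response.choices[0].text
        return result

    def append_ctx(self, role, content):
        self.ctx.append({
            "role": role,
            "content": content
        })

    def send_message(self, message, role="user"):
        # Record the new message, send the whole history, then record the reply.
        self.ctx.append({
            "role": role,
            "content": message
        })
        result = openai.ChatCompletion.create(
            model=self.params['model'],
            messages=self.ctx,
            # Extra fields beyond the OpenAI spec; check /api-docs for the exact schema.
            names={
                "system": self.params['system_name'],
                "user": self.params['user_name'],
                "assistant": self.params['assistant_name']
            },
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        result = result.choices[0].message['content']
        self.ctx.append({
            "role": "assistant",
            "content": result
        })
        return result

ai00 = Ai00()
ai00.set_params(
    max_tokens = 4096,
    top_p = 0.55,
    temperature = 2,
    presence_penalty = 0.3,
    frequency_penalty = 0.8,
    half_life = 400,
    stop = ['\x00', '\n\n']
)
print(ai00.send_message("how are you?"))
print(ai00.send_message("me too!"))
print(ai00.continuation("i like"))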
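
For callers who would rather not depend on the openai package, the same kind of request can be sketched with the standard library alone. This is an illustration, not the project's reference client. The `/chat/completions` path is inferred from the `/api/oai` base used in the example above and should be confirmed against http://localhost:65530/api-docs. The helper names `build_chat_request` and `chat` and the default `max_tokens` of 256 are choices made for this sketch.

```python
import json
import urllib.request


def build_chat_request(messages, port=65530, api_key="JUSTSECRET_KEY", **params):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = {"model": "model", "messages": messages, "max_tokens": 256}
    body.update(params)  # e.g. top_p, temperature, stop
    return urllib.request.Request(
        # Assumed path: the /api/oai base plus the usual OpenAI-style suffix.
        f"http://localhost:{port}/api/oai/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(messages, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With a server running, `chat([{"role": "user", "content": "how are you?"}], top_p=0.55)` returns the reply as a string. Splitting request construction from sending keeps the payload easy to inspect without a live server.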

BNF Sampling

Since v0.5, Ai00 has a unique feature called BNF sampling. BNF forces the model to output in specified formats (e.g., JSON or markdown with specified fields) by limiting the possible next tokens the model can choose from.

Here is an example BNF for JSON with fields "name", "age" and "job":

<start> ::= <json_object>
<json_object> ::= "{" <object_members> "}"
<object_members> ::= <json_member> | <json_member> ", " <object_members>
<json_member> ::= <json_key> ": " <json_value>
<json_key> ::= '"' "name" '"' | '"' "age" '"' | '"' "job" '"'
<json_value> ::= <json_string> | <json_number>
<json_number> ::= <positive_digit><digits>|'0'

<image src="img/bnf.png" />
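
To make the idea of "limiting the possible next tokens" concrete, here is a toy sketch in Python. It is not Ai00's implementation: the `NEXT` table is a hand-written stand-in for a real grammar, the token set is invented, and the scores are fixed numbers rather than model output. It only shows that a high-scoring token is ignored whenever the grammar does not allow it.

```python
# Toy grammar: for each previous token, the set of tokens allowed next.
# None means "start of output". All tokens here are made up for illustration.
KEYS = {'"name"', '"age"', '"job"'}
NEXT = {
    None: {"{"},
    "{": KEYS,
    '"name"': {": "},
    '"age"': {": "},
    '"job"': {": "},
    ": ": {'"Ann"', "42"},
    '"Ann"': {", ", "}"},
    "42": {", ", "}"},
    ", ": KEYS,
}


def constrained_pick(logits, allowed):
    """Pick the highest-scoring token among `allowed`, ignoring all others."""
    candidates = {tok: score for tok, score in logits.items() if tok in allowed}
    if not candidates:
        raise ValueError("the grammar allows no token from the vocabulary")
    return max(candidates, key=candidates.get)


def generate(score_fn, max_tokens=16):
    """Greedy decoding where every step is restricted by the toy grammar."""
    out, prev = [], None
    while prev != "}" and len(out) < max_tokens:
        prev = constrained_pick(score_fn(out), NEXT[prev])
        out.append(prev)
    return "".join(out)


# Fixed scores standing in for a model; "hello" scores highest but is never legal.
LOGITS = {
    "hello": 9.0, "{": 1.0, '"name"': 2.0, '"age"': 1.5, '"job"': 1.0,
    ": ": 0.5, '"Ann"': 3.0, "42": 0.1, ", ": 0.2, "}": 0.3,
}
print(generate(lambda tokens_so_far: LOGITS))  # prints {"name": "Ann"}
```

Without the grammar, greedy decoding on these scores would emit "hello" at every step. With it, each step picks the best token among the legal ones.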

📙WebUI Screenshots


<image src="img/chat_en.gif" />


<image src="img/continuation_en.gif" />

Paper (Parallel Inference Demo)

<image src="img/paper_en.gif" />

📝TODO List

👥Join Us

We are always looking for people interested in helping us improve the project. If you are interested in any of the following, please join us!

No matter your skill level, we welcome you to join us. You can join us in the following ways:

We can't wait to work with you to make this project better! We hope the project is helpful to you!


Thank you to these insightful and outstanding individuals for their support and selfless dedication to the project!

