LLMHub - Proxy Server and Model Management Framework

This project is a flexible and extensible proxy server and model management framework for routing and managing AI/ML model requests. It supports dynamic model discovery, multi-engine orchestration, and robust request logging.

The heart of the project is taking advantage of macOS's shared memory model, which lets the same model be loaded multiple times without a double-loading memory penalty. This allows several instances of the same model to run in parallel with minimal extra overhead, so we can serve the same model at multiple context sizes and only pay the cost of a larger context window when it is actually needed.

Installation

npm install node-llmhub
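
If you want the llmhub command used in the Usage section below available on your PATH, a global install is one option (a sketch, assuming the package exposes a CLI binary):

npm install -g node-llmhub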

Configuration

Example config.yaml

on_start:
  TheBloke/MythoMax-L2-13B-GGUF:
    quant: Q5_K_M
    engine: llamacppserver
    context_sizes: [512, 2048, 4196]

proxy_port: 8080
enable_proxy: true
engine_port_min: 8081
engine_port_max: 10000

engines:
  llamacppserver:
    path: /Users/user/llama.cpp/llama-server
    arguments: --color -t 20 --parallel 2 --mlock --metrics --verbose
    model_flag: "-m"
    context_size_flag: "-c"
    port_flag: "--port"
    api_key_flag: "--api-key"
    file_types: [ gguf ]  # This engine only supports gguf files
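
Since on_start is a mapping, additional models can presumably be listed alongside the first, each with its own engine and context sizes. The entry below is purely illustrative; the model name and quant are placeholders following the same schema as the example above:

on_start:
  # Illustrative second entry - model name and quant are placeholders
  TheBloke/Mistral-7B-Instruct-v0.2-GGUF:
    quant: Q4_K_M
    engine: llamacppserver
    context_sizes: [2048, 8192]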

Usage

To start services:

llmhub start

To get the status:

llmhub status

To stop services:

llmhub stop
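
Once services are running, requests go to the configured proxy_port (8080 in the example config). The exact routes depend on the proxy, but because the backing engine here is llama.cpp's llama-server, which speaks an OpenAI-compatible API, a request along these lines is a reasonable first test (the endpoint path and model-name routing are assumptions; adjust for your setup):

# Assumes the proxy forwards OpenAI-style requests to the llama-server instances
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "TheBloke/MythoMax-L2-13B-GGUF", "messages": [{"role": "user", "content": "Hello"}]}'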

Data storage

We save state information in ~/.llmhub.

Features


Project Structure

Directories and Files


Installation from Source

  1. Clone the Repository:

    git clone https://github.com/your-repo/proxy-server.git
    cd proxy-server
    
  2. Install Dependencies:

    npm install
    
  3. Set Up Configuration:

    • Edit config.yaml to define your models, engines, and proxy settings.
    • Use config-sample.yaml as a reference (see the snippet below).
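
A typical first step is to copy the sample file and then edit your copy (assuming config-sample.yaml lives in the repository root):

cp config-sample.yaml config.yaml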

Usage

Start the Proxy Server

Run the following command to start the proxy server on the configured port:

node proxy/proxyServer.js <port>
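
For example, to listen on the proxy_port from the sample configuration:

node proxy/proxyServer.js 8080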

CLI Commands

Use the CLI tool (cli.js) for server and model management:
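
The start, status, and stop commands shown under Usage above are the core ones. If the package is not installed globally, the same subcommands can presumably be invoked through the script directly (an assumption about how the CLI entry point is wired up):

node cli.js start
node cli.js status
node cli.js stop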



Development and Testing

Run Test Scripts

Test the proxy with the example scripts:

node misc/test-req-body-proxy.js
node misc/test-proxy.js
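
One way to exercise the proxy end to end is to bring the services up first and then run a script against them (assuming the scripts target the configured proxy_port):

llmhub start
node misc/test-proxy.js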

Logging

Logs are saved to the following default locations:


Contributions

Contributions are welcome! Please submit issues or pull requests on the GitHub repository.