<h1 align="center">UCall</h1> <h3 align="center"> JSON Remote Procedure Calls Library<br/> Up to 100x Faster than FastAPI<br/> </h3> <br/> <p align="center"> <a href="https://discord.gg/xuDmpbEDnQ"><img height="25" src="https://github.com/unum-cloud/ukv/raw/main/assets/icons/discord.svg" alt="Discord"></a> &nbsp;&nbsp;&nbsp; <a href="https://www.linkedin.com/company/unum-cloud/"><img height="25" src="https://github.com/unum-cloud/ukv/raw/main/assets/icons/linkedin.svg" alt="LinkedIn"></a> &nbsp;&nbsp;&nbsp; <a href="https://twitter.com/unum_cloud"><img height="25" src="https://github.com/unum-cloud/ukv/raw/main/assets/icons/twitter.svg" alt="Twitter"></a> &nbsp;&nbsp;&nbsp; <a href="https://unum.cloud/post"><img height="25" src="https://github.com/unum-cloud/ukv/raw/main/assets/icons/blog.svg" alt="Blog"></a> &nbsp;&nbsp;&nbsp; <a href="https://github.com/unum-cloud/ucall"><img height="25" src="https://github.com/unum-cloud/ukv/raw/main/assets/icons/github.svg" alt="GitHub"></a> </p>

Most modern networking is built either on slow and ambiguous REST APIs or on unnecessarily complex gRPC. FastAPI, for example, looks very approachable. We aim to be just as simple to use, or even simpler.

<table width="100%"> <tr> <th width="50%">FastAPI</th><th width="50%">UCall</th> </tr> <tr> <td>
pip install fastapi uvicorn
</td> <td>
pip install ucall
</td> </tr> <tr> <td>
from fastapi import FastAPI
import uvicorn

server = FastAPI()

@server.get('/sum')
def sum(a: int, b: int):
    return a + b

uvicorn.run(...)    
</td> <td>
from ucall.posix import Server
# On Linux kernel 5.19+, use: from ucall.uring import Server

server = Server()

@server
def sum(a: int, b: int):
    return a + b

server.run()    
</td> </tr> </table>

It takes over a millisecond to handle a trivial FastAPI call on a recent 8-core CPU. In that time, light could have traveled about 300 km in a vacuum, or roughly 200 km through optical fiber, reaching a neighboring city or, in my case, a neighboring country. How does UCall compare to FastAPI and gRPC?

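| Setup                   |  🔁  | Server | Latency w 1 client | Throughput w 32 clients |
| :---------------------- | :-: | :----: | -----------------: | ----------------------: |
| Fast API over REST      |  ❌  |   🐍   |           1'203 μs |               3'184 rps |
| Fast API over WebSocket |  ✅  |   🐍   |              86 μs |            11'356 rps ¹ |
| gRPC ²                  |  ✅  |   🐍   |             164 μs |               9'849 rps |
| UCall with POSIX        |  ❌  |   C    |              62 μs |              79'000 rps |
| UCall with io_uring     |  ✅  |   🐍   |              40 μs |             210'000 rps |
| UCall with io_uring     |  ✅  |   C    |              22 μs |             231'000 rps |
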
<details> <summary>Table legend</summary>

All benchmarks were conducted on AWS on general-purpose instances with the Ubuntu 22.10 AMI. It is the first major AMI to come with Linux kernel 5.19, featuring much wider io_uring support for networking operations. These specific numbers were obtained on beefy c7g.metal instances with Graviton 3 chips.

¹ FastAPI couldn't process concurrent requests with WebSockets.

² We tried generating C++ backends with gRPC, but its numbers, suspiciously, weren't better. There is also an async gRPC option, which we didn't try.

</details>

How is that possible?!

How can a tiny pet project with just a couple thousand lines of code compete with two of the most established networking libraries? UCall stands on the shoulders of giants.

You have already seen the round-trip latency and the throughput in requests per second. Want to see the bandwidth? Try it yourself!

@server
def echo(data: bytes):
    return data
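
One way to put a number on bandwidth is to time repeated round trips of a fixed-size payload and divide the bytes moved by the elapsed time. The sketch below is only an illustration under stated assumptions: `measure_bandwidth` and `local_echo` are hypothetical names, not part of UCall, and `local_echo` stands in for a real remote call to the `echo` procedure above so that the snippet runs without a server.

```python
import time
from typing import Callable


def measure_bandwidth(call: Callable[[bytes], bytes],
                      payload_size: int = 1 << 20,
                      repeats: int = 10) -> float:
    """Return the bytes per second moved through `call`, counting both directions."""
    payload = bytes(payload_size)  # zero-filled buffer of the requested size
    start = time.perf_counter()
    for _ in range(repeats):
        reply = call(payload)
        if len(reply) != payload_size:
            raise ValueError('echo returned a payload of unexpected size')
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # Each round trip moves the payload twice: once as request, once as response.
    return 2 * payload_size * repeats / elapsed


def local_echo(data: bytes) -> bytes:
    # Hypothetical stand-in; replace with a function that calls the remote `echo`.
    return data


if __name__ == '__main__':
    rate = measure_bandwidth(local_echo)
    print(f'{rate / 1e9:.2f} GB/s (local stand-in, not a network measurement)')
```

With a real server, swapping `local_echo` for a function that sends the payload over the network would make the result reflect serialization and transport costs as well.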

More Functionality than FastAPI

FastAPI supports native Python types, while UCall also supports numpy.ndarray, PIL.Image, and other custom types. This comes in handy when you build real applications or want to deploy multi-modal AI, like we do with UForm.

from ucall.rich_posix import Server
import numpy
import PIL.Image
import uform

server = Server()
model = uform.get_model('unum-cloud/uform-vl-multilingual')

@server
def vectorize(description: str, photo: PIL.Image.Image) -> numpy.ndarray:
    image = model.preprocess_image(photo)
    tokens = model.preprocess_text(description)
    joint_embedding = model.encode_multimodal(image=image, text=tokens)

    return joint_embedding.cpu().detach().numpy()

We also have our own optional Client class that helps with those custom types.

from ucall.client import Client

client = Client()
# Explicit JSON-RPC call:
response = client({
    'method': 'vectorize',
    'params': {
        'description': description,
        'photo': image,  # the key matches the server-side parameter name
    },
    'jsonrpc': '2.0',
    'id': 100,
})
# Or the same with syntactic sugar:
response = client.vectorize(description=description, photo=image)
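
To make the relationship between the two call styles concrete, here is a minimal sketch of how attribute access can be turned into a JSON-RPC 2.0 request. It is not UCall's `Client` implementation: `RequestBuilder` is a hypothetical class, it only builds the request dictionary without sending it, and it handles only JSON-serializable parameters rather than images or arrays.

```python
import itertools
import json


class RequestBuilder:
    """Hypothetical helper: `builder.method(**params)` yields a JSON-RPC 2.0 request."""

    def __init__(self):
        self._ids = itertools.count(1)  # monotonically increasing request ids

    def __getattr__(self, method: str):
        # __getattr__ runs only when normal lookup fails, so every unknown
        # public name is treated as the name of a remote procedure.
        if method.startswith('_'):
            raise AttributeError(method)

        def build(**params) -> dict:
            return {
                'jsonrpc': '2.0',
                'method': method,
                'params': params,
                'id': next(self._ids),
            }

        return build


builder = RequestBuilder()
request = builder.sum(a=2, b=3)
print(json.dumps(request))
```

The explicit dictionary and the sugared call carry the same information; the sugar only saves typing the envelope fields.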

CLI like cURL

Aside from the Python Client, we provide an easy-to-use Command Line Interface, which comes with pip install ucall. It allows you to call a remote server and upload files, with direct support for images and NumPy arrays. Translating the previous example into a Bash script, to call the server on the same machine:

ucall vectorize description='Product description' -i photo=./local/path.png

To address a remote server:

ucall vectorize description='Product description' -i photo=./local/path.png --uri 0.0.0.0 --port 8545

To print the docs, use ucall -h:

usage: ucall [-h] [--uri URI] [--port PORT] [-f [FILE ...]] [-i [IMAGE ...]] [--positional [POSITIONAL ...]] method [kwargs ...]

UCall Client CLI

positional arguments:
  method                method name
  kwargs                method arguments

options:
  -h, --help            show this help message and exit
  --uri URI             server uri
  --port PORT           server port
  -f [FILE ...], --file [FILE ...]
                        method positional arguments
  -i [IMAGE ...], --image [IMAGE ...]
                        method positional arguments
  --positional [POSITIONAL ...]
                        method positional arguments

You can also annotate types explicitly to distinguish integers, floats, and strings and avoid ambiguity.

ucall auth id=256
ucall auth id:int=256
ucall auth id:str=256
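
The `name:type=value` form above can be parsed in a few lines. The following is a sketch, not the code the `ucall` CLI uses: `parse_kwarg` is a hypothetical helper, and the fallback for arguments without an annotation (try `int`, then `float`, then keep the string) is an assumption made for illustration.

```python
def parse_kwarg(argument: str):
    """Split `name[:type]=value` into a (name, value) pair."""
    key, separator, raw = argument.partition('=')
    if not separator:
        raise ValueError(f'expected name=value, got {argument!r}')

    name, _, type_name = key.partition(':')
    casts = {'int': int, 'float': float, 'str': str}

    if type_name:
        # Explicit annotation: apply exactly the requested conversion.
        if type_name not in casts:
            raise ValueError(f'unsupported type annotation {type_name!r}')
        return name, casts[type_name](raw)

    # No annotation (assumed behavior): guess the narrowest numeric type.
    for cast in (int, float):
        try:
            return name, cast(raw)
        except ValueError:
            continue
    return name, raw


for example in ('id=256', 'id:int=256', 'id:str=256'):
    print(example, '->', parse_kwarg(example))
```

The annotation matters precisely in the third case: without `:str`, a value such as `256` would be guessed to be a number.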

Free Tier Throughput

We will leave bandwidth measurements to enthusiasts, but will share some more numbers. The conventional wisdom is that you can't squeeze high performance from Free-Tier machines. Currently, AWS provides the following options: t2.micro and t4g.small, on older Intel and newer Graviton 2 chips, respectively. This library is so fast that it doesn't need more than 1 core, so you can run a fast server even on a tiny Free-Tier instance!

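| Setup                   |  🔁  | Server | Clients |  t2.micro |  t4g.small |
| :---------------------- | :-: | :----: | :-----: | --------: | ---------: |
| Fast API over REST      |  ❌  |   🐍   |    1    |   328 rps |    424 rps |
| Fast API over WebSocket |  ✅  |   🐍   |    1    | 1'504 rps |  3'051 rps |
| gRPC                    |  ✅  |   🐍   |    1    | 1'169 rps |  1'974 rps |
| UCall with POSIX        |  ❌  |   C    |    1    | 1'082 rps |  2'438 rps |
| UCall with io_uring     |  ✅  |   C    |    1    |         - |  5'864 rps |
| UCall with POSIX        |  ❌  |   C    |   32    | 3'399 rps | 39'877 rps |
| UCall with io_uring     |  ✅  |   C    |   32    |         - | 88'455 rps |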

In this case, every server was bombarded by requests from either a single instance or a fleet of 32 other instances in the same availability zone. If you want to reproduce those benchmarks, check out the sum examples on GitHub.

Quick Start

For Python:

pip install ucall

For CMake projects:

include(FetchContent)
FetchContent_Declare(
    ucall
    GIT_REPOSITORY https://github.com/unum-cloud/ucall
    GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(ucall)
include_directories(${ucall_SOURCE_DIR}/include)

The C usage example is a mouthful compared to Python. We wanted to make it as lightweight as possible and to allow optional arguments without dynamic allocations and named lookups. So unlike the Python layer, we expect the user to manually extract the arguments from the call context with ucall_param_named_i64() and its siblings.

#include <cstdint> // int64_t
#include <cstdio>  // snprintf
#include <ucall/ucall.h>

static void sum(ucall_call_t call, ucall_callback_tag_t) {
    int64_t a{}, b{};
    char printed_sum[256]{};
    // Extract the named arguments from the call context.
    bool got_a = ucall_param_named_i64(call, "a", 0, &a);
    bool got_b = ucall_param_named_i64(call, "b", 0, &b);
    if (!got_a || !got_b)
        return ucall_call_reply_error_invalid_params(call);

    // Print the sum into a local buffer and send it back as the reply.
    int len = snprintf(printed_sum, sizeof(printed_sum), "%lld", static_cast<long long>(a + b));
    ucall_call_reply_content(call, printed_sum, len);
}

int main(int argc, char** argv) {

    ucall_server_t server{};
    ucall_config_t config{}; // zero-initialized configuration

    ucall_init(&config, &server);
    ucall_add_procedure(server, "sum", &sum, NULL);
    ucall_take_calls(server, 0);
    ucall_free(server);
    return 0;
}
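
From the caller's side, a reply to `sum` is either a result or an error object. The Python sketch below shows one way a client might unwrap both cases. It relies on the JSON-RPC 2.0 specification, which reserves code -32602 for invalid parameters; that `ucall_call_reply_error_invalid_params()` emits exactly this code is an assumption, and `RemoteError`, `unwrap`, and the sample payloads are hypothetical.

```python
import json

INVALID_PARAMS = -32602  # reserved by the JSON-RPC 2.0 specification


class RemoteError(Exception):
    """Raised when the server replies with a JSON-RPC error object."""

    def __init__(self, code: int, message: str):
        super().__init__(f'{code}: {message}')
        self.code = code
        self.message = message


def unwrap(raw: str):
    """Return the `result` of a JSON-RPC 2.0 response or raise RemoteError."""
    response = json.loads(raw)
    if 'error' in response:
        error = response['error']
        raise RemoteError(error['code'], error['message'])
    return response['result']


# Hypothetical payloads, written by hand for illustration.
ok_reply = '{"jsonrpc": "2.0", "result": 5, "id": 1}'
bad_reply = '{"jsonrpc": "2.0", "error": {"code": -32602, "message": "Invalid params"}, "id": 2}'

print(unwrap(ok_reply))
try:
    unwrap(bad_reply)
except RemoteError as error:
    print('missing arguments' if error.code == INVALID_PARAMS else 'other failure')
```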

Roadmap

Want to affect the roadmap and request a feature? Join the discussions on Discord.

Why JSON-RPC?