Home

Awesome

gen_rpc: A scalable RPC library for Erlang-VM based languages

Overview

Build Dependencies

To build this project you need to have the following:

Usage

Getting started with gen_rpc is easy. First, add the appropriate dependency line to your rebar.config:

{deps, [
    {gen_rpc, {git, "https://github.com/priestjim/gen_rpc.git", {branch, "master"}}}
]}.

Or if you're using hex.pm/rebar3:

{deps [
    {gen_rpc, "~> 2.0"}
]}.

Or if you're using Elixir/Mix:

def project do
  [
    deps: [
      {:gen_rpc, "~> 2.0"}
    ]
  ]

Then, add gen_rpc as a dependency application to your .app.src/.app file:

{application, my_app, [
    {applications, [kernel, stdlib, gen_rpc]}
]}

Or your mix.exs file:

def application do
  applications: [:gen_rpc]
end

Finally, start a couple of nodes to test it out:

(my_app@127.0.0.1)1> gen_rpc:call('other_node@1.2.3.4', erlang, node, []).
'other_node@1.2.3.4'

API

gen_rpc implements only the subset of the functions of the rpc library that make sense for the problem it's trying to solve. The library's function interface and return values is 100% compatible with rpc with only one addition: Error return values include {badrpc, Error} for RPC-based errors but also {badtcp, Error} for TCP-based errors.

For more information on what the functions below do, run erl -man rpc.

Functions exported

Per-Key Sharding

gen_rpc supports multiple outgoing connections per node using a key of arbitrary type to differentiate between connections. To leverage this feature, replace Node in your calls (single-node and multi-node alike) with {Node, Key}. The Key is hashed using erlang:phash2/1, attached to the client process name and a new connection is initiated.

Attention: When using functions that call gen_rpc:nodes/0 implicitly (such as gen_rpc:multicall/3), the channels used to communicate to the nodes are the keyless ones. To leverage the sharded functionality, pre-create your {Node, Key} lists and pass them as the node list in the multi-node function.

Module version control

gen_rpc supports executing RPC calls on remote nodes that are running only specific module versions. To leverage that feature, in place of Module in the section above, use {Module, Version}. If the remote module is not on the version requested a {badrpc,incompatible] will be returned.

Application settings

Logging

gen_rpc uses hut for logging. This allows the developer to integrate the logging library of their choice by providing the appropriate definition in their rebar.config. The default logging facility of hut is SASL.

For more information on how to enable gen_rpc to use your own logging facility, consult the README.md of hut.

SSL Configuration

gen_rpc supports SSL for inter-node communication. This allows secure communication and execution over insecure channels such as the internet, essentially allowing a trully globally distributed Erlang/Elixir setup. gen_rpc is very opinionated on how SSL should be configured and the bundled default options include:

All of these settings can be found in include/ssl.hrl and overriden by redefining the necessary option in ssl_client_options and ssl_server_options. To actually use SSL support, you'll need to define in both ssl_client_options and ssl_server_options:

To generate your own self-signed CA and node certificates, numerous articles can be found online such as this.

Usually, the CA that will be signing your client and server SSL certificates will be the same so a nominal sys.confg that includes SSL support for gen_rpc will look like:

    {gen_rpc, [
        {ssl_client_options, [
            {certfile, "priv/cert.pem"},
            {keyfile, "priv/cert.key"},
            {cacertfile, "priv/ca.pem"},
            {dhfile, "priv/dhparam.pem"}
        ]},
        {ssl_server_options, [
            {certfile, "priv/cert.pem"},
            {keyfile, "priv/cert.key"},
            {cacertfile, "priv/ca.pem"},
            {dhfile, "priv/dhparam.pem"}
        ]}
    ]}

For multi-site deployments, a performant setup can be provisioned with edge gen_rpc nodes using SSL over the internet and plain TCP for internal data exchange. In that case, non-edge nodes can have {ssl_server_port, false} and {default_client_driver, tcp} and edge nodes can have their plain TCP port firewalled externally and {default_client_driver, ssl}.

External Source Support

gen_rpc can call an external module to provide driver/port mappings in case you want to use an external discovery service like etcd for node configuration management. The module should implement the gen_rpc_external_source behaviour which takes the Node as an argument and should return either {Driver, Port} (Driver being tcp or ssl and Port being the port the remote node's gen_rpc's driver is listening in) or {error, Reason} (if the service is unavailable). To set it, change client_config_per_node from the default of {internal, #{}} to {external, ModuleName} where ModuleName is the module that implements the gen_rpc_external_source behaviour.

Build Targets

gen_rpc bundles a Makefile that makes development straightforward.

To build gen_rpc simply run:

make

To run the full test suite, run:

make test

To run the full test suite, the XRef tool and Dialyzer, run:

make dist

To build the project and drop in a console while developing, run:

make shell-master

or

make shell-slave

If you want to run a "master" and a "slave" gen_rpc nodes to run tests.

To clean every build artifact and log, run:

make distclean

Testing

A full suite of tests has been implemented for gen_rpc. You can run the CT-based test suite, dialyzer and xref by:

make dist

If you have Docker available on your system, you can run dynamic integration tests with "physically" separated hosts/nodes by running the command:

make integration

This will launch 3 slave containers and 1 master (change that by NODES=5 make integration) and will run the integration_SUITE CT test suite.

Rationale

TL;DR: gen_rpc uses a mailbox-per-node architecture and gen_tcp processes to parallelize data reception from multiple nodes without blocking the VM's distributed port.

The reasons for developing gen_rpc became apparent after a lot of trial and error while trying to scale a distributed Erlang infrastructure using the rpc library initially and subsequently erlang:spawn/4 (remote spawn). Both these solutions suffer from very specific issues under a sufficiently high number of requests.

The rpc library operates by shipping data over the wire via Distributed Erlang's ports into a registered gen_server on the other side called rex (Remote EXecution server), which is running as part of the standard distribution. In high traffic scenarios, this allows the inherent problem of running a single gen_server server to manifest: mailbox flooding. As the number of nodes participating in a data exchange with the node in question increases, so do the messages that rex has to deal with, eventually becoming too much for the process to handle (don't forget this is confined to a single thread).

Enter erlang:spawn/4 (remote spawn from now on). Remote spawn dynamically spawns processes on a remote node, skipping the single-mailbox restriction that rex has. The are various libraries written to leverage that loophole (such as Rexi), however there's a catch.

Remote spawn was not designed to ship large amounts of data as part of the call's arguments. Hence, if you want to ship a large binary such as a picture or a transaction log (large can also be small if your network is slow) over remote spawn, sooner or later you'll see this message popping up in your logs if you have subscribed to the system monitor through erlang:system_monitor/2:

{monitor,<4685.187.0>,busy_dist_port,#Port<4685.41652>}

This message essentially means that the VM's distributed port pair was busy while the VM was trying to use it for some other task like Distributed Erlang heartbeat beacons or mnesia synchronization. This of course wrecks havoc in certain timing expectations these subsystems have and the results can be very problematic: the VM might detect a node as disconnected even though everything is perfectly healthy and mnesia might misdetect a network partition.

gen_rpc solves both these problems by sharding data coming from different nodes to different processes (hence different mailboxes) and by using a different gen_tcp port for different nodes (hence not utilizing the Distributed Erlang ports).

Architecture

In order to achieve the mailbox-per-node feature, gen_rpc uses a very specific architecture:

All gen_tcp processes are properly linked so that any TCP failure will cascade and close the TCP channels and any new connection will allocate a new process and port.

An inactivity timeout has been implemented inside the client and server processes to free unused TCP connections after some time, in case that's needed.

Performance

gen_rpc is being used in production extensively with over 150.000 incoming calls/sec/node on a 8-core Intel Xeon E5 CPU and Erlang 19.1. The median payload size is 500 KB. No stability or scalability issues have been detected in over a year.

Known Issues

Licensing

This project is published and distributed under the Apache License.

Contributing

Please see CONTRIBUTING.md

Contributors: