Vegur

Heroku's proxy library based on a forked Cowboy frontend (Cowboyku). This library handles proxying in Heroku's routing stack.

And how do you pronounce vegur? Like this.

Build

$ rebar3 compile

Test

$ rebar3 ct

Writing a Router

Vegur is a proxy application, meaning that it takes care of receiving HTTP requests and forwarding them to another server; similarly for responses.

What it isn't is a router, meaning that it will not handle choosing which nodes to send traffic to, nor will it actually track what backends are available. This task is left to the user of the library, by writing a router callback module.

src/vegur_stub.erl provides an example implementation of the callback module used to implement routing logic, and can serve as a reference.

Demo reverse-proxy

To set up a reverse-proxy that does load balancing locally, we'll first set up two toy servers:

$ while true; do ( BODY=$(date); echo -e "HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: ${#BODY}\r\n\r\n$BODY" | nc -l -p 8081 ); done
$ while true; do ( BODY=$(date); echo -e "HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: ${#BODY}\r\n\r\n$BODY" | nc -l -p 8082 ); done

These behave identically, except that one listens on port 8081 and the other on port 8082. You can try reaching them from your browser.

To make things simple, I'm going to hardcode both back-ends directly in the source module:

-module(toy_router).
-behaviour(vegur_interface).
-export([init/2,
         terminate/3,
         lookup_domain_name/3,
         checkout_service/3,
         checkin_service/6,
         service_backend/3,
         feature/2,
         additional_headers/4,
         error_page/4]).

-record(state, {tries = [] :: list()}).

This is our list of exported functions, along with the behaviour they implement (vegur_interface), and a record defining the internal state of each router invocation. We track a single value, tries, which will be useful to make sure we don't end up in an infinite loop if we ever have no backends alive.

An important thing to note is that this toy_router module will be called once per request and is decentralized with nothing shared, unlike a node-unique gen_server.

Now for the implementation of specific callbacks, documented in src/vegur_stub.erl:

init(_AcceptTime, Upstream) ->
    {ok, Upstream, #state{}}. % state initialization here.

lookup_domain_name(_ReqDomain, Upstream, State) ->
    %% hardcoded values, we don't care about the domain
    Servers = [{1, {127,0,0,1}, 8081},
               {2, {127,0,0,1}, 8082}],
    {ok, Servers, Upstream, State}.

From there, we can fill in the checkin/checkout logic. We technically have a limitation of one request at a time per server, but we won't track this limitation beyond capping the number of connection retries.

checkout_service(Servers, Upstream, State=#state{tries=Tried}) ->
    Available = Servers -- Tried,
    case Available of
        [] ->
            {error, all_blocked, Upstream, State};
        _ ->
            N = rand:uniform(length(Available)),
            Pick = lists:nth(N, Available),
            {service, Pick, Upstream, State#state{tries=[Pick | Tried]}}
    end.

service_backend({_Id, IP, Port}, Upstream, State) ->
    %% Extract the IP:PORT from the chosen server.
    %% To enable keep-alive, use:
    %% `{{keepalive, {default, {IP,Port}}}, Upstream, State}'
    %% To force the use of a new keepalive connection, use:
    %% `{{keepalive, {new, {IP,Port}}}, Upstream, State}'
    %% Otherwise, no keepalive is done to the back-end:
    {{IP, Port}, Upstream, State}.

checkin_service(_Servers, _Pick, _Phase, _ServState, Upstream, State) ->
    %% if we tracked total connections, we would decrement the counters here
    {ok, Upstream, State}.
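
If we did want to track in-flight connections per back-end, one possible approach is a shared ETS counter table. This is only a sketch: the toy_counters module, its table name, and its functions are hypothetical and not part of vegur. Since the router module runs once per request, the table would have to be created once by a long-lived process (at application start, say).

```erlang
-module(toy_counters).
-export([setup/0, incr/1, decr/1, count/1]).

-define(TABLE, toy_router_conns). % hypothetical table name

%% Create the shared table. Call once from a long-lived process:
%% an ETS table is deleted when its owner process exits.
setup() ->
    ets:new(?TABLE, [named_table, public, set, {write_concurrency, true}]),
    ok.

%% Would be called from checkout_service/3 once a back-end is picked.
%% Returns the new count.
incr(Id) ->
    ets:update_counter(?TABLE, Id, {2, 1}, {Id, 0}).

%% Would be called from checkin_service/6. The count is clamped at 0.
decr(Id) ->
    ets:update_counter(?TABLE, Id, {2, -1, 0, 0}, {Id, 0}).

%% Current number of in-flight requests for a back-end.
count(Id) ->
    case ets:lookup(?TABLE, Id) of
        [{Id, N}] -> N;
        [] -> 0
    end.
```

With something like this in place, checkout_service/3 could skip servers whose count is already at their limit instead of only relying on the tries list.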

We're also going to enable none of the features and add no headers in either direction because this is a basic demo:

feature(_WhoCares, State) ->
    {disabled, State}.

additional_headers(_Direction, _Log, _Upstream, State) ->
    {[], State}.

And error pages. For now we only care about the one we return, which is all_blocked:

error_page(all_blocked, _DomainGroup, Upstream, State) ->
    {{502, [], <<>>}, Upstream, State}; % Bad Gateway

And then the default ones, which I define broadly:

%% Vegur-returned errors that should be handled no matter what.
%% Full list in src/vegur_stub.erl
error_page({upstream, _Reason}, _DomainGroup, Upstream, HandlerState) ->
    %% Blame the caller
    {{400, [], <<>>}, Upstream, HandlerState};
error_page({downstream, _Reason}, _DomainGroup, Upstream, HandlerState) ->
    %% Blame the server
    {{500, [], <<>>}, Upstream, HandlerState};
error_page({undefined, _Reason}, _DomainGroup, Upstream, HandlerState) ->
    %% Who knows who was to blame!
    {{500, [], <<>>}, Upstream, HandlerState};
%% Specific error codes from middleware
error_page(empty_host, _DomainGroup, Upstream, HandlerState) ->
    {{400, [], <<>>}, Upstream, HandlerState};
error_page(bad_request, _DomainGroup, Upstream, HandlerState) ->
    {{400, [], <<>>}, Upstream, HandlerState};
error_page(expectation_failed, _DomainGroup, Upstream, HandlerState) ->
    {{417, [], <<>>}, Upstream, HandlerState};
%% Catch-all
error_page(_, _DomainGroup, Upstream, HandlerState) ->
    {{500, [], <<>>}, Upstream, HandlerState}.

And then terminate without doing anything special (we don't have state to tear down, for example):

terminate(_, _, _) ->
    ok.

And then we're done. Compile all that stuff:

$ rebar3 shell
Erlang/OTP 17 [erts-6.0] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V6.0  (abort with ^G)
1> c("demo/toy_router"), application:ensure_all_started(vegur), vegur:start_http(8080, toy_router, [{middlewares, vegur:default_middlewares()}]).
{ok,<0.62.0>}

You can then call localhost:8080 and see the request routed to either of your netcat servers.

Congratulations, you have a working reverse-load balancer and/or proxy/router combo running. You can shut down either server. The other should take the load, and if it also fails, the user would get an error since nothing is left available.

Behaviour

There are multiple specific HTTP behaviours that have been chosen/implemented in this proxying software. The list is maintained at https://devcenter.heroku.com/articles/http-routing

Configuration

OTP Configuration

The configuration can be passed following the standard Erlang/OTP application logic.
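
As a sketch, a sys.config entry for the vegur application could look like the following. max_cookie_length is the OTP option referred to under Server Configuration; the value shown is an arbitrary placeholder, not a documented default or a recommendation.

```erlang
%% sys.config (sketch)
[
 {vegur, [
   %% placeholder value, chosen only for illustration
   {max_cookie_length, 8192}
 ]}
].
```

The same value can also be set at runtime with application:set_env(vegur, max_cookie_length, 8192) before the listeners are started.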

Server Configuration

The HTTP servers themselves can also have their own configuration in a per-listener manner. The following options are valid when passed to vegur:start/5:

It is recommended that options regarding header sizes for the HTTP listener match the options for the max_cookie_length in the OTP options to avoid the painful case of a backend setting a cookie that cannot be sent back by the end client.

Middlewares

Vegur supports a middleware interface that can be configured when booting the application. These can be configured by setting the middlewares option:

vegur:start_http(Port, CallbackMod, [{middlewares, Middlewares}]),
vegur:start_proxy(Port, CallbackMod, [{middlewares, Middlewares}]),

The middlewares value should always contain, at the very least, the result of vegur:default_middlewares(), which implements some required functionality.

For example, the following middlewares are the default ones:

The order is important, and as defined, default middlewares must be kept for a lot of functionality (from safety to actual proxying) to actually work.

Custom middlewares can still be added throughout the chain by adding them to the list.
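
One way to place a custom middleware at a given point in the chain is a small list helper, sketched below. The toy_middlewares module is hypothetical, and the sketch assumes vegur:default_middlewares() returns a flat list of module names.

```erlang
-module(toy_middlewares).
-export([insert_before/3]).

%% Insert New immediately before the first occurrence of Target in List.
%% If Target is absent, New is appended at the end.
insert_before(Target, New, [Target | _] = List) ->
    [New | List];
insert_before(Target, New, [H | T]) ->
    [H | insert_before(Target, New, T)];
insert_before(_Target, New, []) ->
    [New].
```

It could then be used to run a custom middleware (here a hypothetical my_custom_middleware) right before the proxying step, assuming vegur_proxy_middleware is part of the default list:

```erlang
Middlewares = toy_middlewares:insert_before(vegur_proxy_middleware,
                                            my_custom_middleware,
                                            vegur:default_middlewares()),
vegur:start_http(8080, toy_router, [{middlewares, Middlewares}]).
```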

Defining middlewares

The middlewares included are standard cowboyku (cowboy ~0.9) middlewares and respect the same interface.

There's a single callback defined:

execute(Req, Env)
    -> {ok, Req, Env}
     | {suspend, module(), atom(), [any()]}
     | {halt, Req}
     | {error, cowboyku:http_status(), Req}
    when Req::cowboyku_req:req(), Env::env().

For example, a middleware implementing some custom form of authentication where a secret token is required to access data could be devised to work like:

-module(validate_custom_auth).
-behaviour(cowboyku_middleware).
-export([execute/2]).

-define(TOKEN, <<"abcdef">>). % this is really unsafe

execute(Req, Env) ->
    case cowboyku_req:header(<<"my-token">>, Req) of
        {?TOKEN, Req2} ->
            {ok, Req2, Env};
        {_, Req2} ->
            {HTTPCode, Req3} = vegur_utils:handle_error(bad_token, Req2),
            {error, HTTPCode, Req3}
    end.

Calling vegur_utils:handle_error(Reason, Req) will redirect the error to the Callback:error_page/4 callback, letting the custom callback module set its own HTTP status, handle logging, and do whatever processing it needs before stopping the request.
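
On the router side, the bad_token reason then needs a matching error_page/4 clause. A minimal sketch follows, written as a standalone module (toy_router_errors, a hypothetical name) so it compiles on its own. In toy_router, the first clause would go before the catch-all. The 401 status is our own choice, and the sketch assumes the reason atom reaches the callback unchanged.

```erlang
-module(toy_router_errors).
-export([error_page/4]).

%% Reason passed to vegur_utils:handle_error/2 by the middleware.
error_page(bad_token, _DomainGroup, Upstream, State) ->
    {{401, [], <<>>}, Upstream, State}; % Unauthorized (our choice of status)
%% Anything else: generic failure.
error_page(_Other, _DomainGroup, Upstream, State) ->
    {{500, [], <<>>}, Upstream, State}.
```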

Logs and statistics being collected

Behaviour

Added Headers

All headers are considered to be case-insensitive, as per the HTTP Specification, but will be camel-cased by default. A few of them are added by Vegur.

Protocol Details

The vegur proxy only supports HTTP/1.0 and HTTP/1.1 clients. HTTP/0.9 and earlier are no longer supported. SPDY and HTTP/2.0 are not supported at this point.

The proxy's behavior is to be as compliant as possible with the HTTP/1.1 specifications. Special exceptions must be made for HTTP/1.0 however:

Other details:

Specifically for responses:

Additionally, while HTTP/1.1 requests and responses are expected to be keep-alive by default, if the initial request from the router to the backend had an explicit connection: close header, the backend can send a response delimited by the connection termination, with neither a transfer-encoding nor an explicit content-length.
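
To make the idea of body delimitation concrete, here is a rough sketch of how a response's framing could be classified from its headers. This is not vegur's implementation: the toy_framing module is hypothetical, header names are assumed to be lowercase binaries, and special cases such as HEAD responses or bodiless status codes are ignored.

```erlang
-module(toy_framing).
-export([response_framing/1]).

%% Headers :: [{LowercaseName :: binary(), Value :: binary()}]
%% Returns how the end of the response body would be detected.
response_framing(Headers) ->
    TE = proplists:get_value(<<"transfer-encoding">>, Headers),
    CL = proplists:get_value(<<"content-length">>, Headers),
    case {TE, CL} of
        {<<"chunked">>, _} ->
            chunked;                                 % chunked takes priority
        {undefined, undefined} ->
            until_close;                             % ends when the connection closes
        {undefined, Len} ->
            {content_length, binary_to_integer(Len)};
        {_Other, _} ->
            unsupported                              % other encodings: not handled here
    end.
```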

Even though the HEAD HTTP verb does not require a response body to be sent over the line and ends at the response headers, HEAD requests are explicitly made to work with 101 Switching Protocols responses. A backend that doesn't want to upgrade should send a different status code, and the connection will not be upgraded.

Not Supported

Contributing

All contributed work must have:

A good commit message should include a rationale for the change, along with the existing, expected, and new behaviour.

All contributed work will be reviewed before being merged (or rejected).

This proxy is used in production with existing apps, and a commitment to backwards compatibility (or just working in the real world) is in place.

Architecture Guidelines

Most of the request validation is done through the usage of middlewares. The middlewares we use are implemented through midjan, which wraps some operations traditionally done by cowboyku in order to have more control over vital parts of a request/response whenever the RFC is different between servers and proxies.

All middleware modules have their name terminated by _middleware.

The proxy is then split into 5 major parts maintained in this directory:

  1. vegur_proxy_middleware, which handles the high-level request/response patterns.
  2. vegur_proxy, which handles the low-level HTTP coordination between requests and responses, and technicalities of socket management, header reconciliation, etc.
  3. vegur_client, a small HTTP client to call back-ends
  4. Supporting sub-states of HTTP, such as the chunked parser and the bytepipe (used for upgrades), each having its own module (vegur_chunked and vegur_bytepipe)
  5. Supporting modules, such as functional logging modules, midjan translators, and so on (vegur_req_log, vegur_midjan_translator).

Reference Material

Changelog