Home

Awesome

A Guide to Global Data in Rust

This guide explains how you can use "global data" in Rust. When I say "global data," I mean data that is loaded near the start of the program and is accessible in almost all of the program.

Possible use cases for global data:

Tradeoffs

Below are questions to think about when you're choosing a global data solution for your program.

Compile time or run time?

Loading the data at compile time provides the opportunity to detect invalid data sooner. Also, it might improve the program's startup time.

Loading the data at run time can be nice because changing the data won't trigger a recompile. In complex Rust projects, long compilation times can be a pain point. Another advantage of loading at run-time is that the data can be loaded lazily, which could improve the program's startup time if there is lots of data but not all of it is needed immediately.

It's also possible to implement a hybrid approach where the data is validated at compile time but loaded at run time. That combines the eager validation of compile-time loading with the not-needing-to-recompile of run-time loading.

Mutable vs. immutable

When I say "immutable" and "mutable," I mean it in a general and hand-wavy sense that is not the equivalent of a Rust type system concept. For an example of this, lazy_static uses mutability under the hood but I'm categorizing it as "immutable" because it presents an immutable interface to the user.

Immutable global data can be safely shared between threads with minimal synchronization. It's simple, fast, and easy to understand.

Mutable global data can be a really powerful tool but sometimes can make a program hard to reason about. Before choosing mutable global data, first consider whether there's a way to refactor your code to reduce the scope of the mutable data.

Hot-reloading is an interesting kind of unidirectional immutability where the program can't change the data but external entities can.

Lifetime of data

Data with the 'static lifetime can make things easier because you can use it literally anywhere in your program. Statics are "are baked into the data segment of the final binary" (TRPL 1st ed.).

Not all global data will need the 'static lifetime. Maybe you only need your data available in most of your program, not all of it. This can open up more options for loading your data at run time.

Is heap allocation supported?

Heap allocation is convenient because you don't need to know the size of your data at compile time. However, it means that you can't use this method without an allocator. Avoiding heap allocations is most important in embedded programming, real-time systems, and really high-performance applications.

Potential Solutions

Here I'll explain a bit about how each solution works and how to use them, as well as the advantages and disadvantages of each. I will try to order the solutions in order of increasing power, inspired by the Principle of Least Power, although it won't be a strict ordering because there are qualitative differences.

The let keyword

The let keyword, which you're probably already familiar with, is used to declare all variables in Rust. Although it might not be the most obvious choice for global data, it offers a number of advantages.

struct Config {
    my_name: String
}

fn my_fn(config: Config) {
    assert_eq!(config.my_name, "paul");
}

fn main() {
    // This does heap allocation
    let config = Config { my_name: String::from("paul") };
    my_fn(config);
}

Advantages:

Disadvantages:

The const keyword

The const keyword (TRPL Chapter 3) is Rust's built-in way to handle immutable constant data. An extremely simple approach.

const MY_NAME: &str = "paul";

fn main() {
    assert_eq!(MY_NAME, "paul");
}

Advantages:

Disadvantages:

std::include_str! and std::include_bytes!

The std::include_str! and std::include_bytes! macros include a file as &'static str and &'static [u8], respectively.

See src/lib.rs for the code that uses these macros.

fn main() {
    assert_eq!("Hello, World!", global_data_in_rust::SAMPLE_STR);
    assert_eq!(b"Hello, World!", global_data_in_rust::SAMPLE_BYTES);
}

Advantages:

The lazy_static and once_cell crates

The lazy_static and once_cell crates both provide safe interfaces for exactly-once initialization of global static data. They are similar enough that I've grouped them together for now. lazy_static is more focused on convenient features for end users, whereas once_cell provides more low-level flexibility and avoids macros.

Advantages:

Disadvantages:

Here's lazy_static:

#[macro_use]
extern crate lazy_static;

use std::collections::HashMap;

lazy_static! {
    static ref GLOBAL_MAP: HashMap<&'static str, &'static str> = {
        let mut m = HashMap::new();
        m.insert("key", "value");
        m
    };
}

fn main() {
    assert_eq!(GLOBAL_MAP.get(&"key"), Some(&"value"));
}

...and here's once_cell:

use std::{sync::Mutex, collections::HashMap};
use once_cell::sync::Lazy;

static GLOBAL_MAP: Lazy<Mutex<HashMap<&'static str, &'static str>>> = Lazy::new(|| {
    let mut m = HashMap::new();
    m.insert("key", "value");
    Mutex::new(m)
});

fn main() {
    assert_eq!(GLOBAL_MAP.lock().unwrap().get("key"), Some(&"value"));
}

Immutable static items

If you want flexible mutable global data, one way to accomplish it is to put a synchronization primitive into an immutable static item. There are two broad ways to do this. One is by lazily initializing the synchronization primitive at run time using something like lazy_static and once_cell. The other way to do it is to use a synchronization primitive that can be initialized statically, such as those provided by the parking_lot crate.

Advantages:

Disadvantages:

The example below isn't parallel, but the parking_lot mutex can be used in parallel.

static MY_DATA: parking_lot::Mutex<&str> = parking_lot::const_mutex("hello");

pub fn main() {
    *MY_DATA.lock() = "world";
    assert_eq!(*MY_DATA.lock(), "world");
}

The phf crate

The phf crate lets you generate maps at compile time.

Advantages:

Disadvantages:

There are two ways to use phf. Probably the most normal way is with a custom build script, which would let you generate the map from, e.g., an ingested data file. See src/main.rs for an example of this (I couldn't get it to work with skeptic). The other, simpler way is to create the map inline with a macro:

use phf::phf_map;

#[derive(Clone, Debug, PartialEq)]
pub enum Keyword {
    Loop,
    Continue,
    Break,
    Fn,
    Extern,
}

static KEYWORDS: phf::Map<&'static str, Keyword> = phf_map! {
    "loop" => Keyword::Loop,
    "continue" => Keyword::Continue,
    "break" => Keyword::Break,
    "fn" => Keyword::Fn,
    "extern" => Keyword::Extern,
};

fn main() {
    assert_eq!(KEYWORDS.get("loop"), Some(&crate::Keyword::Loop))
}

The arc-swap crate

When choosing a solution for hot-reloadable global configuration, it's challenging to allow writes without blocking reads. The arc-swap crate provides a thoughtful solution to this problem by taking advantage of atomics. The crate is optimized for managing data that is read frequently but written only occasionally.

Here's an example of how arc-swap could be used with lazy_static to implement reloadable global configuration. Although the example is not concurrent, the crate will be most helpful in concurrent programming.

#[macro_use]
extern crate lazy_static;
use arc_swap::{ArcSwap};
use std::sync::Arc;

lazy_static! {
    static ref GLOBAL_CONFIG: ArcSwap<&'static str> = {
        ArcSwap::from(Arc::new("hello"))
    };
}

fn main() {
    assert_eq!(**GLOBAL_CONFIG.load(), "hello");
    GLOBAL_CONFIG.swap(Arc::new("world"));
    assert_eq!(**GLOBAL_CONFIG.load(), "world");
}

std::include!

The std::include macro is kind of like copy-pasting a snippet of Rust into your code. It can be used to generate complex Rust code at compile time (as in phf).

Advantages:

Disadvantages

See src/lib.rs for the example include code.

fn main() {
    assert_eq!(6, global_data_in_rust::ALSO_SIX);
}

Mutable static items

Items declared as static mut are extremely powerful; all access to them is unsafe and they provide no guard rails whatsoever. In fact, they are so "powerful" that they are being considered for deprecation as of May 2020. They are likely not the solution you're looking for; you can almost always replace them with an immutable static using some kind of synchronization primitive. If you think there's no other way to solve your problem, you're basically talking about building your own synchronization primitive, so be sure to read the Rustonomicon and thoroughly understand the implications of unsafe behavior beforehand!

Domain-specific solutions

TODO