A Guide to Global Data in Rust
This guide explains how you can use "global data" in Rust. When I say "global data," I mean data that is loaded near the start of the program and is accessible in almost all of the program.
Possible use cases for global data:
- App configuration, e.g. weapon characteristics for a game
- Making data available everywhere without needing to pass it as an argument through all functions (apply this carefully!)
- Generating Rust code from external data
- Database connections... or other network resources?
- A logger, maybe
Tradeoffs
Below are questions to think about when you're choosing a global data solution for your program.
Compile time or run time?
Loading the data at compile time provides the opportunity to detect invalid data sooner. Also, it might improve the program's startup time.
Loading the data at run time can be nice because changing the data won't trigger a recompile. In complex Rust projects, long compilation times can be a pain point. Another advantage of loading at run time is that the data can be loaded lazily, which could improve the program's startup time if there is lots of data but not all of it is needed immediately.
It's also possible to implement a hybrid approach where the data is validated at compile time but loaded at run time. That combines the eager validation of compile-time loading with the not-needing-to-recompile of run-time loading.
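To make the lazy run-time loading mentioned above a bit more concrete, here's a minimal sketch using std::sync::OnceLock from the standard library. The Config type and the load_config function are hypothetical stand-ins for whatever your program actually loads.

```rust
use std::sync::OnceLock;

// Hypothetical configuration type, used only for illustration.
struct Config {
    my_name: String,
}

// Stand-in for an expensive load (for example, reading and parsing a file).
fn load_config() -> Config {
    Config { my_name: String::from("paul") }
}

// Returns the shared config, loading it on first access only.
fn config() -> &'static Config {
    static CONFIG: OnceLock<Config> = OnceLock::new();
    CONFIG.get_or_init(load_config)
}

fn main() {
    // Nothing is loaded until this first call.
    assert_eq!(config().my_name, "paul");
    // Later calls return the same instance without reloading.
    assert!(std::ptr::eq(config(), config()));
}
```

Programs that never call config() never pay for the load, which is the startup-time benefit described above.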
Mutable vs. immutable
When I say "immutable" and "mutable," I mean them in a loose sense that doesn't map directly onto a Rust type system concept. For example, lazy_static uses mutability under the hood, but I'm categorizing it as "immutable" because it presents an immutable interface to the user.
Immutable global data can be safely shared between threads with minimal synchronization. It's simple, fast, and easy to understand.
Mutable global data can be a really powerful tool but sometimes can make a program hard to reason about. Before choosing mutable global data, first consider whether there's a way to refactor your code to reduce the scope of the mutable data.
Hot-reloading is an interesting kind of unidirectional immutability where the program can't change the data but external entities can.
Lifetime of data
Data with the 'static lifetime can make things easier because you can use it literally anywhere in your program. Statics "are baked into the data segment of the final binary" (TRPL 1st ed.).
Not all global data will need the 'static lifetime. Maybe you only need your data available in most of your program, not all of it. This can open up more options for loading your data at run time.
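As a rough illustration of data that is widely available without being 'static, here's a sketch that shares a run-time value with several threads through std::thread::scope. The Config type and the greet_from_threads function are hypothetical names I made up for this example.

```rust
use std::thread;

// Hypothetical configuration type, used only for illustration.
struct Config {
    my_name: String,
}

// Borrows the config for a non-'static lifetime and shares it with two threads.
fn greet_from_threads(config: &Config) -> Vec<String> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..2)
            .map(|i| s.spawn(move || format!("thread {i}: {}", config.my_name)))
            .collect();
        // Join in spawn order so the result order is deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Created at run time and dropped at the end of main: not 'static.
    let config = Config { my_name: String::from("paul") };
    let greetings = greet_from_threads(&config);
    assert_eq!(greetings, ["thread 0: paul", "thread 1: paul"]);
}
```

Scoped threads are guaranteed to finish before the scope returns, which is why a plain borrow is enough here.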
Is heap allocation supported?
Heap allocation is convenient because you don't need to know the size of your data at compile time. However, it means that you can't use this method without an allocator. Avoiding heap allocations is most important in embedded programming, real-time systems, and really high-performance applications.
Potential Solutions
Here I'll explain a bit about how each solution works and how to use it, along with its advantages and disadvantages. I'll try to order the solutions by increasing power, inspired by the Principle of Least Power, although it won't be a strict ordering because there are qualitative differences.
The let keyword
The let keyword, which you're probably already familiar with, is used to declare local variables in Rust. Although it might not be the most obvious choice for global data, it offers a number of advantages.
struct Config {
    my_name: String,
}

fn my_fn(config: Config) {
    assert_eq!(config.my_name, "paul");
}

fn main() {
    // String::from allocates on the heap
    let config = Config { my_name: String::from("paul") };
    my_fn(config);
}
Advantages:
- Built into Rust
- You often don't need to specify the type of the data. This can be useful for functions and complex types.
- It's easier to provide dummy data for testing
- Allows mutable data
- Allows heap-allocated data
Disadvantages:
- You need to pass the config through each function that you use, which may be bothersome
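Here's a small sketch of the testing advantage listed above. It assumes the same Config shape as the earlier example, and the greeting function is a hypothetical name. Passing the config by reference also lets the caller keep using it afterwards.

```rust
// Hypothetical configuration type, mirroring the example above.
struct Config {
    my_name: String,
}

// Borrowing the config lets the caller keep using it afterwards.
fn greeting(config: &Config) -> String {
    format!("hello, {}", config.my_name)
}

fn main() {
    let config = Config { my_name: String::from("paul") };
    assert_eq!(greeting(&config), "hello, paul");

    // A test can construct its own dummy value; there is no global state to reset.
    let dummy = Config { my_name: String::from("test-user") };
    assert_eq!(greeting(&dummy), "hello, test-user");
}
```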
The const keyword
The const keyword (TRPL Chapter 3) is Rust's built-in way to handle immutable constant data. It's an extremely simple approach.
const MY_NAME: &str = "paul";

fn main() {
    assert_eq!(MY_NAME, "paul");
}
Advantages:
- Built into Rust
- 'static lifetime
- Data type is validated at compile time
Disadvantages:
- The initializer must be a constant expression. That restricts you to simple operations like creating a new struct, plus calls to const fn functions, including the std functions that have the #[rustc_const_stable] annotation.
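To show where that restriction bites, here's a minimal sketch. The Weapon struct and the doubled function are hypothetical; the point is only that struct construction and const fn calls are accepted in a const initializer, while a heap-allocating call like String::from is not.

```rust
// Hypothetical game data, used only for illustration.
struct Weapon {
    name: &'static str,
    damage: u32,
}

// A user-defined const fn can be evaluated at compile time.
const fn doubled(base: u32) -> u32 {
    base * 2
}

const SWORD: Weapon = Weapon {
    name: "sword",
    damage: doubled(21),
};

// This would not compile, because String::from is not a const fn:
// const OWNER: String = String::from("paul");

fn main() {
    assert_eq!(SWORD.name, "sword");
    assert_eq!(SWORD.damage, 42);
}
```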
std::include_str! and std::include_bytes!
The std::include_str! and std::include_bytes! macros include a file as a &'static str and a &'static [u8; N] (a reference to a fixed-size byte array), respectively.
See src/lib.rs for the code that uses these macros.
fn main() {
    assert_eq!("Hello, World!", global_data_in_rust::SAMPLE_STR);
    assert_eq!(b"Hello, World!", global_data_in_rust::SAMPLE_BYTES);
}
Advantages:
- Built into Rust
- Lifetime of data is 'static
- Checks for the presence of the file at compile time
The lazy_static and once_cell crates
The lazy_static and once_cell crates both provide safe interfaces for exactly-once initialization of global static data. They are similar enough that I've grouped them together for now. lazy_static is more focused on convenient features for end users, whereas once_cell provides more low-level flexibility and avoids macros.
Advantages:
- 'static lifetime
- Data is loaded lazily at run time
- Allows heap-allocated data
- Allows interior-mutable data
- Can work without std using the spin_no_std feature of lazy_static
Disadvantages:
- The data type needs to implement the Sync trait. So, if you want to have mutable data, you probably need to use something like a Mutex or RwLock. Beware of deadlocks and confusing code.
- If the type has a destructor, it will not run when the process exits. So you probably wouldn't want to do this with anything that holds resources that need to be cleaned up, such as temporary files, lock files, or PID files.
Here's lazy_static:
#[macro_use]
extern crate lazy_static;

use std::collections::HashMap;

lazy_static! {
    static ref GLOBAL_MAP: HashMap<&'static str, &'static str> = {
        let mut m = HashMap::new();
        m.insert("key", "value");
        m
    };
}

fn main() {
    assert_eq!(GLOBAL_MAP.get(&"key"), Some(&"value"));
}
...and here's once_cell:
use std::{collections::HashMap, sync::Mutex};

use once_cell::sync::Lazy;

static GLOBAL_MAP: Lazy<Mutex<HashMap<&'static str, &'static str>>> = Lazy::new(|| {
    let mut m = HashMap::new();
    m.insert("key", "value");
    Mutex::new(m)
});

fn main() {
    assert_eq!(GLOBAL_MAP.lock().unwrap().get("key"), Some(&"value"));
}
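For comparison, here's a sketch of the same map built with std::sync::LazyLock, which has been part of the standard library since Rust 1.80. This is a std stand-in rather than either crate's API, and it assumes a reasonably recent toolchain.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Initialized on first access, then shared for the rest of the program.
static GLOBAL_MAP: LazyLock<HashMap<&'static str, &'static str>> = LazyLock::new(|| {
    let mut m = HashMap::new();
    m.insert("key", "value");
    m
});

fn main() {
    assert_eq!(GLOBAL_MAP.get("key"), Some(&"value"));
    assert_eq!(GLOBAL_MAP.get("missing"), None);
}
```

If you can rely on a new enough compiler, this avoids an extra dependency; the trade-offs listed above (Sync requirement, no destructor at exit) still apply.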
Immutable static items
If you want flexible mutable global data, one way to accomplish it is to put a synchronization primitive into an immutable static item. There are two broad ways to do this. One is to lazily initialize the synchronization primitive at run time using something like lazy_static or once_cell. The other is to use a synchronization primitive that can be initialized statically, such as those provided by the parking_lot crate.
Advantages:
- 'static lifetime
- Choose your own synchronization primitive
- Choose between compile time and run time initialization
- Enables interior mutability
Disadvantages:
- More choices to make
- Drop doesn't run on static items
The example below isn't parallel, but the parking_lot mutex can be used in parallel.
static MY_DATA: parking_lot::Mutex<&str> = parking_lot::const_mutex("hello");

pub fn main() {
    *MY_DATA.lock() = "world";
    assert_eq!(*MY_DATA.lock(), "world");
}
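Here's a sketch of what a parallel version might look like. To keep it dependency-free I've swapped in std::sync::Mutex, whose constructor has been a const fn since Rust 1.63, instead of parking_lot. EVENTS and record_events are hypothetical names.

```rust
use std::sync::Mutex;
use std::thread;

// Mutex::new and Vec::new are both const fns, so no lazy initialization is needed.
static EVENTS: Mutex<Vec<u32>> = Mutex::new(Vec::new());

// Spawns `n` threads that each append their index to the shared list.
fn record_events(n: u32) {
    let handles: Vec<_> = (0..n)
        .map(|i| thread::spawn(move || EVENTS.lock().unwrap().push(i)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    record_events(4);
    // Threads may push in any order, so sort before comparing.
    let mut events = EVENTS.lock().unwrap().clone();
    events.sort();
    assert_eq!(events, [0, 1, 2, 3]);
}
```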
The phf crate
The phf crate lets you generate maps at compile time.
Advantages:
- Compile-time validation of data
- 'static lifetime
- I think that no heap allocation is required (data lives in binary)
Disadvantages:
- Kind of complex to get working
- Only supports map and set types
There are two ways to use phf. Probably the most normal way is with a custom build script, which would let you generate the map from, e.g., an ingested data file. See src/main.rs for an example of this (I couldn't get it to work with skeptic).
The other, simpler way is to create the map inline with a macro:
use phf::phf_map;

#[derive(Clone, Debug, PartialEq)]
pub enum Keyword {
    Loop,
    Continue,
    Break,
    Fn,
    Extern,
}

static KEYWORDS: phf::Map<&'static str, Keyword> = phf_map! {
    "loop" => Keyword::Loop,
    "continue" => Keyword::Continue,
    "break" => Keyword::Break,
    "fn" => Keyword::Fn,
    "extern" => Keyword::Extern,
};

fn main() {
    assert_eq!(KEYWORDS.get("loop"), Some(&Keyword::Loop));
}
The arc-swap crate
When choosing a solution for hot-reloadable global configuration, it's challenging to allow writes without blocking reads. The arc-swap crate provides a thoughtful solution to this problem by taking advantage of atomics. The crate is optimized for managing data that is read frequently but written only occasionally.
Here's an example of how arc-swap could be used with lazy_static to implement reloadable global configuration. Although the example is not concurrent, the crate will be most helpful in concurrent programming.
#[macro_use]
extern crate lazy_static;

use arc_swap::ArcSwap;
use std::sync::Arc;

lazy_static! {
    static ref GLOBAL_CONFIG: ArcSwap<&'static str> = {
        ArcSwap::from(Arc::new("hello"))
    };
}

fn main() {
    assert_eq!(**GLOBAL_CONFIG.load(), "hello");
    // store replaces the value; use swap instead if you need the old Arc back
    GLOBAL_CONFIG.store(Arc::new("world"));
    assert_eq!(**GLOBAL_CONFIG.load(), "world");
}
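If you'd like to see the snapshot-and-replace idea without the dependency, here's a rough std-only approximation using RwLock<Option<Arc<Config>>>. To be clear, this is not how arc-swap works: readers here briefly take a lock, which is exactly the cost arc-swap is designed to avoid. The Config type and the reload and current functions are hypothetical.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical configuration type, used only for illustration.
struct Config {
    greeting: String,
}

// RwLock::new is a const fn; None means "not loaded yet".
static GLOBAL_CONFIG: RwLock<Option<Arc<Config>>> = RwLock::new(None);

// Replaces the current config; readers holding an old Arc keep their snapshot.
fn reload(greeting: &str) {
    let fresh = Arc::new(Config { greeting: greeting.to_string() });
    *GLOBAL_CONFIG.write().unwrap() = Some(fresh);
}

// Clones the Arc under a short read lock, so the lock is not held
// while the caller uses the config.
fn current() -> Option<Arc<Config>> {
    GLOBAL_CONFIG.read().unwrap().clone()
}

fn main() {
    reload("hello");
    let snapshot = current().unwrap();
    reload("world");
    // The old snapshot is unchanged; new readers see the new value.
    assert_eq!(snapshot.greeting, "hello");
    assert_eq!(current().unwrap().greeting, "world");
}
```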
std::include!
The std::include! macro is kind of like copy-pasting a snippet of Rust into your code. It can be used to generate complex Rust code at compile time (as in phf).
Advantages:
- Built into Rust
- More powerful code generation than with a macro
- Errors will be detected at compile time
- Create mutable or immutable data
- Can work with 'static lifetime
Disadvantages:
- Unhygienic (in the macro sense)
See src/lib.rs for the example include code.
fn main() {
    assert_eq!(6, global_data_in_rust::ALSO_SIX);
}
Mutable static items
Items declared as static mut are extremely powerful; all access to them is unsafe and they provide no guard rails whatsoever. In fact, they are so "powerful" that they are being considered for deprecation as of May 2020. They are likely not the solution you're looking for; you can almost always replace them with an immutable static using some kind of synchronization primitive. If you think there's no other way to solve your problem, you're basically talking about building your own synchronization primitive, so be sure to read the Rustonomicon and thoroughly understand the implications of unsafe code beforehand!
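As one example of that kind of replacement, here's a sketch where a global counter that might otherwise have been a static mut is an immutable static holding an AtomicUsize. REQUEST_COUNT and record_request are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Instead of `static mut REQUEST_COUNT: usize = 0;`, which needs unsafe on every access:
static REQUEST_COUNT: AtomicUsize = AtomicUsize::new(0);

// Increments the counter and returns the new value; safe to call from any thread.
fn record_request() -> usize {
    REQUEST_COUNT.fetch_add(1, Ordering::Relaxed) + 1
}

fn main() {
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(record_request)).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(REQUEST_COUNT.load(Ordering::Relaxed), 8);
}
```

No unsafe is needed anywhere, and the counter is still reachable from the whole program.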
Domain-specific solutions
- The Embedded Rust Book suggests using a singleton pattern instead of a static mut to "treat your hardware like data" without requiring as much unsafe.
- The Amethyst game engine has a Loader struct that can be used to load data.