
You can find a talk that describes this library here:

gcpp: Deferred and unordered destruction

Herb Sutter -- Updated 2016-10-16

Motivation, goals, and disclaimers

gcpp is a personal project to try an experiment: Can we take the deferred and unordered destruction patterns with custom reachability tracing logic that we find ourselves writing by hand today, and automate some parts of that work as a reusable C++ library that delivers it as a zero-overhead abstraction?

This is a demo of a potential additional fallback option for the rare cases where unique_ptr and shared_ptr aren't quite enough, notably when you have objects that refer to each other in local owning cycles, or when you need to defer destructor execution to meet real-time deadlines or to bound destructor stack cost. The goal is to illustrate ideas that others can draw from, that you may find useful even if you never use types like the ones below but just continue to use existing smart pointers and write your destructor-deferral and tracing code by hand.

Disclaimers: This is a demo, not a production quality library; bug reports are welcome. As of this writing, it works on Clang/libc++ 3.9 or later and Visual Studio 2015 Update 3 or later; if you have success with others, please report it by opening an Issue to update this README. See also the FAQ "So deferred_heap and deferred_ptr have no disadvantages?". And please see the Acknowledgments.

Overview

deferred_heap

A deferred_heap owns a region of memory containing objects that can be safely accessed via deferred_ptr<T> and that can point to each other arbitrarily within the same heap. It provides three main operations: make<T>() to allocate and construct a T object and return a deferred_ptr<T> to it; collect() to trace the heap and destroy unreachable objects; and the heap's destructor, which runs any still-pending destructors and releases the heap's memory.

Because .collect() and the destructor are explicit, the program can choose when (at a convenient time) and where (e.g., on what thread or processor) to run destructors.

Local small heaps are encouraged. This keeps tracing isolated and composable; combining libraries that each use deferred_heaps internally will not directly affect each other's performance.

deferred_ptr<T>

A deferred_ptr<T> refers to a T object in a deferred_heap. You can construct a deferred_ptr from an existing one (including one returned by deferred_heap::make, the source of all non-null deferred_ptrs) or by default-constructing it to null.

It has the same functions as any normal C++ smart pointer, including derived-to-base and const conversions. Here are the main additional features and semantics:

deferred_allocator

Finally, deferred_allocator is a C++11 STL allocator that you can use to have an STL container store its elements in a deferred_heap. This is especially useful when a deferred-lifetime object contains a container of deferred_ptrs to other objects in its heap; using deferred_allocator expresses that the container's deferred_ptrs are inside the heap so that any cycles they participate in can be automatically cleaned up, whereas not using deferred_allocator expresses that the container's deferred_ptrs are outside the heap (i.e., roots).

Convenience aliases are provided; for example, deferred_vector<T> is an alias for std::vector<T, gcpp::deferred_allocator<T>>.

See also additional uses of deferred_allocator.

Example

Here is a Graph type that has its own local heap shared by all Graph objects:

// possibly-cyclic N-ary Graph, one heap for all graph nodes

class Graph {
    struct Node {
        deferred_vector<deferred_ptr<Node>> outbound_edges{ my_heap };  // keeps all edges in the heap
        /*... data ...*/
    };

    static deferred_heap my_heap;
    vector<deferred_ptr<Node>> roots;

public:
    void SampleFunction() {
        roots.emplace_back(my_heap.make<Node>());
    }

    // ... etc.
};

Here is a variation where each Graph object has its own private local heap:

// possibly-cyclic N-ary Graph, one heap per graph object

class Graph {
    struct Node {
        /*... data ...*/
        deferred_vector<deferred_ptr<Node>> outbound_edges;  // keeps all edges in the heap
        Node(deferred_heap& h) : outbound_edges{h} { }
    };

    deferred_heap my_heap;
    vector<deferred_ptr<Node>> roots;

public:
    void SampleFunction() {
        roots.emplace_back(my_heap.make<Node>(my_heap));
    }

    // ... etc.
};

Note that deferred_allocator requires C++11 allocator support for fancy pointers; see deferred_allocator implementation notes.

Target use cases

gcpp aims to address three main issues, and a speculative use case.

1. Ownership cycles, preventing leaks

The first target use case is automatic by-construction memory management for data structures with cycles, which cannot be expressed with fully automated lifetime using C++17 facilities only.

2. Real time systems, bounding the cost of pointer assignment

Using shared_ptr can be problematic in real-time systems code, because any simple shared_ptr assignment or destruction could cause an arbitrary number of objects to be destroyed, and therefore has unbounded cost. This is a rare case where prompt destruction, usually a wonderful property, is actually bad for performance. (Note: This is less of a problem with unique_ptr, because unique_ptr assignment necessarily causes destruction and so tends to be treated more carefully, whereas shared_ptr assignment only sometimes causes destruction and so is easier to treat as cheap.)

Today, deadline-driven code must either avoid manipulating shared_ptrs or take defensive measures:

By design, deferred_ptr assignment cost is bounded and unrelated to destructors, and deferred_heap gives full control over when and where collect() runs deferred destructors. This makes it a candidate for being appropriate for real-time code in situations where using shared_ptr may be problematic.

3. Constrained systems, bounding stack depth of destruction

Using unique_ptr and shared_ptr can be problematic in systems with constrained stack space and deep ownership. Because destructors are always run nested (recursively), the thread that releases an object must have sufficient stack space for the call depth required to destroy the tree of objects being released. If the tree can be arbitrarily deep, an arbitrary amount of stack space may be needed.

Today, systems with constrained stacks use similar techniques to those mentioned in #2 above, with similar limitations and tradeoffs.

By design, deferred_heap runs deferred destructors iteratively, not recursively. Of course, an individual deferred object may still own a tree of resources that may use shared_ptrs and be freed recursively by default, but any two deferred objects are destroyed iteratively even if they referred to each other, and their destructors never nest. This makes it a candidate for being appropriate for constrained-stack code in situations where using unique_ptr or shared_ptr may be problematic.

(speculative) STL iterator safety

Besides the usefulness of containers of deferred_ptrs whose pointers live in a deferred_heap, using just a container<T, deferred_allocator<T>> by itself, without explicit deferred_ptrs, may have some interesting properties that could be useful for safer use of the STL in domains where allowing iterator dereference errors to have undefined behavior is not tolerable.

Object lifetime guidance

The following summarizes the best practices we should already teach for expressing object lifetimes in C++17, and at the end adds something along these lines as a potential new fallback option to consider.

Each entry below gives the guidance/library, what it automates, its efficiency, and its clarity and correctness.

[C++17] 1. Where possible, prefer scoped lifetime by default (e.g., locals, members).
  - What it automates: Expressing that this object's lifetime is tied to some other lifetime that is already well defined, such as a block scope (auto storage duration) or another object (member lifetime).
  - Efficiency: Zero additional lifetime management overhead for this object.

[C++17] 2. Else prefer make_unique and unique_ptr, if the object must have its own lifetime (i.e., heap) and ownership can be unique without ownership cycles.
  - What it automates: Single-ownership heap object lifetime.
  - Efficiency: Usually identical cost as correctly written new+delete.
  - Clarity and correctness: Clearer and more robust than explicit delete (declarative, uses are correct by construction).

[C++17] 3. Else prefer make_shared and shared_ptr, if the object must have its own lifetime (i.e., heap) and shared ownership, without ownership cycles.
  - What it automates: Acyclic shared heap object lifetime managed with reference counting.
  - Efficiency: Usually identical cost as correctly written manual reference counting.
  - Clarity and correctness: Clearer and more robust than manual/custom reference count logic (declarative, uses are correct by construction).

[experimental] 4. Else use similar techniques as deferred_heap and deferred_ptr, if the object must have its own lifetime (i.e., heap) and there can be ownership cycles.
  - What it automates: Potentially-cyclic shared heap object lifetime managed with liveness tracing; real-time code with bounded pointer assignment cost requirements; constrained stacks with bounded call depth requirements.
  - Efficiency: (conjecture) Usually identical cost as correctly written manual tracing.
  - Clarity and correctness: (conjecture) Clearer and more robust than manual/custom tracing logic (declarative, uses are correct by construction).

FAQs

Q: "Is this garbage collection?"

Of course, and remember that so is reference counting (e.g., shared_ptr).

Q: "I meant, is this just tracing garbage collection (tracing GC)?"

Not the tracing GC most people are familiar with.

Most importantly, deferred_heap collects objects, not garbage:

The other important difference is that deferred_heap meets C++'s zero-overhead principle by being opt-in and granular, not default and global:

Q: "Is this related/similar to the Boehm collector?"

No. That is a fine collector, but with different aims:

deferred_heap runs destructors, and the tracing is accurate (not conservative) and scoped to an individual granular heap.

Q: "Is deferred_heap equivalent to region-based memory management?"

It's a strict superset of regions.

The key idea of a region is to efficiently release the region's memory in one shot when the region is destroyed, with deallocate-at-once semantics. deferred_heap does that, but goes further in three ways:

The only work performed in the deferred_heap destructor is to run pending destructors, null out any deferred_ptrs that outlive the heap, and release the heap's memory pages. So if you use a deferred_heap and never call .collect(), never allocate a non-trivially destructible object, and never let a deferred_ptr outlive the heap, then destroying the heap does exactly nothing beyond efficiently dropping any memory it owns with deallocate-at-once semantics -- and then, yes, it's a region. But, unlike a region, you can do any of those things, and if you do then they are safe.

One way to view deferred_heap is as a candidate approach for unifying tracing GC and regions. And destructors, most importantly of all.

Q: "So deferred_heap and deferred_ptr have no disadvantages?"

Of course there are disadvantages to this approach, especially in this demo implementation.

Q: "Why create another smart pointer? another allocator?"

Because that's how we roll in C++.

gcpp aims to continue C++'s long tradition of being a great language for building libraries, including for memory allocation and lifetime management.

Q: "Would gc_heap and gc_ptr be better names?"

I don't think so.

I used those names initially, but switched to "deferred" for two major reasons, and one minor reason:

  1. "Deferred" emphasizes what I think is the most important property, namely that we're talking about real objects with real destructors, just the destructors are deferred; that's even more important than the tracing collection part.

  2. "GC" could create confusion with the mainstream notion of tracing GC. This is not like GC in other major languages, as I explained above.

  3. "GC" is slightly inaccurate technically, because this isn't really adding GC to C++ inasmuch as C++11 and later already has GC because reference counting is one of the major forms of GC.

Q: "Are there other potential uses where this would be better than current smart pointers?"

Yes.

In particular, a lock-free single-width-compare-and-swap atomic_deferred_ptr<T> should be easy to write. (Of course, before trying that we'd first make deferred_heap itself thread-safe as a baseline.) It's well known that tracing GC makes it easy to avoid ABA problems and traversing-while-erasing concurrency problems for lock-free data structures, without resorting to more complex approaches like hazard pointers. The std::atomic_shared_ptr that I proposed and was adopted in C++17 solves these problems too (see Anthony Williams' nice writeup), but it has two limitations: first, like all shared_ptrs it doesn't deal with cyclic data structures; and second, the only current lock-free implementation (also by Anthony Williams; thanks again Anthony!) requires double-width compare-and-swap (DCAS), which is not available on all platform targets. An atomic_deferred_ptr should solve all four of these problems and limitations.

Even the current implementation, which uses double-width pointers, can pretty easily be made atomic without DCAS, as follows: The current implementation is double width because it stores a back pointer to the deferred_heap, specifically so that we can prevent changing the heap that a deferred_ptr points into -- which means that the deferred_heap* is const once it is set to non-null. (The only exception is that the current implementation resets the heap pointer to null when the heap goes away, but that's easy to change by leaving it alone or pointing to a sentinel value.) That means that the implementation of atomic_deferred_ptr<T>::compare_exchange(that) can simply do single atomic loads of *this's and that's deferred_heap*s to compare them and enforce that they're pointing into the same heap, and if they're the same then do a normal single-width compare_exchange on the T*. And deferred_ptr doesn't need to be double width in the first place; there are other implementation choices that would make it just a T* and trivially copyable.

Even in the worst case of a naïve non-concurrent collector that stops the world while collecting, this should be sufficient to make using deferred_ptrs completely lock-free "in epochs," meaning in between collections; and each data structure can control when collection of its private mini-heap happens, and therefore control the epoch boundaries. And I expect that a concurrent collector written by a real GC implementer (not me) could do better than this baseline of "lock-free in epochs."

Implementation notes

Storing destructors

deferred_heap::make<T> stores a single object's destructor as two raw pointers:

struct destructor {
    const void* p;               // raw pointer to the object
    void(*destroy)(const void*); // raw pointer to a type-erased destructor function
};

Later we can just invoke d.destroy(d.p) to correctly clean up the complete object, even if the last deferred_ptr to it is not a deferred_ptr<T> but a pointer to a subobject (deferred_ptr<BaseClassOfT> or deferred_ptr<DataMemberOfT>).

The code conveniently gets a raw function pointer to the destructor by using a lambda:

// Called indirectly from deferred_heap::make<T>.
// Here, t is a T&.
dtors.push_back({
    std::addressof(t),                                    // address of T object
    [](const void* x) { static_cast<const T*>(x)->~T(); } // T destructor to invoke
});

One line removes the type, and the other line adds it back. The lambda gives a handy way to do type erasure -- because we know T here, we just write a lambda that performs the correct cast back to invoke the correct T destructor.

A non-capturing lambda has no state, so it can be used as a plain function. So for each distinct type T that this is instantiated with, compiling this code generates one T-specific function (on demand at compile time, globally unique) and we store that function's address. The function itself is efficient: Depending on the optimization level, the lambda is typically generated as either a one-instruction wrapper function (just a single jmp to the actual destructor) or as a copy of the destructor if the destructor is inlined (no run-time overhead at all, just another inline copy of the destructor in the binary if it's generally being inlined anyway).

deferred_allocator notes

deferred_allocator appears to work with unmodified C++11-conforming STL containers.

Acknowledgments

This personal project would be considerably weaker without input from a number of gracious experts who have been willing to share their time and experience. I would like to particularly thank the following people for their help:

Thanks, very much.