Awesome
RadixSpline: A Single-Pass Learned Index
A read-only learned index structure that can be built in a single pass over sorted data. Can be used as a drop-in replacement for std::multimap
. Currently limited to uint32_t
and uint64_t
data types.
Build
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
./example
./tester
Examples
Using rs::Builder
to index sorted data in one pass, without copying the data:
// Create random keys.
vector<uint64_t> keys(1e6);
generate(keys.begin(), keys.end(), rand);
keys.push_back(8128);
sort(keys.begin(), keys.end());
// Build RadixSpline.
uint64_t min = keys.front();
uint64_t max = keys.back();
rs::Builder<uint64_t> rsb(min, max);
for (const auto& key : keys) rsb.AddKey(key);
rs::RadixSpline<uint64_t> rs = rsb.Finalize();
// Search using RadixSpline.
rs::SearchBound bound = rs.GetSearchBound(8128);
cout << "The search key is in the range: ["
<< bound.begin << ", " << bound.end << ")" << endl;
auto start = begin(keys) + bound.begin, last = begin(keys) + bound.end;
cout << "The key is at position: " << std::lower_bound(start, last, 8128) - begin(keys) << endl;
Using rs::MultiMap
to index unsorted data, which internally creates a sorted copy:
vector<pair<uint64_t, char>> data = {{1ull, 'a'},
{12ull, 'b'},
{7ull, 'c'}, // Unsorted.
{42ull, 'd'}};
rs::MultiMap<uint64_t, char> map(begin(data), end(data));
cout << "find(7): '" << map.find(7)->second << "'" << endl;
cout << "lower_bound(3): '" << map.lower_bound(3)->second << "'" << endl;
Cite
Please cite our aiDM@SIGMOD 2020 paper if you use this code in your own work:
@inproceedings{radixspline,
author = {Andreas Kipf and
Ryan Marcus and
Alexander van Renen and
Mihail Stoian and
Alfons Kemper and
Tim Kraska and
Thomas Neumann},
title = {{RadixSpline}: a single-pass learned index},
booktitle = {Proceedings of the Third International Workshop on Exploiting Artificial
Intelligence Techniques for Data Management, aiDM@SIGMOD 2020, Portland,
Oregon, USA, June 19, 2020},
pages = {5:1--5:5},
year = {2020},
url = {https://doi.org/10.1145/3401071.3401659},
doi = {10.1145/3401071.3401659},
timestamp = {Mon, 08 Jun 2020 19:13:59 +0200},
biburl = {https://dblp.org/rec/conf/sigmod/KipfMRSKK020.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}