duckdb_protobuf

a duckdb extension for parsing sequences of protobuf messages encoded in either the standard varint delimited format or a u32 big endian delimited format.
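to make the two framings concrete, here's a minimal python sketch of how such streams could be written and read back. it assumes "varint delimited" means the usual protobuf length prefix (a base-128 varint followed by the message bytes) and that the u32 big endian format is a 4-byte big-endian length followed by the message bytes. the function names are made up for illustration and aren't part of the extension.

```python
import io
import struct
from typing import BinaryIO, Iterator


def encode_varint(value: int) -> bytes:
    """encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def frame_varint(payload: bytes) -> bytes:
    """prefix an already-serialized message with its length as a varint."""
    return encode_varint(len(payload)) + payload


def frame_big_endian_fixed(payload: bytes) -> bytes:
    """prefix an already-serialized message with its length as a big-endian u32."""
    return struct.pack(">I", len(payload)) + payload


def read_varint_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """yield message payloads from a varint-delimited stream."""
    while True:
        length, shift = 0, 0
        while True:
            raw = stream.read(1)
            if not raw:
                if shift == 0:
                    return  # clean end of stream between messages
                raise EOFError("truncated varint length prefix")
            length |= (raw[0] & 0x7F) << shift
            if (raw[0] & 0x80) == 0:
                break
            shift += 7
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated message payload")
        yield payload


def read_big_endian_fixed(stream: BinaryIO) -> Iterator[bytes]:
    """yield message payloads from a u32 big-endian delimited stream."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream between messages
        if len(header) != 4:
            raise EOFError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated message payload")
        yield payload


# example: two tiny hand-encoded messages (field 1 = 1 and field 1 = 150)
messages = [b"\x08\x01", b"\x08\x96\x01"]
stream = b"".join(frame_big_endian_fixed(m) for m in messages)
assert list(read_big_endian_fixed(io.BytesIO(stream))) == messages
```

in practice the payloads would come from your protobuf library's serialize call. the sketch only deals with the framing around them.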

quick start

ensure you're using duckdb 1.1.0 to get support for the latest features. if you need new features on an older version, please open an issue.

$ duckdb -version
v1.1.0 fa5c2fe15f

start duckdb with the -unsigned flag to allow loading unsigned libraries.

$ duckdb -unsigned

or if you're using the jdbc connector, you can do this with the allow_unsigned_extensions jdbc connection property.

now install the extension:

INSTALL protobuf from 'https://duckdb.0xcaff.xyz';

next load it (you'll need to do this once for every session in which you want to use the extension):

LOAD protobuf;

and start shredding up your protobufs!

SELECT *
FROM protobuf(
    descriptors = './descriptor.pb',
    files = './scrape/data/SceneVersion/**/*.bin',
    message_type = 'test_server.v1.GetUserSceneVersionResponse',
    delimiter = 'BigEndianFixed'
)
LIMIT 10;
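if you're driving duckdb from python rather than the cli, the same steps might look roughly like the sketch below. it assumes the third-party `duckdb` python package, whose `connect` accepts an `allow_unsigned_extensions` config option. `sql_literal`, `build_protobuf_query` and `run_query` are hypothetical helpers for illustration, not something the extension ships.

```python
def sql_literal(value: str) -> str:
    """quote a python string as a sql string literal (doubles single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def build_protobuf_query(
    descriptors: str,
    files: str,
    message_type: str,
    delimiter: str,
    limit: int = 10,
) -> str:
    """build the protobuf(...) table function query shown in the quick start."""
    return (
        "SELECT * FROM protobuf("
        f"descriptors = {sql_literal(descriptors)}, "
        f"files = {sql_literal(files)}, "
        f"message_type = {sql_literal(message_type)}, "
        f"delimiter = {sql_literal(delimiter)}"
        f") LIMIT {int(limit)}"
    )


def run_query(query: str):
    """install and load the extension, then run the query and return all rows."""
    import duckdb  # third-party package, imported lazily: pip install duckdb

    # same effect as starting the cli with -unsigned
    con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
    con.execute("INSTALL protobuf FROM 'https://duckdb.0xcaff.xyz'")
    con.execute("LOAD protobuf")
    return con.execute(query).fetchall()
```

calling `run_query(build_protobuf_query('./descriptor.pb', './scrape/data/SceneVersion/**/*.bin', 'test_server.v1.GetUserSceneVersionResponse', 'BigEndianFixed'))` would then mirror the sql above, provided the extension has a build for your platform.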

if you want builds for a platform or version which currently doesn't have builds, please open an issue.

<details> <summary>install from file</summary>

download the latest version from releases. if you're on macOS, blow away the quarantine attribute with the following to allow the file to be loaded:

$ xattr -d com.apple.quarantine /Users/martin/Downloads/protobuf.duckdb_extension

next load the extension

LOAD '/Users/martin/Downloads/protobuf.duckdb_extension';
</details>

why

sometimes you want to land your row primary data in a format with a well-defined structure and pretty good decode performance, then poke around without a load step. maybe you're scraping an endpoint which returns protobuf responses, you're figuring out the schema as you go, and iteration speed matters much more than query performance.

duckdb_protobuf offers a new point on the flexibility-performance tradeoff: fast exploration of protobuf streams with little upfront load complexity or time.

configuration

features

limitations

i'm releasing this to understand how other folks are using protobuf streams and duckdb. i'm open to PRs, issues and other feedback.