Zarr Benchmarks

This repository contains benchmarks of Zarr V3 implementations.

[!NOTE] Contributions are welcome, whether additional benchmarks, more implementations, or general cleanup of this repository.

Also consider restarting development of the official zarr benchmark repository: https://github.com/zarr-developers/zarr-benchmark

Implementations Benchmarked

Benchmark scripts are in the scripts folder and implementation versions are listed in the benchmark charts.

[!WARNING] Python benchmarks are subject to the overheads of Python and may not be using an optimal API/parameters.

Please open a PR if you can improve these benchmarks.

make Targets

Benchmark Data

All datasets are $1024 \times 2048 \times 2048$ uint16 arrays.

| Name | Chunk Shape | Shard Shape | Compression | Size |
| --- | --- | --- | --- | --- |
| Uncompressed | $256^3$ | | None | 8.0 GB |
| Compressed | $256^3$ | | blosclz 9 + bitshuffling | 377 MB |
| Compressed + Sharded | $32^3$ | $256^3$ | blosclz 9 + bitshuffling | 1.1 GB |
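The shapes listed above imply a fixed number of chunks and a fixed uncompressed size per chunk. The arithmetic can be checked with a short sketch; the helper name `grid` is illustrative and not part of this repository, and the 8.0 GB figure is assumed to use binary units (GiB).

```python
from math import prod

shape = (1024, 2048, 2048)
itemsize = 2  # bytes per uint16 element

def grid(shape, chunk_shape):
    """Number of chunks along each axis (ceiling division)."""
    return tuple(-(-s // c) for s, c in zip(shape, chunk_shape))

# Uncompressed size of the whole array: 2**33 bytes, i.e. 8 GiB
total_bytes = prod(shape) * itemsize

# Regular 256^3 chunking: a 4 x 8 x 8 grid of 32 MiB chunks
chunk_grid = grid(shape, (256, 256, 256))
n_chunks = prod(chunk_grid)
chunk_bytes = 256**3 * itemsize

# Sharded layout: each 256^3 shard holds 8 x 8 x 8 inner chunks of 32^3
inner_per_shard = prod(grid((256, 256, 256), (32, 32, 32)))
inner_bytes = 32**3 * itemsize

print(total_bytes / 2**30, n_chunks, chunk_bytes / 2**20, inner_per_shard, inner_bytes / 2**10)
```

Under these assumptions, the sharded dataset has the same number of shards as the unsharded datasets have chunks, but each shard is subdivided into much smaller 64 KiB inner chunks.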

Benchmark System

Round Trip Benchmark

This benchmark measures time and peak memory usage to "round trip" a dataset (potentially chunk-by-chunk).
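As a rough illustration of what a chunk-by-chunk round trip with time and peak-memory measurement might look like, here is a minimal sketch using only the Python standard library. It is not the harness used by this repository: the dict-based `src` and `dst` stores and the `measure` helper are hypothetical stand-ins, and `tracemalloc` only tracks allocations made through Python's allocator, so it is not equivalent to process-level peak memory.

```python
import time
import tracemalloc
from itertools import product

def chunk_indices(shape, chunk_shape):
    """Yield the grid index of every chunk covering `shape`."""
    counts = [-(-s // c) for s, c in zip(shape, chunk_shape)]  # ceiling division
    return product(*(range(n) for n in counts))

def round_trip(src, dst, shape, chunk_shape):
    """Copy every chunk from src to dst, one chunk at a time."""
    for idx in chunk_indices(shape, chunk_shape):
        dst[idx] = src[idx]

def measure(fn, *args):
    """Return (elapsed seconds, peak traced bytes) for a single call of fn."""
    tracemalloc.start()
    t0 = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

# Tiny stand-in dataset: 4 x 8 x 8 elements in 2 x 4 x 4 chunks, 2 bytes per element
shape, chunk_shape = (4, 8, 8), (2, 4, 4)
src = {idx: bytes(2 * 2 * 4 * 4) for idx in chunk_indices(shape, chunk_shape)}
dst = {}

elapsed, peak = measure(round_trip, src, dst, shape, chunk_shape)
print(f"{len(dst)} chunks copied in {elapsed:.6f} s, peak traced memory {peak} bytes")
```

Processing one chunk at a time keeps the working set near the size of a single chunk, which is why peak memory is reported alongside time.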

Table of raw measurements (benchmarks_roundtrip.md)

Standalone

roundtrip benchmark image

Dask

roundtrip benchmark image dask

Read Chunk-By-Chunk Benchmark

This benchmark measures the minimum time and peak memory usage to read a dataset chunk-by-chunk into memory.
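Reading chunk-by-chunk amounts to visiting one chunk-aligned region at a time. A minimal sketch of how those regions could be enumerated for the benchmark array shape follows; `chunk_slices` is a hypothetical helper, not a function from this repository or from any Zarr implementation.

```python
from itertools import product

def chunk_slices(shape, chunk_shape):
    """Yield a tuple of slices for each chunk, clipping chunks at the array edge."""
    starts_per_axis = [range(0, s, c) for s, c in zip(shape, chunk_shape)]
    for starts in product(*starts_per_axis):
        yield tuple(
            slice(start, min(start + c, s))
            for start, c, s in zip(starts, chunk_shape, shape)
        )

shape = (1024, 2048, 2048)
chunk_shape = (256, 256, 256)

slices = list(chunk_slices(shape, chunk_shape))
first, last = slices[0], slices[-1]
print(len(slices), first, last)
```

With an array-like object that supports NumPy-style indexing, each tuple could be used as `array[region]` to load one chunk, so that only a single chunk needs to be resident in memory at a time.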

Table of raw measurements (benchmarks_read_chunks.md)

Standalone

read chunks benchmark image

[!NOTE] zarr-python benchmarks with sharding are not visible in this plot

Dask

read chunks benchmark image dask

Read All Benchmark

This benchmark measures the minimum time and peak memory usage to read an entire dataset into memory.
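One common way to obtain a minimum time is to repeat the operation several times and keep the fastest run, which reduces noise from caching and scheduling. The sketch below assumes that approach; the repeat count and the `read_all` stand-in are illustrative and are not taken from this repository's scripts.

```python
import timeit

def min_time(fn, repeats=5):
    """Run fn `repeats` times and return the fastest wall-clock time in seconds."""
    return min(timeit.repeat(fn, repeat=repeats, number=1))

def read_all():
    # Stand-in for reading an entire array into memory (allocates 1 MiB)
    return bytes(1 << 20)

best = min_time(read_all)
print(f"fastest of 5 runs: {best:.6f} s")
```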

Table of raw measurements (benchmarks_read_all.md)

Standalone

read all benchmark image

Dask

read all benchmark image dask