Benchmarks for Scylla-RDF

Ingestion: Google Dataflow 109M triples

The ingestion pipelines from scylla-beam-pipelines were used. Ingestion consisted of bulk loading the RDF data into all indexes in ScyllaDB; the full-text index was not considered here.

ScyllaDB had 2 nodes with the following characteristics:

n1-standard-16,
2 * Local SSD with NVMe (375 GB each)

The pipelines were run on Google Dataflow:

20 machines (n1-standard-1),
1 * connection to ScyllaDB per machine,
8192 * parallel requests per connection.
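
As a rough sanity check on client-side concurrency, the settings above imply an upper bound on the number of requests in flight against ScyllaDB at any moment. The sketch below is only arithmetic on the listed figures; it assumes every worker keeps its single connection fully saturated, and the function and parameter names are illustrative rather than taken from scylla-beam-pipelines.

```python
def max_in_flight_requests(machines: int,
                           connections_per_machine: int,
                           parallel_requests_per_connection: int) -> int:
    """Upper bound on concurrent requests, assuming all connections are saturated."""
    return machines * connections_per_machine * parallel_requests_per_connection


# Figures from the Dataflow setup listed above.
upper_bound = max_in_flight_requests(
    machines=20,
    connections_per_machine=1,
    parallel_requests_per_connection=8192,
)
print(upper_bound)  # 163840
```

The actual number of outstanding requests during the run was likely lower, since workers also spend time reading and transforming input.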

The ingestion pipelines ran for ~16 min and loaded 109,836,664 RDF triples, which gives ~114k triples/sec.
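
The throughput figure can be reproduced from the two reported numbers. This is a minimal sketch that assumes the run took exactly 16 minutes (the reported duration is approximate) and that the load was spread evenly over the 20 Dataflow workers; the per-worker value is therefore only an estimate, not a measured result.

```python
def throughput_per_second(triples: int, duration_minutes: float) -> float:
    """Average ingestion rate in triples per second."""
    return triples / (duration_minutes * 60)


TRIPLES_LOADED = 109_836_664   # reported triple count
DURATION_MINUTES = 16          # approximate, as reported
WORKERS = 20                   # Dataflow machines

total_rate = throughput_per_second(TRIPLES_LOADED, DURATION_MINUTES)
per_worker_rate = total_rate / WORKERS  # assumes an even split across workers

print(f"{total_rate:,.0f} triples/sec overall")
print(f"{per_worker_rate:,.0f} triples/sec per worker (estimate)")
```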

Queries: WatDiv 109M triples

The queries are executed by the query-executor. Example command:

java -jar query-executor-1.0-SNAPSHOT-jar-with-dependencies.jar http://graph-worker-vis:3001/server/repositories/watdiv ./queries ./results

Dataset & Queries

dataset (~109M triples) at gs://scylla-rdf-benchmark/dataset.nt,
queries at gs://scylla-rdf-benchmark/queries/.

Metrics