Benchmarks for Scylla-RDF
Ingestion: Google Dataflow 109M triples
The ingestion pipelines from scylla-beam-pipelines were used. Ingestion covered bulk loading of RDF into all indexes in ScyllaDB; the full-text index was not included in this benchmark.
ScyllaDB had 2 nodes with the following characteristics:
- n1-standard-16,
- 2 * Local SSD with NVMe (375 GB each).
The pipelines were run on Google Dataflow:
- 20 machines (n1-standard-1),
- 1 * connection to ScyllaDB per machine,
- 8192 * parallel requests per connection.
The ingestion pipelines ran for ~16 min and loaded 109,836,664 RDF triples, which gives ~114k triples/sec.
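The throughput figure can be re-derived from the numbers reported above. The sketch below is only a sanity check on that arithmetic: it treats the run time as exactly 16 minutes (the text says "~16 min"), and it assumes that the product of machines, connections, and parallel requests is an upper bound on in-flight requests. That second reading is an interpretation, not something the benchmark states.

```python
# Figures taken from the benchmark description above.
TRIPLES_LOADED = 109_836_664
DURATION_MIN = 16                         # approximate: the text says "~16 min"

DATAFLOW_MACHINES = 20
CONNECTIONS_PER_MACHINE = 1
PARALLEL_REQUESTS_PER_CONNECTION = 8192


def triples_per_second(triples: int, minutes: float) -> float:
    """Average ingestion rate over the whole run."""
    return triples / (minutes * 60)


def max_in_flight_requests(machines: int, connections: int, per_connection: int) -> int:
    """Upper bound on concurrent requests (assumed interpretation of the settings)."""
    return machines * connections * per_connection


if __name__ == "__main__":
    rate = triples_per_second(TRIPLES_LOADED, DURATION_MIN)
    bound = max_in_flight_requests(
        DATAFLOW_MACHINES, CONNECTIONS_PER_MACHINE, PARALLEL_REQUESTS_PER_CONNECTION
    )
    print(f"average rate: {rate:,.0f} triples/sec")
    print(f"max in-flight requests: {bound:,}")
```

Because the duration is rounded, the result should be read as an estimate rather than an exact measurement.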
Queries: WatDiv 109M triples
The queries are executed by the query-executor. Example command:
java -jar query-executor-1.0-SNAPSHOT-jar-with-dependencies.jar http://graph-worker-vis:3001/server/repositories/watdiv ./queries ./results
Dataset & Queries
- dataset (~109M triples) at gs://scylla-rdf-benchmark/dataset.nt,
- queries at gs://scylla-rdf-benchmark/queries/.
Metrics
- ttfb: time to first byte,
- ttlb: time to load bytes.
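How the query-executor measures these two metrics is not described here. As a rough illustration of what they capture, the following sketch times a byte stream from the moment it is opened: ttfb is the delay until the first byte can be read, and ttlb is the delay until the whole body has been read. The names `Timing` and `measure` are hypothetical and are not part of the query-executor.

```python
import io
import time
from typing import BinaryIO, Callable, NamedTuple


class Timing(NamedTuple):
    ttfb: float  # seconds from start until the first byte was read
    ttlb: float  # seconds from start until the entire body was read
    size: int    # total number of bytes read


def measure(open_stream: Callable[[], BinaryIO], chunk_size: int = 64 * 1024) -> Timing:
    """Open a stream and time the first byte and the full read (hypothetical helper)."""
    start = time.perf_counter()
    with open_stream() as stream:
        first = stream.read(1)                 # blocks until one byte (or EOF) arrives
        ttfb = time.perf_counter() - start
        size = len(first)
        while True:
            chunk = stream.read(chunk_size)    # drain the rest of the body
            if not chunk:
                break
            size += len(chunk)
    ttlb = time.perf_counter() - start
    return Timing(ttfb, ttlb, size)


if __name__ == "__main__":
    # In-memory stand-in for a response body, so the sketch runs without a server.
    timing = measure(lambda: io.BytesIO(b"x" * 1_000_000))
    print(f"ttfb={timing.ttfb:.6f}s ttlb={timing.ttlb:.6f}s size={timing.size}")
```

To time a real HTTP response, the same function could be given `lambda: urllib.request.urlopen(request)` from the Python standard library. In that case the clock also includes connection setup and the wait for response headers, which may or may not match what the query-executor reports.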