
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation

CostFed is an index-assisted federation engine for federated SPARQL query processing over multiple SPARQL endpoints. CostFed uses statistical information collected from the endpoints to perform efficient source selection and cost-based query planning. In contrast to the state of the art, it relies on a non-linear model to estimate the selectivity of joins. As a result, it is able to generate better query plans than state-of-the-art federation engines. In an experimental evaluation based on the FedBench benchmark, we show that CostFed is 3 to 121 times faster than state-of-the-art SPARQL endpoint federation engines.
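To illustrate what index-assisted source selection means in general, here is a minimal Java sketch under a simplifying assumption: each endpoint is summarized only by the set of predicates it contains. This is a hypothetical index, not CostFed's actual index, which also stores statistics that are not reproduced here. A triple pattern with a bound predicate is routed only to endpoints whose index contains that predicate. A pattern with a variable predicate is sent to every endpoint. The class name, endpoint URLs and prefixed predicate names are illustrative. Nested records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of index-assisted source selection (not CostFed's code):
 * a per-endpoint predicate index decides which endpoints receive each pattern.
 */
public class SourceSelectionSketch {

    /** A triple pattern; variables are written with a leading '?'. */
    record TriplePattern(String s, String p, String o) {}

    /** For each pattern, returns the endpoints whose index contains its predicate. */
    static Map<TriplePattern, List<String>> selectSources(
            List<TriplePattern> patterns, Map<String, Set<String>> predicateIndex) {
        Map<TriplePattern, List<String>> result = new LinkedHashMap<>();
        for (TriplePattern tp : patterns) {
            List<String> relevant = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : predicateIndex.entrySet()) {
                // An unbound predicate could match anywhere, so every endpoint stays relevant.
                if (tp.p().startsWith("?") || e.getValue().contains(tp.p())) {
                    relevant.add(e.getKey());
                }
            }
            result.put(tp, relevant);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical index: endpoint URL -> predicates it is known to contain.
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("http://localhost:8892/sparql", Set.of("drugbank:target", "drugbank:drugType"));
        index.put("http://localhost:8895/sparql", Set.of("kegg:xRef", "dc:title"));

        List<TriplePattern> query = List.of(
                new TriplePattern("?drug", "drugbank:drugType", "?type"),
                new TriplePattern("?drug", "kegg:xRef", "?x"),
                new TriplePattern("?x", "?p", "?o"));

        selectSources(query, index).forEach((tp, eps) -> System.out.println(tp + " -> " + eps));
    }
}
```

Pruning endpoints per pattern is what keeps the number of remote requests low. A cost-based planner, as described above, then orders the joins using statistics about the selected sources.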

Citation

Saleem, M., Potocki, A., Soru, T., Hartig, O. and Ngomo, A.C.N., 2018. CostFed: Cost-based query optimization for SPARQL endpoint federation. Semantics 2018, Procedia Computer Science, 137, pp.163-174.

Live Demo

The CostFed live demo comprises the following two main applications:

To start CostFed-web and create your own local demo, download the Dockerfile from here.

To help users, we provide here some federated queries from FedBench and LargeRDFBench that can be executed directly.

How to Run CostFed?

Benchmarks Used

The queries used in the evaluation can be downloaded from the FedBench and LargeRDFBench homepages.

Dataset Availability

All datasets and the corresponding Virtuoso SPARQL endpoints can be downloaded from the links given below. You can start an endpoint with bin/start.bat (on Windows) or bin/start_virtuoso.sh (on Linux).

| Dataset | Data Dump | Windows Endpoint | Linux Endpoint | Local Endpoint URL | Live Endpoint URL |
|---|---|---|---|---|---|
| ChEBI | Download | Download | Download | your.system.ip.address:8890/sparql | - |
| DBPedia-Subset | Download | Download | Download | your.system.ip.address:8891/sparql | http://dbpedia.org/sparql |
| DrugBank | Download | Download | Download | your.system.ip.address:8892/sparql | http://wifo5-04.informatik.uni-mannheim.de/drugbank/sparql |
| Geo Names | Download | Download | Download | your.system.ip.address:8893/sparql | http://factforge.net/sparql |
| Jamendo | Download | Download | Download | your.system.ip.address:8894/sparql | http://dbtune.org/jamendo/sparql/ |
| KEGG | Download | Download | Download | your.system.ip.address:8895/sparql | http://cu.kegg.bio2rdf.org/sparql |
| Linked MDB | Download | Download | Download | your.system.ip.address:8896/sparql | http://www.linkedmdb.org/sparql |
| New York Times | Download | Download | Download | your.system.ip.address:8897/sparql | - |
| Semantic Web Dog Food | Download | Download | Download | your.system.ip.address:8898/sparql | http://data.semanticweb.org/sparql |
| Affymetrix | Download | Download | Download | your.system.ip.address:8899/sparql | http://cu.affymetrix.bio2rdf.org/sparql |
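The local endpoint URLs follow a single pattern: one Virtuoso instance per dataset, on ports 8890 to 8899, each serving /sparql. As a quick sanity check before running federated queries, the Java sketch below builds these URLs for a given host and sends a minimal SELECT query to each one using the standard SPARQL 1.1 Protocol over HTTP GET. It is a hypothetical helper, not part of CostFed. It uses only java.net.http (Java 11+), and the host passed on the command line is an assumption about your setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks which of the local benchmark endpoints respond. */
public class LocalEndpointCheck {

    // Ports as listed in the dataset table (one Virtuoso instance per dataset).
    static final Map<String, Integer> PORTS = new LinkedHashMap<>();
    static {
        PORTS.put("ChEBI", 8890);
        PORTS.put("DBPedia-Subset", 8891);
        PORTS.put("DrugBank", 8892);
        PORTS.put("Geo Names", 8893);
        PORTS.put("Jamendo", 8894);
        PORTS.put("KEGG", 8895);
        PORTS.put("Linked MDB", 8896);
        PORTS.put("New York Times", 8897);
        PORTS.put("Semantic Web Dog Food", 8898);
        PORTS.put("Affymetrix", 8899);
    }

    /** Maps each dataset name to http://HOST:PORT/sparql. */
    static Map<String, String> endpointUrls(String host) {
        Map<String, String> urls = new LinkedHashMap<>();
        PORTS.forEach((name, port) -> urls.put(name, "http://" + host + ":" + port + "/sparql"));
        return urls;
    }

    /** SPARQL 1.1 Protocol query via GET: the query text goes in the "query" parameter. */
    static HttpRequest probeRequest(String endpoint) {
        String query = URLEncoder.encode("SELECT * WHERE { ?s ?p ?o } LIMIT 1", StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + query))
                .header("Accept", "application/sparql-results+json")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        for (Map.Entry<String, String> e : endpointUrls(host).entrySet()) {
            try {
                HttpResponse<String> resp =
                        client.send(probeRequest(e.getValue()), HttpResponse.BodyHandlers.ofString());
                System.out.println(e.getKey() + " (" + e.getValue() + "): HTTP " + resp.statusCode());
            } catch (IOException ex) {
                // Connection refused, timeouts, etc.: the endpoint is probably not running.
                System.out.println(e.getKey() + " (" + e.getValue() + "): unreachable, " + ex);
            }
        }
    }
}
```

For example, java LocalEndpointCheck.java 192.168.0.10 probes all ten endpoints on that machine. An HTTP 200 for a dataset indicates that its endpoint is up. A SELECT with LIMIT 1 is used instead of ASK so that the check is not affected by the ASK issue described below.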

Evaluation Results and Runtime Errors

We compared CostFed with five state-of-the-art SPARQL endpoint federation systems: FedX, SPLENDID, ANAPSID, SemaGrow, and HiBISCuS. Our complete evaluation results can be downloaded from here.

SPARQL ASK Query Errors with Virtuoso

Recent public Virtuoso SPARQL endpoints (version 7 and above) do not support SPARQL ASK queries sent via the RDF4J API. If you encounter this error, make the following change in the code:

In the package com.fluidops.fedx.evaluation, open the class SparqlTripleSource and set its useASKQueries field to false:

    private boolean useASKQueries = false;
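Our understanding is that, when the useASKQueries flag of SparqlTripleSource is false, FedX checks whether an endpoint has matches for a triple pattern by sending a SELECT query limited to one result instead of an ASK query. Both probes tell the engine the same thing: a non-empty SELECT result corresponds to ASK returning true. The Java sketch below shows the two query forms side by side. RelevanceProbe and buildProbe are hypothetical names, not part of FedX or CostFed, and the predicate IRI is a placeholder.

```java
/**
 * Hypothetical helper (not FedX/CostFed code) showing two equivalent ways
 * to probe an endpoint for matches of a single triple pattern.
 */
public class RelevanceProbe {

    static String buildProbe(String s, String p, String o, boolean useAskQueries) {
        String pattern = s + " " + p + " " + o + " .";
        return useAskQueries
                ? "ASK { " + pattern + " }"
                // A non-empty result for this SELECT means the same as ASK returning true.
                : "SELECT * WHERE { " + pattern + " } LIMIT 1";
    }

    public static void main(String[] args) {
        String s = "?drug", p = "<http://example.org/drugType>", o = "?type";
        System.out.println(buildProbe(s, p, o, true));
        // prints: ASK { ?drug <http://example.org/drugType> ?type . }
        System.out.println(buildProbe(s, p, o, false));
        // prints: SELECT * WHERE { ?drug <http://example.org/drugType> ?type . } LIMIT 1
    }
}
```

The SELECT form costs slightly more to parse and serialize, but LIMIT 1 lets the endpoint stop after the first match, so it remains a cheap existence check.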

Authors

We are especially thankful to Andreas Schwarte (fluid Operations, Germany), Olaf Görlitz (University Koblenz, Germany), and Angelos Charalambidis (Institute of Informatics and Telecommunication, Paraskevi, Greece) for their email conversations, feedback, and explanations.