HeFQUIN-VocabMappingsExperiments

This repository contains artifacts and resources for conducting experiments related to the paper ‘Considering Vocabulary Mappings in Query Plans for Federations of RDF Data Sources’.

The complete resources can be downloaded from Zenodo.

Federations

Three federations are provided for evaluation:

Datasets

To construct the federations, the dataset generator of the Lehigh University Benchmark (LUBM) is employed. Data about universities is generated and split into separate datasets, one per university.

Alternatively, generated datasets can be directly downloaded from Zenodo in the datasets directory.

Queries

Seven queries, expressed in terms of the global vocabulary, are designed. These queries vary regarding the types of vocabulary mapping rules relevant to them and can be used for all three federations.

Benchmark Queries

All queries are available in the queries directory.

Experiment Set-Up (step-by-step)

Step 1: Generate Datasets

Prepare the dataset generator using the following commands:

cd datasets
git clone https://github.com/rvesse/lubm-uba.git
cd lubm-uba
mvn clean install

Then, generate datasets for ten universities:

./generate.sh -o ../original_datasets --format NTRIPLES --consolidate Partial -u 10

The generated datasets are stored in the original_datasets folder within the datasets directory.
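As a quick sanity check after generation, the following sketch counts the N-Triples files in the output folder. It assumes that the generator writes files with the `.nt` extension directly into that folder; `count_nt_files` is a helper name introduced here for illustration and is not part of the repository.

```shell
#!/bin/sh
# Count the N-Triples files located directly inside a directory.
# Assumption: the generated files use the .nt extension.
count_nt_files() {
    dir="$1"
    # find prints one line per matching file; wc -l counts those lines.
    find "$dir" -maxdepth 1 -type f -name '*.nt' | wc -l | tr -d ' '
}

# Example (run from the lubm-uba directory):
#   count_nt_files ../original_datasets
```

With one dataset per university, one would expect the count to match the number passed via `-u`, although the exact file layout depends on the generator.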

Step 2: Set Up a Federation

For an experiment, you can choose any of the following federations. The following components are required to set up a federation:

To set up Fed1

  1. Navigate to the corresponding directory and copy the generated datasets:

    cd federations/fed1
    sh copy_datasets.sh
    
  2. In the file docker-compose.yaml, specify the path to the dataset of each federation member and a port number to access the dataset. Then, publish RDF datasets via Virtuoso using the following command:

    sh deploy.sh
    

Verify that the dataset is accessible. For example, if the port of a federation member is set to 8890, use the following command to check whether the server is accessible:

    curl -X POST http://127.0.0.1:8890/sparql -H "Accept: text/csv" --data-urlencode 'query=SELECT count(*) WHERE {?s ?p ?o }'

  3. All federation members share the same set of vocabulary mapping rules, which are stored in the file mappings_complete.ttl under the datasets directory. The path to this file needs to be specified in the federation description file. This federation description is required for running experiments with the HeFQUIN engine.
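The accessibility check from step 2 can be repeated for every federation member with a small loop. The sketch below is only an illustration: the function names are made up here, the ports in the example are hypothetical, and it assumes that each member exposes its endpoint at `/sparql` on 127.0.0.1, as in the command above.

```shell
#!/bin/sh
# Build the SPARQL endpoint URL for a member listening on a local port.
endpoint_url() {
    printf 'http://127.0.0.1:%s/sparql' "$1"
}

# Return 0 if the endpoint answers a small SELECT query successfully.
check_endpoint() {
    curl --silent --fail --max-time 10 --output /dev/null \
        -H 'Accept: text/csv' \
        --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
        "$(endpoint_url "$1")"
}

# Check every port given as an argument and report the result.
check_members() {
    status=0
    for port in "$@"; do
        if check_endpoint "$port"; then
            echo "port $port: OK"
        else
            echo "port $port: NOT reachable"
            status=1
        fi
    done
    return "$status"
}

# Example with hypothetical ports (use those from docker-compose.yaml):
#   check_members 8890 8891 8892
```

The check only confirms that an endpoint responds; it does not verify that the expected dataset has been loaded.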

To set up Fed0

  1. Create datasets that use the global vocabulary with the following commands. The resulting datasets are stored in the globalvocab_datasets folder within the datasets directory:

    cd datasets
    sh construct_globaldatasets.sh
    
  2. Navigate to the corresponding directory and copy the generated datasets:

    cd federations/fed0
    sh copy_datasets.sh
    
  3. As for Fed1, update the configuration in the docker-compose.yaml file and publish the RDF datasets via Virtuoso using the following command:

    sh deploy.sh
    

Note that no vocabulary mapping rules are needed in this setup.

To set up Fed2

  1. Rewrite original datasets to datasets with local vocabularies (distinct among different federation members):

    cd datasets
    sh rewrite_to_localschema.sh
    
  2. Navigate to the directory for Fed2 and copy the generated datasets:

    cd federations/fed2
    sh copy_datasets.sh
    
  3. As for Fed1, update the configuration in the docker-compose.yaml file and publish the RDF datasets via Virtuoso using the following command:

    sh deploy.sh
    
  4. To get the vocabulary mapping rules for each federation member, run the following command. A new directory, mappings, containing files with the mapping rules will be generated:

    sh constructMapping_comp.sh
    

Note that the path to each vocabulary mapping file needs to be specified in the federation description file (i.e., 'federation_desc_comp.ttl' under fed2).

Step 3: Execute Experiments

After setting up a federation, several parameters are required to execute experiments:

For each experiment, the experiments directory provides a compilable JAR file (built on 2024-06-14) of the HeFQUIN engine and a script for running the experiment. An example command for running experiments is shown below:

./run_experiment.sh [PATH_TO_QUERIES] [REPEAT_TIMES] HeFQUIN-0.0.3-SNAPSHOT.jar [PATH_TO_THE_FEDERATION_DESCRIPTION_FILE] [OUTPUT_DIRECTORY]
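To make the placeholders concrete, the sketch below fills them in and prints the resulting command so it can be reviewed before running. All values are assumptions chosen for illustration: the relative paths, the repeat count, and the output directory are hypothetical; only the JAR name and the file name federation_desc_comp.ttl under fed2 come from this README.

```shell
#!/bin/sh
# Hypothetical values; adjust them to your checkout and chosen federation.
QUERIES_DIR="../queries"
REPEAT_TIMES=5
ENGINE_JAR="HeFQUIN-0.0.3-SNAPSHOT.jar"
FED_DESC="../federations/fed2/federation_desc_comp.ttl"
OUTPUT_DIR="results/fed2"

# Print the command instead of executing it, so it can be checked first.
build_command() {
    printf './run_experiment.sh %s %s %s %s %s\n' \
        "$QUERIES_DIR" "$REPEAT_TIMES" "$ENGINE_JAR" "$FED_DESC" "$OUTPUT_DIR"
}

build_command
```

Once the printed command looks right, it can be copied into the shell and run from the directory that contains run_experiment.sh.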

To reproduce compilable JAR files of the HeFQUIN engine for each experiment:

Note

The HeFQUIN-0.0.2 version is used for the evaluation in the paper ‘Considering Vocabulary Mappings in Query Plans for Federations of RDF Data Sources’. The generated datasets, compilable JAR files, and original log files can be downloaded from Zenodo.