Awesome

Overview

This repository contains files for conducting experiments related to the paper A Cost Model to Optimize Queries over Heterogeneous Federations of RDF Data Sources.

We have implemented the cost model introduced in the paper (as well as the baseline approach) in our query federation engine called HeFQUIN.

For running the experiments we utilize KOBE, a benchmarking system based on Kubernetes infrastructure, to containerize and configure federations of RDF datasets, queries, federation engines, and experiments. The typical workflow for defining a KOBE experiment includes the following steps:

DatasetTemplate: Create one for each dataset server used in your benchmark.
Benchmark: Define it with a list of datasets and queries.
FederatorTemplate: Create one for the federator engine used in your experiment.
Experiment: Define it over your previously defined benchmark.

Installation

Prerequisites

Before installing the KOBE benchmarking engine, ensure the following prerequisites are met:

kubectl: Install using native package management. Detailed instructions for Linux can be found here. The experiments in the paper use version v1.20.7. Verify the installation:
```
kubectl version --client
```
nfs-common: Install on the nodes of the cluster. For Debian or Ubuntu:
```
apt-get install nfs-common
```
minikube: Detailed installation instructions can be found here.

Starting the Cluster

Start your cluster using the following commands:

minikube start
kubectl get po -A
minikube dashboard

Open the Minikube dashboard locally by accessing the default port in your browser:

http://127.0.0.1:44393/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/

If the cluster is started on a remote server, such as ontology.ida.liu.se, use SSH tunneling to access the dashboard.

Install KOBE

A detailed instruction for installing KOBE can be found here. There are some notes for the installation.

When installing the Networking subsystem, you can consult the official installation guide or follow the steps below.

To download 'istio' with specific verion, use the follwoing curl command:

    curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.11.3 TARGET_ARCH=x86_64 sh -

Then, add the istioctl client to your path (Linux or macOS):

    export PATH=$PWD/istio-1.11.3/bin:$PATH

Then, install Istio using the following command:

istioctl install --set profile=demo -y

Before installing the logging subsystem, 'helm' need to be installed. Note the current commend only works for helm2. An instruction to install and set up helm2 and tiller can be found here

After set up all required components, you can check status of all Pods via browswer, or via command: kubectl get pods

Conducting Experiments

First, make sure the cluster is running

minikube status

Deploy dataset servers: Virtuoso, TPF, and brTPF server

kobectl apply dataset/dataset-virtuoso/virtuosotemplate.yaml
kobectl apply dataset/dataset-ldfserver-hdt/ldfservertemplate-hdt.yaml
kobectl apply dataset/dataset-brtpfserver/brtpfservertemplate.yaml

For the TPF server and brTPF server, several docker images are used and corresponding source code for these images can be found in the corresponding directory. If any changes applied, these docker images need to be rebuilt using the following command:

TPF server:

cd dataset/dataset-ldfserver-hdt
cd ldfserver-init-hdt
docker build --no-cache -t chengsijin817/ldfserver-init-hdt .
docker push chengsijin817/ldfserver-init-hdt
    
cd ../ldfserver-main-hdt
docker build --no-cache -t chengsijin817/ldfserver-main-hdt .
docker push chengsijin817/ldfserver-main-hdt

Note: 'chengsijin817/ldfserver-init-hdt' and 'chengsijin817/ldfserver-main-hdt' are image names, which can be renamed but should be the same as specified in ldfservertemplate-hdt.yaml file.

brTPF server:

cd dataset/dataset-brtpfserver
cd brtpfserver-init
docker build --no-cache -t chengsijin817/brtpfserver-init .
docker push chengsijin817/brtpfserver-init
    
cd ../brtpfserver-main
docker build --no-cache -t chengsijin817/brtpfserver-main .
docker push chengsijin817/brtpfserver-main

Similarly, 'chengsijin817/brtpfserver-init' and 'chengsijin817/brtpfserver-main' are image names, which can be renamed but should be the same as specified in brtpfservertemplate.yaml file.

Deploy a benchmark, specifying all federation members and benchmark queries.

kobectl apply benchmark-fedbench/fedbench-het3-nodelay.yaml
kobectl show benchmark fedbench-het3-nodelay

Note: Queries should use correct URIs in SERVICE clause, depending on the type of interface of each federation member. Two example federations are provided under directory 'benchmark-fedbench'.

Deploy HeFQUIN engine

Use one of the following commands to apply a implementation of HeFQUIN engine:

kobectl apply federator-hefquin/hefquin-mincost-greedy.yaml

kobectl apply federator-hefquin/hefquin-card-greedy.yaml

For example, hefquin-mincost-greedy.yaml invokes a docker image of HeFQUIN engine, which implements cost-based greedy algorithm applied.

To integrate the HeFQUIN engine with the KOBE system, use the following Docker images: 'chengsijin817/hefquin-init' and 'chengsijin817/hefquin-init-all'. The source code for these images can be found in the repository directory. To rebuild these images after updates:

cd federator-hefquin/hefquin-init
docker build --no-cache -t chengsijin817/hefquin-init .
docker push chengsijin817/hefquin-init

cd ../hefquin-init-all
docker build --no-cache -t chengsijin817/hefquin-init-all .
docker push chengsijin817/hefquin-init-all

Deploy experiments based on the provided configuration files, fDepending on the algorithm and federation used, apply the appropriate experiment configuration:

kubectl apply -f experiment/fed3nodelay-mincostgreedy-exp.yaml
# or
kubectl apply -f experiment/fed4nodelay-mincostgreedy-exp.yaml
# or
kubectl apply -f experiment/fed3nodelay-cardgreedy-exp.yaml
# or
kubectl apply -f experiment/fed4nodelay-cardgreedy-exp.yaml

After completion, download the log file via the Minikube dashboard.

Cleaning Up Before Next Experiment

To clean up all components:

kubectl delete experiments.kobe.semagrow.org --all
kubectl delete benchmarks.kobe.semagrow.org --all
kubectl delete federatortemplates.kobe.semagrow.org --all
kubectl delete datasettemplates.kobe.semagrow.org --all
kubectl delete pod kobenfs

Alternatively, remove specific components using the following command:

kubectl delete experiment het3nodelay-mincostgreedy-exp
kubectl delete federatortemplate hefquintemplate-mincost-greedy

Stopping the Cluster

To stop the cluster:

minikube status
minikube stop
minikube delete

Note: Previous experiment results will be lost upon restarting Minikube.