Scaling Moving Object Database: MobilityDB and Google Kubernetes Engine

This repository defines an API that enables the scaling of a moving object database within Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). PostgreSQL serves as the relational database management system (RDBMS), supplemented by the MobilityDB extension, which efficiently handles the manipulation of moving object data. To learn more about the MobilityDB extension for PostgreSQL, you can refer to the implementation and documentation available at this link. Furthermore, the Citus extension for PostgreSQL is incorporated to provide table partitioning and the efficient distribution of SQL queries across a group of PostgreSQL nodes. For details about the Citus extension, you can explore the complete implementation provided here.
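As a quick illustration of what MobilityDB adds on top of PostgreSQL, the snippet below builds a temporal point from two timestamped positions and computes the length of the resulting trajectory. It is only a sketch: the connection parameters and the sample coordinates are placeholders, and it assumes a database where the extension is already installed.

# Illustrative only: assumes a reachable PostgreSQL database named mobilitydb
# with the MobilityDB extension installed. length() returns the travelled
# distance of the temporal point in the units of its spatial reference system.
psql -d mobilitydb -c "
  SELECT length(tgeompoint '[Point(0 0)@2023-01-01 08:00:00, Point(0 1000)@2023-01-01 08:10:00]');"

Citus then takes care of spreading such temporal data, and the queries over it, across the worker nodes of the cluster.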

This work was developed as part of my master's thesis in computer science. A detailed report can be found here, along with the defense presentation and a short video demonstrating the cluster resize operation.

Get Started

In this section, we outline the steps required to scale a moving object database in GCP, together with the command-line instructions needed for each phase. The process begins with the creation of a GKE cluster, in which we allocate specific GCP resources. We then deploy a PostgreSQL server with the MobilityDB and Citus extensions. Finally, we initialize the Citus cluster, a step that establishes the connections between the Citus nodes so that large tables can be partitioned and their partitions and queries distributed across multiple nodes of the cluster. The following figure presents an architectural overview of the Citus cluster within GKE.

<p align="center"> <img src="./docs/src/stack-citus-management.drawio.png" width="525"> </p>

It is recommended to scale out the cluster when the database grows large in order to maintain performance. Scaling out means adding more nodes to the existing cluster, which translates into additional computational capacity (CPU cores and memory). Since the PostgreSQL tables are distributed across the GKE worker nodes through the Citus extension, adding new nodes triggers a redistribution of the tables so that the partitions are spread evenly over the enlarged cluster.

After a scale-out or scale-in operation, the resize command presented in the Horizontal Scaling Command section must be executed to ensure data consistency. It runs a Python script that automates the redistribution of the cluster when new nodes are added and the evacuation of data when nodes are deleted; a conceptual sketch of the corresponding Citus operations is given at the end of this section. The figure below illustrates the outcome of a scale-out operation, initiated by that Python command, which resizes the cluster to three nodes.

In the upcoming section, we provide a tutorial demonstrating how to deploy a distributed GKE cluster to manage and scale a moving object database analyzed with the MobilityDB extension for PostgreSQL, along with the commands required to achieve this.
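For reference, the redistribution performed by that script corresponds conceptually to the following Citus calls issued on the coordinator. This is only a sketch run through psql: the worker address is hypothetical and the exact functions used by the script are assumptions.

# Scale-out: register a newly added worker, then rebalance the shards
# (10.48.4.2 is a hypothetical worker address)
psql -h COORDINATOR_IP -U "$POSTGRES_USER" -p "$POSTGRES_PORT" -d "$POSTGRES_DB" \
    -c "SELECT citus_add_node('10.48.4.2', 5432);" \
    -c "SELECT rebalance_table_shards();"
# Scale-in: move the shards off a worker before deleting it, then unregister it
psql -h COORDINATOR_IP -U "$POSTGRES_USER" -p "$POSTGRES_PORT" -d "$POSTGRES_DB" \
    -c "SELECT citus_drain_node('10.48.4.2', 5432);" \
    -c "SELECT citus_remove_node('10.48.4.2', 5432);"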

Tutorial

This tutorial walks through the steps needed to deploy a GKE cluster and establish a distributed environment for a PostgreSQL database.

GKE Cluster Initialization

First and foremost, you need a Google account to sign in to the GCP console. If you already have an existing Google account, you can associate it with GCP by signing in here. Once you have a GCP account, you need to acquire credits through a billing account in order to use GCP services. For further information about GCP credits and billing accounts, please refer to this link. The subsequent steps detail the process of setting up a GKE cluster.
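These steps assume that the gcloud CLI is already installed and authenticated; a minimal setup sketch is shown below (PROJECT_ID is a placeholder for your own project ID, which is created by the first command of the next block).

# Authenticate the gcloud CLI with your Google account
gcloud auth login
# Install kubectl through gcloud if it is not already available
gcloud components install kubectl
# Once the project has been created (see the next block), select it and
# enable the APIs used in this tutorial
gcloud config set project PROJECT_ID
gcloud services enable container.googleapis.com artifactregistry.googleapis.com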

# Create a new GCP project (replace PROJECT_ID with your own project ID)
gcloud projects create PROJECT_ID
# Create the GKE cluster with a pool of four worker nodes
gcloud container clusters create mobilitydb-cluster --zone europe-west1-c \
    --node-pool mobilitydb-node-pool --machine-type e2-standard-4 \
    --disk-type pd-balanced --num-nodes 4
# Get the credentials of the cluster
gcloud container clusters get-credentials mobilitydb-cluster \
    --zone europe-west1-c --project PROJECT_ID
# View your cluster information
gcloud container clusters describe mobilitydb-cluster --zone europe-west1-c
# You should see your cluster created after a few minutes in your GCP console
# View your cluster nodes information via the kubectl command
kubectl get node -o wide

Citus Cluster Initialization

Once the GKE cluster is established, you can deploy our cloud-native solution to initialize the Citus cluster. To accomplish this, you must first perform the following operations.

# Clone the repository
git clone https://github.com/MobilityDB/MobilityDB-GCP
# Export the PostgreSQL connection parameters used by the deployment
# (add them to your ~/.bashrc to make them persistent, then reload it)
export POSTGRES_USER=your-user
export POSTGRES_PASSWORD=your-password
export POSTGRES_DB=your-db-name
export POSTGRES_PORT=30001
source ~/.bashrc
# Create the docker registry in the GCP Artifact Registry
gcloud artifacts repositories create repo-test --repository-format=docker \
    --location=europe-west1 --description="testing" --immutable-tags --async
# Pull the cloud-native docker image from Docker Hub
docker pull bouzouidja/mobilitydb-cloud:latest
# Tag the pulled docker image mobilitydb-cloud for the repo-test registry
docker tag bouzouidja/mobilitydb-cloud:latest \
    europe-west1-docker.pkg.dev/PROJECT_ID/repo-test/mobilitydb-cloud:latest
# Authenticate docker with the Artifact Registry (if not already configured)
gcloud auth configure-docker europe-west1-docker.pkg.dev
# Push the tagged image to the repo-test docker registry
docker push europe-west1-docker.pkg.dev/PROJECT_ID/repo-test/mobilitydb-cloud:latest
# Open the NodePort used to reach the coordinator
gcloud compute firewall-rules create mobilitydb-node-port \
    --project PROJECT_ID --allow tcp:30001
# Apply the manifests and wait until the Pods are in the Running state
kubectl create -f postgres-secret.yaml
kubectl create -f coordinator-deployment.yaml
kubectl create -f workers-deployment.yaml
# View your Pods
kubectl get pods -o wide
# You should see the coordinator Pod and the worker Pods running
# Connect to the PostgreSQL database on the coordinator using the psql client
psql -h 34.34.151.202 -U docker -p 30001 -d mobilitydb
mobilitydb=#

34.34.151.202 is the external IP address of the coordinator node in the cluster created above; it can be found using the following command.

kubectl get node -o wide
NAME                                                  STATUS   ROLES    AGE    VERSION           INTERNAL-IP     EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-mobilitydb-clust-mobilitydb-node--77a55075-cssr   Ready    <none>   121m   v1.27.3-gke.100   10.132.15.217   34.34.151.202    Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gke-mobilitydb-clust-mobilitydb-node--77a55075-l2wd   Ready    <none>   121m   v1.27.3-gke.100   10.132.15.220   35.187.85.28     Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gke-mobilitydb-clust-mobilitydb-node--77a55075-ldgc   Ready    <none>   121m   v1.27.3-gke.100   10.132.15.219   34.77.253.167    Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gke-mobilitydb-clust-mobilitydb-node--77a55075-wcl8   Ready    <none>   121m   v1.27.3-gke.100   10.132.15.218   35.205.205.195   Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
mobilitydb=# SELECT * from citus_get_active_worker_nodes();
 node_name  | node_port 
------------+-----------
 10.48.1.15 |      5432
 10.48.2.8  |      5432
 10.48.3.7  |      5432
(3 rows)
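With the coordinator and the workers running, you can create a distributed MobilityDB table. The example below is only a sketch: the trips table, its columns, and the sample connection parameters are illustrative and not part of this repository, and the CREATE EXTENSION statements are no-ops when the extensions already exist in the image.

# Connect to the coordinator and create an illustrative distributed table
# holding vehicle trajectories as MobilityDB temporal points
psql -h 34.34.151.202 -U docker -p 30001 -d mobilitydb <<'SQL'
CREATE EXTENSION IF NOT EXISTS citus;
CREATE EXTENSION IF NOT EXISTS mobilitydb CASCADE;

-- Example table: one trajectory per (vehicle_id, trip_id)
CREATE TABLE trips (
    vehicle_id  bigint,
    trip_id     bigint,
    trip        tgeompoint,
    PRIMARY KEY (vehicle_id, trip_id)
);

-- Shard the table on vehicle_id so its rows are spread across the Citus workers
SELECT create_distributed_table('trips', 'vehicle_id');
SQL

Queries against trips are then planned on the coordinator and executed in parallel on the workers that hold the relevant shards.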

Horizontal Scaling Command

The command below initiates a resize operation on the current cluster, given the appropriate parameters.

# Make sure to export your environment variables, including POSTGRES_USER,
# POSTGRES_PASSWORD, POSTGRES_PORT and POSTGRES_DB.
python citus_cluster_management.py resize --cluster-name mobilitydb-cluster \
    --cluster-zone europe-west1-c --cluster-project PROJECT_ID --num-nodes 8
# Wait for the completion of all operations.
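Once the resize has completed, you can verify that the new GKE nodes are Ready and that the additional workers have been registered with the coordinator; the IP address below is the coordinator address used earlier in this tutorial.

# Check that the resized node pool is up
kubectl get node -o wide
# Check that the coordinator now sees the additional Citus workers
psql -h 34.34.151.202 -U docker -p 30001 -d mobilitydb \
    -c "SELECT * FROM citus_get_active_worker_nodes();"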