Morphling

Morphling is an auto-configuration framework for machine learning model serving (inference) on Kubernetes. Check the website for details.

The Morphling paper was accepted at ACM SoCC 2021:
Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving

Overview

Morphling tunes the configuration of your ML/DL model serving deployments. It searches for the best container-level configuration (e.g., resource allocations and runtime parameters) through empirical trials, in which a few configurations are sampled for performance evaluation.

Features

Key benefits include:

Automated tuning workflows hidden behind simple APIs.
Out-of-the-box stress-test clients for ML model serving.
Cloud agnostic and tested on AWS, Alicloud, etc.
ML framework agnostic, with general support for popular frameworks, including TensorFlow, PyTorch, etc.
Equipped with a variety of customizable hyper-parameter tuning algorithms.

Getting started

Install using YAML files

Install CRDs

From the git root directory, run:

kubectl apply -k config/crd/bases

Install Morphling Components

kubectl create namespace morphling-system

kubectl apply -k manifests/configmap
kubectl apply -k manifests/controllers
kubectl apply -k manifests/pv
kubectl apply -k manifests/mysql-db
kubectl apply -k manifests/db-manager
kubectl apply -k manifests/ui
kubectl apply -k manifests/algorithm

By default, Morphling is installed in the morphling-system namespace.

The official Morphling component images are hosted on Docker Hub.

Check if all components are running successfully:

kubectl get deployment -n morphling-system

Expected output:

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
morphling-algorithm-server   1/1     1            1           34s
morphling-controller         1/1     1            1           9m23s
morphling-db-manager         1/1     1            1           9m11s
morphling-mysql              1/1     1            1           9m15s
morphling-ui                 1/1     1            1           4m53s
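
To spot components that are not ready yet without reading the table by eye, the READY column can be filtered with a small helper. This is a minimal sketch: the function name `not_ready` is made up for illustration, and it assumes the default column layout shown above (name in column 1, ready/desired in column 2).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Morphling): reads `kubectl get deployment`
# table output on stdin and prints the names of deployments whose READY
# column (ready/desired) does not yet show all replicas ready.
not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Demo on sample text, so the block runs without a cluster.
printf '%s\n' \
  'NAME                   READY   UP-TO-DATE   AVAILABLE   AGE' \
  'morphling-controller   1/1     1            1           9m23s' \
  'morphling-ui           0/1     1            0           5s' | not_ready
# prints: morphling-ui
```

Against a live cluster, the same helper would be used as `kubectl get deployment -n morphling-system | not_ready`; empty output means every deployment reports all replicas ready.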

Uninstall Morphling controller

bash script/undeploy.sh

Delete CRDs

kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd
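
Because the pipeline above deletes whatever it matches, it can help to preview the matched names first. The sketch below wraps the same grep/cut filter in a function; the name `morphling_crds` and the sample CRD names are illustrative assumptions, not real cluster output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get crd` output on stdin and prints only
# the names (first column) of CRDs in the morphling.kubedl.io group.
morphling_crds() {
  grep 'morphling\.kubedl\.io' | cut -d ' ' -f 1
}

# Demo on made-up sample text, so the block runs without a cluster.
printf '%s\n' \
  'NAME                          CREATED AT' \
  'example.morphling.kubedl.io   2021-01-01T00:00:00Z' \
  'widgets.example.com           2021-01-01T00:00:00Z' | morphling_crds
# prints: example.morphling.kubedl.io
```

Running `kubectl get crd | morphling_crds` shows what would be removed; piping that output to `xargs kubectl delete crd` then performs the deletion.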

Install using Helm chart

Install Helm

Helm is a package manager for Kubernetes. For example, to install it on macOS:

brew install helm

Check the Helm website for more details.

Install Morphling

From the root directory, run

helm install morphling ./helm/morphling --create-namespace -n morphling-system

You can override the default values defined in values.yaml with the --set flag. For example, to set custom CPU and memory resource requests:

helm install morphling ./helm/morphling --create-namespace -n morphling-system --set resources.requests.cpu=1024m --set resources.requests.memory=2Gi
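
When several values are overridden, a values file is often easier to maintain than a long list of --set flags. The sketch below writes an override file that should be equivalent to the two --set flags above; the file name `my-values.yaml` is an arbitrary choice, and the key layout simply mirrors the dotted paths used in the command.

```shell
#!/usr/bin/env bash
# Write an override file mirroring the --set paths
# resources.requests.cpu and resources.requests.memory.
cat > my-values.yaml <<'EOF'
resources:
  requests:
    cpu: 1024m
    memory: 2Gi
EOF

# Then pass the file to Helm with -f (uncomment to run against a cluster):
# helm install morphling ./helm/morphling --create-namespace -n morphling-system -f my-values.yaml
```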

Helm will install the CRDs and other Morphling components in the morphling-system namespace.

Uninstall Morphling

helm uninstall morphling -n morphling-system

Delete all Morphling CRDs

kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd

Morphling UI

Morphling UI is built upon Ant Design.

If you are installing Morphling with YAML files, from the root directory, run

kubectl apply -k manifests/ui

If you are installing Morphling with the Helm chart, the Morphling UI is deployed automatically.

Check if the Morphling UI is running successfully:

kubectl -n morphling-system get svc morphling-ui

Expected output:

NAME           TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
morphling-ui   NodePort   10.96.63.162   <none>        9091:30680/TCP   44m

If you are using minikube, you can access the UI with port-forward:

kubectl -n morphling-system port-forward --address 0.0.0.0 svc/morphling-ui 30263:9091

Then you can access the UI at http://localhost:30263/.

For a detailed UI deployment and development guide, please check UI.md.

Running Examples

This example demonstrates how to tune the configuration for a mobilenet model deployed with TensorFlow Serving under Morphling.

For demonstration, we choose two configurations to tune: the first is the number of CPU cores (resource allocation), and the second is the maximum serving batch size (runtime parameter). We use grid search for configuration sampling.
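
To make the idea of grid search concrete: it enumerates every combination of the candidate values, and each combination becomes one trial. The sketch below only lists the grid points; the function name `grid_points` and the candidate values are illustrative assumptions, since the real search space is defined in the experiment YAML and the sampling is done by Morphling itself.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of grid sampling over two parameters.
# $1: space-separated CPU core candidates
# $2: space-separated batch size candidates
grid_points() {
  for cpu in $1; do
    for bs in $2; do
      echo "cpu=$cpu BATCH_SIZE=$bs"
    done
  done
}

# Assumed candidate values, for illustration only.
grid_points "1 2 4" "1 8 32"
```

With three candidates per parameter this yields nine combinations, which is why grid search becomes expensive as parameters or candidate values are added.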

Submit the configuration tuning experiment

kubectl -n morphling-system apply -f https://raw.githubusercontent.com/alibaba/morphling/main/examples/experiment/experiment-mobilenet-grid.yaml

To start a multi-framework tuning experiment:

kubectl -n morphling-system apply -f examples/experiment/experiment-grid.yaml

You can specify the model name in the file examples/experiment/experiment-grid.yaml. Note that with INFERENCE_FRAMEWORK=vllm and DTYPE=int8, bitsandbytes only supports LLMs with the Llama architecture (LlamaForCausalLM). So far, only tuning between the float16/bfloat16 and int8 data types is supported. Make sure there are enough resources for LLM serving.

Monitor the status of the configuration tuning experiment

kubectl get -n morphling-system pe
kubectl describe -n morphling-system pe

Monitor sampling trials (performance test)

kubectl -n morphling-system get trial

Get the searched optimal configuration

kubectl -n morphling-system get pe

Expected output:

NAME                        STATE       AGE   OBJECT NAME   OPTIMAL OBJECT VALUE   OPTIMAL PARAMETERS
mobilenet-experiment-grid   Succeeded   12m   qps           32                     [map[category:resource name:cpu value:4] map[category:env name:BATCH_SIZE value:32]]
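
The OPTIMAL PARAMETERS column is printed as a list of Go-style maps, which is awkward to reuse in scripts. The following sketch extracts `name=value` pairs from that text; the function name `optimal_params` is made up for illustration, and the parsing assumes the exact `name:… value:…]` layout shown in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pe` output on stdin and prints one
# name=value line per entry in the OPTIMAL PARAMETERS column.
optimal_params() {
  grep -o 'name:[^ ]* value:[^]]*' | sed 's/name:\([^ ]*\) value:\(.*\)/\1=\2/'
}

# Demo on the sample row from the expected output above.
echo 'mobilenet-experiment-grid   Succeeded   12m   qps   32   [map[category:resource name:cpu value:4] map[category:env name:BATCH_SIZE value:32]]' | optimal_params
# prints:
# cpu=4
# BATCH_SIZE=32
```

Against a live cluster this would be used as `kubectl -n morphling-system get pe | optimal_params`.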

Delete the tuning experiment

kubectl -n morphling-system delete pe --all

Workflow

See Morphling Workflow to check how Morphling tunes ML serving configurations automatically in a Kubernetes-native way.

Developer Guide

Build the controller manager binary

make manager

Run the tests

make test

Generate manifests, e.g., CRD, RBAC YAML files, etc.

make manifests

Build the multi-inference-framework Docker image

Before building the image, download the right version of the vllm .whl file to the pkg/server directory (see the guidance to download). For example, if the CUDA version is 11.8 and you want vllm version 0.6.1.post1, download vllm-0.6.1.post1+cu118-cp310-cp310-manylinux1_x86_64.whl to the pkg/server directory. Note that the Python version in this image is 3.10. Then modify the arguments CUDA_VERSION and VLLM_FILE in script/docker_build.sh, and build the image.
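
The wheel file name in the example above combines the vllm version, the CUDA version, and the Python tag (cp310 for Python 3.10). The sketch below builds the name from those parts. The function name `vllm_wheel_name` is hypothetical, and the naming pattern is inferred from the single example in this section, so other vllm releases may use a different pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the expected wheel file name.
# $1: vllm version (e.g. 0.6.1.post1)
# $2: CUDA version (e.g. 11.8)
# $3: Python tag   (e.g. cp310 for Python 3.10)
vllm_wheel_name() {
  cuda_tag="cu$(echo "$2" | tr -d '.')"   # 11.8 -> cu118
  echo "vllm-$1+${cuda_tag}-$3-$3-manylinux1_x86_64.whl"
}

vllm_wheel_name 0.6.1.post1 11.8 cp310
# prints: vllm-0.6.1.post1+cu118-cp310-cp310-manylinux1_x86_64.whl
```

A quick check such as `[ -f "pkg/server/$(vllm_wheel_name 0.6.1.post1 11.8 cp310)" ]` can confirm the file is in place before running the build script.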

Build the component docker images, e.g., Morphling controller, DB-Manager

make docker-build

Push the component docker images

make docker-push

To develop/debug Morphling controller manager locally, please check the debug guide.

Community

If you have any questions or want to contribute, GitHub issues or pull requests are warmly welcome.