Distributed Deep Learning Inference Pipeline

Cross-language and distributed deep learning inference pipeline for WebRTC video streams over Redis Streams. It currently supports the YOLOX model, which runs well on CPU.

<br>

YOLOX Original Image <sup>Original image from the YOLOX model, showing what this application ultimately does.</sup>

Pipeline Diagram <sup>Topology diagram of this project.</sup> <br>

WHY THIS PROJECT?

This project aims to demonstrate an approach to designing a cross-language and distributed pipeline in the deep learning/machine learning domain. Tons of demos and examples can be found on the internet that are developed end-to-end (mostly) in Python only; this project is one of the cross-language examples.

In this project, a Kubernetes-like orchestrator was deliberately not used; instead, independent Docker engines on different bare-metal host machines were configured. The aim is to show how to configure things on multiple bare-metal host machines or in multi-datacenter environments using only Docker.

<br>

INGREDIENTS

This project consists of a WebRTC signaling and orchestrator service (Go), a WebRTC media server service (Go), a YOLOX deep learning inference service (Python), and a web front-end (TypeScript). It also includes a monitoring stack.

Uses the following for functionality:

Uses the following for monitoring:
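
To make the cross-language hand-off concrete, here is a minimal sketch of how a Python inference worker could consume video frames that a producer (such as the Go media server) publishes to a Redis Stream. The stream key `frames`, the field name `jpeg`, the consumer-group name, and the JPEG payload format are assumptions for illustration only; the project's actual keys, message layout, and decoding may differ.

```python
# Hypothetical sketch only: stream/group/field names and the JPEG payload
# format are assumptions, not this project's actual wire format.
import os

import cv2
import numpy as np
import redis

STREAM = "frames"      # assumed stream key written by the media server
GROUP = "inference"    # assumed consumer group shared by inference replicas
CONSUMER = os.getenv("HOSTNAME", "inference-worker-1")

r = redis.Redis(
    host=os.getenv("REDIS_HOST", "redis"),
    port=int(os.getenv("REDIS_PORT", "6379")),
)

# Create the consumer group once; ignore the error if it already exists.
try:
    r.xgroup_create(STREAM, GROUP, id="$", mkstream=True)
except redis.ResponseError:
    pass

while True:
    # Block up to 1 second waiting for entries not yet delivered to this group (">").
    for _, messages in r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=1, block=1000):
        for msg_id, fields in messages:
            # Assume the producer stored a JPEG-encoded frame under the "jpeg" field.
            frame = cv2.imdecode(
                np.frombuffer(fields[b"jpeg"], dtype=np.uint8), cv2.IMREAD_COLOR
            )
            # ... run YOLOX inference on `frame` and publish detections back ...
            r.xack(STREAM, GROUP, msg_id)
```

Because multiple replicas can join the same consumer group, Redis Streams provides a simple way to fan frames out across inference instances running on different hosts.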

<br>

DOCUMENTATION

More details about the project and the monitoring configuration can be found in the docs folder.

<br>

WEB APPLICATION

To access the web application UI, you can visit http://localhost:9000 (tested on Chrome) after configuring the containers.

Web App

When you click the "Create PeerConnection" button, if everything is configured correctly:

<br>

Client side logs:

Client Side Logs

<br>

MONITORING

Monitoring topology:

Monitoring Topology

For more details, read the Monitoring documentation.
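
As a rough illustration of how a service in this pipeline could push a custom metric (for example, per-frame inference latency) towards the monitoring stack, here is a hedged sketch using the InfluxDB 1.x Python client together with the `INFLUXDB_HOST`/`INFLUXDB_PORT` and `HOSTNAME_TAG` settings from `.env`. The database name, measurement, and tag are invented for illustration; the project's real monitoring wiring is described in the Monitoring documentation and may work differently.

```python
# Hypothetical sketch only: database, measurement, and tag names are illustrative,
# and the project's actual metric pipeline may not use the 1.x client API.
import os
import time

from influxdb import InfluxDBClient

client = InfluxDBClient(
    host=os.getenv("INFLUXDB_HOST", "influxdb"),
    port=int(os.getenv("INFLUXDB_PORT", "8086")),
)
client.create_database("pipeline_metrics")   # no-op if it already exists
client.switch_database("pipeline_metrics")

start = time.time()
# ... run inference on a single frame here ...
latency_ms = (time.time() - start) * 1000.0

client.write_points([
    {
        "measurement": "inference_latency",
        "tags": {"host": os.getenv("HOSTNAME_TAG", "unknown")},
        "fields": {"value": latency_ms},
    }
])
```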

Accessing Grafana:

To see monitoring metrics via Grafana, you can visit http://localhost:9000/grafana after configuring the containers.

Grafana dashboard

nvidia-smi output

<br>

INSTALLATION and RUNNING

This project was designed to run in Docker containers. For some configuration options, you can check out the docker-compose.yaml and .env files in the root folder.

The Docker Compose file creates several containers, some with replica instances:

You can run it in production mode or development mode.

Production Mode

Note: Docker Host Operating System

Before proceeding with one of the options below, this step should be done:

...
HOSTNAME_TAG=tag_name_for_host_machine
...

# Presumably for Linux hosts (Docker socket at /var/run/docker.sock):
...
DOCKER_SOCKET_PREFIX=""
DOCKER_SOCKET_SUFFIX=""
...

# Presumably for Windows hosts (socket path needs a leading slash, //var/run/docker.sock):
...
DOCKER_SOCKET_PREFIX="/"
DOCKER_SOCKET_SUFFIX=""
...

# Presumably for macOS hosts (Docker Desktop raw socket, /var/run/docker.sock.raw):
...
DOCKER_SOCKET_PREFIX=""
DOCKER_SOCKET_SUFFIX=".raw"
...

There are different Docker Compose profiles for the different configurations you can choose from:

1. Single Host, only CPU

This profile runs all of the services on the same host machine, which has no CUDA-capable graphics card. The Redis instance stays internal and won't be exposed to the network.

$ docker-compose --profile single_host_cpu up -d

2. Single Host, with GPU support

This profile runs all of the services on the same host machine, which has at least one CUDA-capable graphics card. The Redis instance stays internal and won't be exposed to the network.

$ docker-compose --profile single_host_gpu up -d

3. Central services with inference service, only CPU

This profile runs all of the services on the same host machine, which has no CUDA-capable graphics card. The Redis instance will be exposed to the network, so inference services on other hosts can be registered later.

Similar to single_host_cpu, it can provide all of the services on its own, but it also supports extra hosts.

$ docker-compose --profile central_with_inference_cpu up -d

4. Central services with inference service, with GPU support

This profile runs all of the services on the same host machine, which has at least one CUDA-capable graphics card. The Redis instance will be exposed to the network, so inference services on other hosts can be registered later.

Similar to single_host_gpu, it can provide all of the services on its own, but it also supports extra hosts.

$ docker-compose --profile central_with_inference_gpu up -d

5. Central services without inference service, multiple hosts mode

5.1. Steps to be done on the central host:

This profile runs only the central services on the host machine, without any inference services. It doesn't function until at least one extra inference service is properly registered with the Signaling service. The Redis instance will be exposed to the network, so inference services on other hosts can be registered.

$ docker-compose --profile central up -d
5.2. Steps to be done on the other (multiple) inference hosts:
...
REDIS_HOST=ip_of_central_host # "redis" in a single-host configuration; the central host's IP (e.g. 192.168.0.15) in a distributed configuration
REDIS_PORT=port_of_central_host_redis_port # default is 6379
...
INFLUXDB_HOST=ip_of_central_host # "influxdb" in a single-host configuration; the central host's IP (e.g. 192.168.0.15) in a distributed configuration
INFLUXDB_PORT=port_of_central_host_influxdb_port # default is 8086
...
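
Before starting an inference profile on a remote host, it can help to verify that the central Redis and InfluxDB instances are reachable with the values you put into `.env`. The short check below is hypothetical and not part of the project; it only uses the variables shown above.

```python
# Hypothetical connectivity check for a remote inference host (not part of the project).
import os
import urllib.request

import redis

redis_host = os.getenv("REDIS_HOST", "redis")
redis_port = int(os.getenv("REDIS_PORT", "6379"))
influx_host = os.getenv("INFLUXDB_HOST", "influxdb")
influx_port = int(os.getenv("INFLUXDB_PORT", "8086"))

# PING the central Redis instance.
redis.Redis(host=redis_host, port=redis_port, socket_connect_timeout=3).ping()
print(f"Redis reachable at {redis_host}:{redis_port}")

# InfluxDB answers /ping with 204 No Content when it is healthy.
with urllib.request.urlopen(f"http://{influx_host}:{influx_port}/ping", timeout=3) as resp:
    print(f"InfluxDB reachable at {influx_host}:{influx_port} (HTTP {resp.status})")
```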

For CPU-only mode:

$ docker-compose --profile inference_cpu up -d

For GPU-supported mode:

$ docker-compose --profile inference_gpu up -d

Common follow-up instructions for all of the alternative configurations:

$ docker-compose logs -f

<a name="dev-mode"></a>Development Mode: VS Code Remote - Containers

To continue with VS Code, and if this is your first time working with Remote Containers in VS Code, you can check out this link to learn how Remote Containers work in VS Code and follow the installation steps of the Remote Development extension pack.

Then, follow these steps:

<br>

LICENSE

The Distributed Deep Learning Inference Pipeline project is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.