SuperScaler

SuperScaler is an open-source distributed platform for deep learning training. It aims to provide flexible support for parallelizing training and to be extensible to new parallelisms and optimizations. By leveraging existing deep learning frameworks such as TensorFlow and NNFusion for local execution, while supporting efficient distributed training with a highly optimized communication stack, SuperScaler explores new opportunities in parallel deep learning training.

Status

SuperScaler is currently in alpha preview.

Install

Install on a Bare-metal Machine

Run with Docker

Running SuperScaler in a Docker environment is the easiest way to get started.
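
A minimal sketch of the Docker workflow, assuming the repository ships a Dockerfile at its root and using superscaler as a hypothetical image tag:

docker build -t superscaler .                 # build the image from the repository's Dockerfile
docker run -it --gpus all superscaler bash    # open a shell in the container; --gpus requires the NVIDIA Container Toolkit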

Run your first model with SuperScaler

Here we use a TensorFlow model as an example.
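
The snippet below consumes an apply_gradient_op and a loss_op produced by the model. As a rough illustration of what such a model might look like, here is a minimal sketch assuming TensorFlow 1.x graph mode; the actual model lives in example/tensorflow/dummy_model.py, and the shapes and variable names here are made up:

import tensorflow as tf

# A toy linear model: inputs x, targets y, one weight matrix w.
x = tf.placeholder(tf.float32, shape=[None, 10])
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.zeros([10, 1]))

# Mean-squared-error loss and a plain SGD update op.
loss_op = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
apply_gradient_op = optimizer.apply_gradients(
    optimizer.compute_gradients(loss_op))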

# Build a TensorFlow model that returns apply_gradient_op and loss_op
# (see example/tensorflow/dummy_model.py)
import argparse

import superscaler.tensorflow as superscaler
from superscaler.scaler_graph import DataParallelism

sc = superscaler()

# Plan: data parallelism over two devices, deployed on localhost,
# with a ring communication plan and the resource pool described below.
strategy = DataParallelism(range(2))
deployment_setting = {"1": "localhost"}
communication_DSL = "ring"
resource_pool = "./resource_pool.yaml"
sc.init(apply_gradient_op, loss_op, deployment_setting, strategy,
        communication_DSL, resource_pool)

# Run 10 training steps, reporting progress every 5 steps.
parser = argparse.ArgumentParser()
args, _ = parser.parse_known_args()
args.steps = 10
args.interval = 5
args.print_info = True
args.print_fetches_targets = True
sc.run(args)
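
In this invocation, DataParallelism(range(2)) requests two data-parallel replicas, deployment_setting maps host "1" to localhost, communication_DSL selects a ring communication plan, and resource_pool points to the hardware description in the appendix below. The fields set on args drive the run loop: 10 steps in total, with progress printed every 5 steps.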

Appendix: A Sample resource_pool.yaml
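
The resource pool describes the hardware available to SuperScaler: each server lists its CPU and GPU devices with per-device properties, and each device (and switch) lists its outgoing links with their type, rate, propagation latency, and scheduling policy.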

Server:
    hostname1:
        CPU:
            0:
                properties:
                    average_performance: "12Gibps"
                links:
                    -
                        dest: "/server/hostname1/CPU/1/"
                        type: "RDMA"
                        rate: "100Gibps"
                        propagation_latency: "20us"
                        scheduler: "FairSharing"
            1:
                properties:
                    average_performance: "12Gibps"
                links:
                    -
                        dest: "/server/hostname1/CPU/0/"
                        type: "RDMA"
                        rate: "100Gibps"
                        propagation_latency: "20us"
                        scheduler: "FairSharing"
        GPU:
            0:
                properties:
                    average_performance: "12Tibps"
                links:
                    -
                        dest: "/switch/switch0/"
                        type: "PCIE"
                        rate: "80bit/s"
                        propagation_latency: "2us"
                        scheduler: "FIFO"
                    -
                        dest: "/server/hostname1/GPU/1/"
                        type: "RDMA"
                        rate: "100bit/s"
                        propagation_latency: "2us"
                        scheduler: "FIFO"
            1:
                properties:
                    average_performance: "12Tibps"
                links:
                    -
                        dest: "/switch/switch0/"
                        type: "PCIE"
                        rate: "80bit/s"
                        propagation_latency: "2us"
                        scheduler: "FIFO"
                    -
                        dest: "/server/hostname1/GPU/0/"
                        type: "RDMA"
                        rate: "100bit/s"
                        propagation_latency: "2us"
                        scheduler: "FIFO"

Switch:
    switch0:
        links:
            -
                dest: "/server/hostname1/GPU/0/"
                type: "PCIE"
                rate: "80bit/s"
                propagation_latency: "2us"
                scheduler: "FIFO"
            -
                dest: "/server/hostname1/GPU/1/"
                type: "PCIE"
                rate: "80bit/s"
                propagation_latency: "2us"
                scheduler: "FIFO"
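
As a quick sanity check on the file, a short script can load it and walk the Server section. This sketch assumes only PyYAML (pip install pyyaml), nothing SuperScaler-specific:

import yaml

# Load the resource pool and summarize every device and its link count.
with open("resource_pool.yaml") as f:
    pool = yaml.safe_load(f)

for host, device_types in pool["Server"].items():
    for dev_type, devices in device_types.items():
        for dev_id, spec in devices.items():
            links = spec.get("links", [])
            print(f"{host}/{dev_type}/{dev_id}: {len(links)} link(s)")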