

BladeDISC Introduction

We're hiring!🔥🔥🔥

We're always looking for candidates to join our dev team. You could be the one we've been searching for!

Please contact us via email or DingTalk at the bottom of this page.⬇️⬇️⬇️

What's New

Overview

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads and one of the key components of Alibaba's PAI-Blade. BladeDISC provides general, transparent, and easy-to-use performance optimization for TensorFlow/PyTorch workloads on GPGPU and CPU backends. Its architecture natively supports dynamic shape workloads, with careful attention to performance in both static and dynamic shape scenarios. It also supports multiple flexible deployment solutions, including Plugin Mode inside the TensorFlow/PyTorch runtime and Standalone Mode for AOT standalone execution. The project is based on MLIR and is closely related to the mlir-hlo project.
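To make the dynamic shape point concrete, here is a toy sketch (not BladeDISC internals; the class names are hypothetical) that counts compilations when a BERT-style model sees varying sequence lengths. A shape-specialized compiler needs one compilation per distinct concrete shape. In this toy, the dynamic shape compiler compiles once per rank and keeps dimension sizes symbolic; how BladeDISC actually keys its compilation cache is an assumption not described here.

```python
from typing import Dict, Hashable, Tuple

Shape = Tuple[int, ...]


class ShapeSpecializedCache:
    """Toy static-shape compiler: one compilation per distinct concrete shape."""

    def __init__(self) -> None:
        self.compiled: Dict[Hashable, str] = {}

    def key(self, shape: Shape) -> Hashable:
        # Every concrete shape needs its own executable.
        return shape

    def run(self, shape: Shape) -> str:
        k = self.key(shape)
        if k not in self.compiled:
            # A real static compiler would pay the compilation cost here.
            self.compiled[k] = f"executable for {k}"
        return self.compiled[k]

    @property
    def compilations(self) -> int:
        return len(self.compiled)


class DynamicShapeCache(ShapeSpecializedCache):
    """Toy dynamic-shape compiler (assumption): one compilation per rank."""

    def key(self, shape: Shape) -> Hashable:
        # Dimension sizes stay symbolic; only the rank is specialized.
        return ("rank", len(shape))


# Batch size 1, varying sequence lengths (hypothetical request stream).
seq_lens = [12, 37, 64, 12, 128, 37]
static, dynamic = ShapeSpecializedCache(), DynamicShapeCache()
for n in seq_lens:
    static.run((1, n))
    dynamic.run((1, n))

print("static compilations:", static.compilations)
print("dynamic compilations:", dynamic.compilations)
```

With four distinct sequence lengths, the shape-specialized toy compiles four times while the dynamic one compiles once; with real compilers each of those compilations can cost far more than the inference itself.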

Refer to our website for more information, including the setup tutorial, developer guide, demo examples and documents for developers.

Features and Roadmap

Frontend Framework Support Matrix

| | TensorFlow [1] | PyTorch [2] |
|---|---|---|
| Inference | Yes | Yes |
| Training | Yes [3] | Ongoing |

[1] TensorFlow 1.12, 1.15, 2.4 & 2.5 are supported and fully verified. Other versions may need some minor adaptation work.

[2] PyTorch versions >= 1.6.0 have been fully verified.

[3] Although training is supported, Op coverage for training workloads still has much room for improvement.
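A minimal sketch for checking an installed framework version against the fully verified versions listed in footnotes [1] and [2]. The helper names are hypothetical, and a `False` result only means the version is outside the verified list, not that it cannot work.

```python
import re
from typing import Tuple

# Fully verified versions, taken from footnotes [1] and [2].
VERIFIED_TF_MINOR = {(1, 12), (1, 15), (2, 4), (2, 5)}
MIN_VERIFIED_TORCH = (1, 6, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the numeric 'X.Y[.Z]' prefix of strings like '1.13.0+cu117' or '2.0.0rc1'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def tf_version_verified(version: str) -> bool:
    """True if the TensorFlow major.minor release is in the verified list."""
    return parse_version(version)[:2] in VERIFIED_TF_MINOR


def torch_version_verified(version: str) -> bool:
    """True if the PyTorch version is at least 1.6.0."""
    return parse_version(version) >= MIN_VERIFIED_TORCH


# e.g. pass tf.__version__ or torch.__version__ in a real environment
print(tf_version_verified("2.5.1"), torch_version_verified("2.0.0"))
```

Comparing only `major.minor` for TensorFlow reflects that the footnote lists release lines rather than patch versions.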

Backend Support Matrix

| | Status |
|---|---|
| Nvidia GPU | Yes [1] |
| AMD GPU | Yes |
| Hygon DCU | Yes |
| X86 | Yes |
| AArch64 | Yes |

[1] Support for CUDA versions below 11.0 has been officially deprecated since Aug 2022.

Deployment Solutions

Numbers of Typical Workloads

Evaluated on a set of typical machine learning workloads used in production, BladeDISC achieves up to a 6.95x speedup over PyTorch. Moreover, compared with static optimizing compilers (e.g., XLA and TensorRT), BladeDISC shows comparable or even better performance.

<figure align="center"> <img src="./docs/pics/numbers.png" style="width:80%"> <figcaption align = "center"> <b> Fig.1 End-to-end Performance of BladeDISC and baselines. Note that some baselines fail to optimize the ViT model. </b> </figcaption> </figure>

Advantage in Dynamic Shape Workloads

Specifically, for the BERT-large inference example on a T4 GPU that we provide in the examples, the static optimizing compiler XLA shows severe performance degradation due to its compilation overhead, while BladeDISC achieves a 1.75x speedup over TensorFlow.

| TensorFlow | XLA | BladeDISC |
|---|---|---|
| 1.78 s | 41.69 s | 1.02 s |
| 1X | | 1.75X |
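The speedup row is the TensorFlow latency divided by each compiler's latency. The short check below reproduces the 1.75X figure from the table and also computes the ratio for XLA, which the table does not list; a value well below 1 means XLA is slower than plain TensorFlow on this workload.

```python
# Latencies (seconds) from the BERT-large T4 table above.
latency_s = {"TensorFlow": 1.78, "XLA": 41.69, "BladeDISC": 1.02}

baseline = latency_s["TensorFlow"]
# Speedup = baseline latency / compiler latency (higher is better).
speedup = {name: baseline / t for name, t in latency_s.items()}

for name, s in speedup.items():
    print(f"{name}: {s:.2f}X")
```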

API QuickView

For TensorFlow Users

Only two lines of code need to be added to a native TensorFlow program:

import numpy as np
import tensorflow as tf

## enable BladeDISC on TensorFlow program
import blade_disc_tf as disc
disc.enable()

## construct TensorFlow Graph and run it
g = tf.Graph()
with g.as_default():
    ...
    # tf.compat.v1.Session works on TF 1.15 and 2.x; on older TF 1.x use tf.Session()
    with tf.compat.v1.Session() as sess:
        sess.run(...)

For more information, please refer to QuickStart for TensorFlow Users.
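If the same script must also run on machines without BladeDISC installed, one option is to guard the two enabling lines. This is a sketch that assumes only the `blade_disc_tf` module and the `disc.enable()` call shown above; `enable_disc_if_available` is a hypothetical helper name.

```python
import importlib
import importlib.util


def enable_disc_if_available() -> bool:
    """Call blade_disc_tf.enable() if the package is installed.

    Returns True if BladeDISC was enabled, False if the program should
    simply run on stock TensorFlow.
    """
    if importlib.util.find_spec("blade_disc_tf") is None:
        return False
    disc = importlib.import_module("blade_disc_tf")
    disc.enable()
    return True


# Call this before constructing the TensorFlow graph.
enabled = enable_disc_if_available()
print("BladeDISC enabled:", enabled)
```

Checking with `find_spec` first avoids catching a broad `ImportError`, which could otherwise hide a real failure inside an installed but broken package.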

For PyTorch Users

PyTorch users only need the following few lines of code to enable BladeDISC:

import torch
import torch.nn as nn
import torch_blade

# construct PyTorch Module
class MyModule(nn.Module):
    ...

module = MyModule().eval()
# x, y: example input tensors (elided here) used to trace the module

with torch.no_grad():
    # blade_module is the module optimized by BladeDISC
    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))

# run the optimized module
blade_module(x, y)

torch_blade.optimize takes an nn.Module object and returns the optimized module. For more information, please refer to QuickStart for PyTorch Users.
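To measure the speedup on your own model, you can time `module` and `blade_module` side by side. The helper below is a generic, hypothetical sketch using only the Python standard library. On GPU, pass `sync=torch.cuda.synchronize`, because CUDA kernels run asynchronously and the clock would otherwise stop before the work finishes; also run both measurements under `torch.no_grad()` as in the example above.

```python
import statistics
import time
from typing import Callable, Optional


def mean_latency_ms(fn: Callable, *args, warmup: int = 3, iters: int = 20,
                    sync: Optional[Callable[[], None]] = None) -> float:
    """Average wall-clock latency of fn(*args) in milliseconds.

    Warm-up calls are excluded so one-time costs (lazy initialization,
    JIT compilation, caches) do not skew the average.
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()  # drain pending asynchronous work before timing
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples)


# Assumed usage with the objects from the PyTorch example above:
# with torch.no_grad():
#     base = mean_latency_ms(module, x, y, sync=torch.cuda.synchronize)
#     opt = mean_latency_ms(blade_module, x, y, sync=torch.cuda.synchronize)
# print(f"speedup: {base / opt:.2f}x")

# Self-contained demo on a plain Python callable:
print(f"{mean_latency_ms(sorted, list(range(1000))):.3f} ms")
```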

Setup and Examples

Publications

Tutorials and Documents for Developers

Presentations and Talks

How to Contribute

Building Status

| Framework | Device | Status |
|---|---|---|
| PyTorch Pre | GPU | pytorch_pre_gpu |
| PyTorch Pre | CPU | pytorch_pre_cpu |
| PyTorch 2.0.0 | GPU | pytorch200_gpu |
| PyTorch 2.0.0 | CPU | pytorch200_cpu |
| PyTorch 2.0.0 | Yitian | pytorch200_yitian |
| PyTorch 1.13.0 | GPU | pytorch113_gpu |
| PyTorch 1.13.0 | CPU | pytorch113_cpu |
| PyTorch 1.13.0 | Yitian | pytorch113_yitian |
| TensorFlow 2.5 | GPU | tf250_gpu |
| TensorFlow 2.5 | CPU | tf250_cpu |
| TensorFlow 2.8 | Yitian | tf280_yitian |

FAQ

Roadmap with mlir-hlo Project

BladeDISC is closely related to the mlir-hlo project. Some of its building blocks, including the MHLO Op definitions, TF-to-MHLO conversions, and some general-purpose passes, have been upstreamed to the mlir-hlo repository. We'll continue to cooperate closely with the mlir-hlo project in the longer term.

Roadmap with Torch-MLIR Project

BladeDISC compiles PyTorch workloads based on Torch-MLIR. The BladeDISC Dev Team is cooperating with the community to add Torch-To-Mhlo conversion to Torch-MLIR, with a focus on fully dynamic shape support. See the RFC: https://github.com/llvm/torch-mlir/issues/999. Community developers interested in this work are welcome to join us.

Contact Us

DingTalk