- Overview
- API QuickView
- Setup and Examples
- Publications
- Tutorials and Documents for Developers
- How to Contribute
- FAQ
- Contact Us
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads, and one of the key components of Alibaba's PAI-Blade. BladeDISC provides general, transparent, and easy-to-use performance optimization for TensorFlow/PyTorch workloads on GPGPU and CPU backends. The architecture natively supports dynamic shape workloads, with careful attention to performance in both static and dynamic shape scenarios. It also supports multiple, flexible deployment solutions, including a Plugin Mode inside the TensorFlow/PyTorch runtime and a Standalone Mode for AOT standalone execution. The project is based on MLIR and is closely related to the mlir-hlo project.
Refer to our website for more information, including the setup tutorial, developer guide, demo examples, and documents for developers.
| | TensorFlow [1] | PyTorch [2] |
|---|---|---|
| Inference | Yes | Yes |
| Training | Yes [3] | Ongoing |
[1] TensorFlow 1.12, 1.15, 2.4, and 2.5 are supported and fully verified. Other versions may need some slight adaptation work.
[2] PyTorch versions satisfying 1.6.0 <= version < 1.9.0 have been fully verified.
[3] Although training is supported, there is still much room for improvement in Op coverage for training workloads.
| Backend | Status |
|---|---|
| Nvidia GPU | Yes |
| AMD GPU | Ongoing |
| Hygon DCU | Yes |
| X86 | Yes |
- Plugin Mode - BladeDISC works as a plugin of TensorFlow or PyTorch. Only the supported Ops are clustered and compiled; unsupported ones are executed by the original TensorFlow or PyTorch runtime. We recommend this mode to most users for its transparency and ease of use.
- Standalone Mode - In Standalone Mode, the input workload is compiled into a binary that can be executed by itself, that is, it does not rely on a TensorFlow or PyTorch runtime. In this mode all Ops must be supported.
Evaluated on a set of typical machine learning workloads for production purposes, BladeDISC shows up to a 3x speedup compared with TensorFlow/PyTorch. Specifically, for BERT-Large inference on a T4 GPU (provided in the examples), static compiler optimization (XLA) shows severe performance degradation due to its compilation overhead, while DISC shows a 1.75x speedup.
| TensorFlow | XLA | DISC |
|---|---|---|
| 1.78 s | 41.69 s | 1.02 s |
| 1X | | 1.75X |
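For reference, a minimal sketch of how such latencies might be measured is shown below. The toy graph, input shapes, and iteration count are illustrative placeholders, not the actual BERT-Large benchmark setup; only `disc.enable()` comes from the quickstart itself.

```python
import time
import numpy as np
import tensorflow as tf

import tensorflow_blade_disc as disc
disc.enable()  # comment this out to measure the plain TensorFlow baseline

# Toy stand-in for the real workload (e.g. BERT-Large inference).
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 1024])  # dynamic batch dim
    w = tf.Variable(np.random.rand(1024, 1024).astype(np.float32))
    y = tf.matmul(x, w)
    init = tf.global_variables_initializer()

with tf.Session(graph=g) as sess:
    sess.run(init)
    feed = {x: np.random.rand(8, 1024).astype(np.float32)}
    sess.run(y, feed)  # warm-up run triggers clustering and compilation
    start = time.time()
    for _ in range(100):
        sess.run(y, feed)
    print("avg latency: %.4f s" % ((time.time() - start) / 100))
```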
Only two lines of code are needed on top of a native TensorFlow program, as follows:
```python
import numpy as np
import tensorflow as tf

## enable BladeDISC on TensorFlow program
import tensorflow_blade_disc as disc
disc.enable()

## construct TensorFlow Graph and run it
g = tf.Graph()
with g.as_default():
    # build the graph
    ...

with tf.Session(graph=g) as sess:
    sess.run(...)
```
For more information, please refer to QuickStart for TensorFlow Users.
PyTorch users only need the following few lines of code to enable BladeDISC:
```python
import torch
import torch.nn as nn
import torch_blade

# construct PyTorch Module
class MyModule(nn.Module):
    ...

module = MyModule()

with torch.no_grad():
    # blade_module is the module optimized by BladeDISC
    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))

# run the optimized module
blade_module(x, y)
```
`torch_blade.optimize` accepts an `nn.Module` object and outputs the optimized module. For more information, please refer to Quickstart for PyTorch Users.
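To make the snippet above concrete, here is a self-contained sketch; the two-input module, layer sizes, and tensor shapes are illustrative assumptions, not part of the BladeDISC API.

```python
import torch
import torch.nn as nn
import torch_blade

# Illustrative two-input module; any nn.Module can be optimized the same way.
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(256, 256)

    def forward(self, x, y):
        return self.linear(x) + y

module = MyModule().eval()
x = torch.randn(4, 256)
y = torch.randn(4, 256)

with torch.no_grad():
    # model_inputs provides example inputs for tracing during optimization
    blade_module = torch_blade.optimize(module, allow_tracing=True,
                                        model_inputs=(x, y))

print(blade_module(x, y).shape)  # same result shape as module(x, y)
```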
- How to Setup and Build from Source
- Use Case of TensorFlow Inference and Training
- Use Case of PyTorch Inference
- Tutorial: A Walkthrough of the BladeDISC Pass Pipeline
- Introduction on Runtime Abstraction Layer
- TorchBlade Overview
- Tutorial: How to Add a New Torch Operator Converter
BladeDISC is closely related to the mlir-hlo project. Part of the building blocks, including the MHLO Op definitions, TF-to-MHLO conversions, and some general-purpose passes, have been upstreamed to the mlir-hlo repository. We will continue to cooperate closely with the mlir-hlo project over the long term.
- Mailgroup: [email protected]
- DingTalk group for support and discussion: