AITemplate


AITemplate (AIT) is a Python framework that transforms deep neural networks into CUDA (NVIDIA GPU) / HIP (AMD GPU) C++ code for lightning-fast inference serving. AITemplate highlights include:

More about AITemplate

Excellent Backwards Compatibility

AITemplate doesn't depend on third-party libraries or runtimes such as cuBLAS, cuDNN, rocBLAS, MIOpen, TensorRT, or MIGraphX. Each model is compiled into a self-contained, portable binary, which can be used in any software environment on the same hardware.

Horizontal Fusion

AITemplate provides unique advanced horizontal fusion: it can fuse parallel GEMM, LayerNorm, and other operators with different input shapes into a single GPU kernel.
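
The idea can be sketched in plain NumPy (this is a conceptual illustration of grouped-GEMM semantics, not AITemplate's actual kernel code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two parallel GEMM problems with different shapes, as they might
# appear side by side in a transformer block.
problems = [
    (rng.standard_normal((8, 16)), rng.standard_normal((16, 32))),
    (rng.standard_normal((4, 64)), rng.standard_normal((64, 8))),
]

def unfused(problems):
    # One kernel launch per GEMM.
    return [a @ b for a, b in problems]

def fused_grouped_gemm(problems):
    # A horizontally fused (grouped) GEMM walks a list of problem
    # descriptors inside one kernel launch, amortizing launch overhead.
    # Here one function call stands in for that single launch.
    outs = []
    for a, b in problems:  # in a real kernel, thread blocks pick problems
        outs.append(a @ b)
    return outs

expected = unfused(problems)
actual = fused_grouped_gemm(problems)
assert all(np.allclose(e, g) for e, g in zip(expected, actual))
```

The fused version computes the same results; the benefit on a GPU comes from replacing many small kernel launches with one.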

Vertical Fusion

AITemplate provides strong vertical fusion: it can fuse a wide range of operations into TensorCore/MatrixCore operations, including elementwise operations, reductions, and layout permutations. AITemplate also provides back-to-back TensorCore/MatrixCore operation fusion.
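
A NumPy sketch of the semantics (conceptual only; the real benefit is that the fused GPU kernel keeps the intermediate in registers instead of round-tripping it through global memory):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64)).astype(np.float32)
w = rng.standard_normal((64, 16)).astype(np.float32)
bias = rng.standard_normal(16).astype(np.float32)

def unfused(x, w, bias):
    # Three separate kernels: GEMM, bias add, ReLU. Each step writes
    # its intermediate result to memory for the next step to read.
    y = x @ w
    y = y + bias
    return np.maximum(y, 0.0)

def fused(x, w, bias):
    # Vertical fusion folds the elementwise epilogue (bias + ReLU)
    # into the GEMM, producing the final result in one pass.
    return np.maximum(x @ w + bias, 0.0)

assert np.allclose(unfused(x, w, bias), fused(x, w, bias))
```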

Memory Fusion

AITemplate provides innovative memory fusion: it can fuse GEMM, LayerNorm, and other operators with subsequent memory operations such as concatenation, split, and slice into a single operator.
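
The effect can be sketched in NumPy (again a conceptual illustration, not AITemplate code): instead of running a separate concatenation kernel, each producer writes directly into its slice of the final buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32)).astype(np.float32)
w1 = rng.standard_normal((32, 16)).astype(np.float32)
w2 = rng.standard_normal((32, 24)).astype(np.float32)

def unfused(x, w1, w2):
    # Two GEMMs followed by an explicit concatenation, which copies
    # both intermediate results into a third buffer.
    return np.concatenate([x @ w1, x @ w2], axis=1)

def fused(x, w1, w2):
    # Memory fusion: each GEMM writes its result directly into the
    # correct slice of the output buffer, so no concat kernel runs.
    out = np.empty((x.shape[0], w1.shape[1] + w2.shape[1]), dtype=np.float32)
    out[:, : w1.shape[1]] = x @ w1
    out[:, w1.shape[1] :] = x @ w2
    return out

assert np.allclose(unfused(x, w1, w2), fused(x, w1, w2))
```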

Working with or without PyTorch

The AITemplate-generated Python runtime can take PyTorch tensors as inputs and outputs without an extra copy. For environments without PyTorch, the AITemplate Python/C++ runtime is self-contained.

Extensions without suffering

AITemplate provides a straightforward approach to extending its code generation. To add a new operator or a new fused kernel to AITemplate, most of the time one only needs to add two Python files: one for the graph-node definition and another for the backend codegen. A CUDA/HIP kernel in a text header file can be used directly in the codegen.
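
A minimal sketch of this two-file pattern, where a graph-node definition is paired with a backend codegen function that renders a CUDA kernel from a text template. All names here (`SquareOp`, `gen_cuda_source`, `KERNEL_TEMPLATE`) are illustrative, not AITemplate's actual API:

```python
# A text template standing in for a CUDA kernel kept in a header file.
# Doubled braces {{ }} are literal braces; {name}/{dtype}/{expr} are
# filled in by the codegen step.
KERNEL_TEMPLATE = """
__global__ void {name}(const {dtype}* x, {dtype}* y, int n) {{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {{
    y[i] = {expr};
  }}
}}
"""

class SquareOp:
    """File 1: the graph-node definition, recording op metadata."""
    op_name = "square"

    def infer_shape(self, input_shape):
        # Elementwise op: output shape equals input shape.
        return input_shape

def gen_cuda_source(op, dtype="float"):
    """File 2: the backend codegen, rendering the kernel template."""
    return KERNEL_TEMPLATE.format(
        name=op.op_name + "_kernel", dtype=dtype, expr="x[i] * x[i]"
    )

src = gen_cuda_source(SquareOp())
```

The generated `src` string is CUDA source text ready to be handed to the compiler, which is the essence of the codegen extension point described above.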

FX2AIT

FX2AIT is a Python-based tool that converts PyTorch models into an AITemplate (AIT) engine for lightning-fast inference serving. Using FX2AIT's built-in AITLowerer, partial AIT acceleration can be achieved even for models containing operators that AITemplate does not support.

Key features of FX2AIT include:

More information can be found at https://github.com/facebookincubator/AITemplate/tree/main/fx2ait.

Installation

Hardware requirements:

Clone the code

When cloning the code, please use the following command to also clone the submodules:

git clone --recursive https://github.com/facebookincubator/AITemplate

Docker Image

We highly recommend using AITemplate with Docker to avoid accidentally using the wrong version of NVCC or HIPCC.

Building the provided Docker image produces an image tagged ait:latest.

From Source

The following commands will create a Python wheel for AITemplate. Please ensure you have the correct CUDA/ROCm compiler installed.

Using an incorrect compiler will lead to performance regressions.

Please check that all submodules are cloned correctly before proceeding to the next step.

cd python
python setup.py bdist_wheel
pip install dist/*.whl --force-reinstall

Getting Started

Check out the AITemplate Documentation for API reference.

There are a few tutorials for onboarding:

Examples & Performance

AITemplate provides the following model templates & reference performance data on A100/MI-250:

Release

All current development updates can be seen in the AITemplate repository. Releases are not on a set schedule and will only be tagged for significant feature releases.

Mid-term plan:

Long-term plan:

Contributing

Check our contributing guide to learn about how to contribute to the project.

The Team

AITemplate is currently maintained by Meta engineers: Ying Zhang, Yang Chen, Terry Chen, Mu-Chu Lee, Max Podkorytov, Adnan Akhundov.

AITemplate was co-created by Meta engineers: Bing Xu, Ying Zhang, Hao Lu, Yang Chen, and Terry Chen, with major contributions coming from other talented engineers. A non-exhaustive list includes Mike Iovine, Mu-Chu Lee, Scott Wolchok, Oleg Khabinov, Shirong Wu, Huamin Li, Hui Guo, Zhijing Li, and Max Podkorytov. We also want to thank Andrew Tulloch, Yinghai Lu, and Lu Fang for the valuable discussions.

FX2AIT and Aten2AIT are co-created and maintained by Meta engineers: Wei Wei, Shirong Wu, and Zhijing Li.

Acknowledgements

The AITemplate team works closely with the NVIDIA CUTLASS team (led by Andrew Kerr and Haicheng Wu) and the AMD Composable Kernel team (led by Chao Liu and Jing Zhang). We co-designed many advanced GPU optimizations specialized for each platform, and none of this would have been possible without our close collaboration.

License

AITemplate is licensed under the Apache 2.0 License.