Uni-Core, an efficient distributed PyTorch framework
Uni-Core is built for rapidly creating high-performance PyTorch models, especially Transformer-based ones. It supports the following features (a hand-written PyTorch sketch of a few of them follows the list):
- Distributed training over multi-GPUs and multi-nodes
- Mixed-precision training with fp16 and bf16
- High-performance fused CUDA kernels
- Model checkpoint management
- Friendly logging
- Buffered (GPU-CPU overlapping) data loader
- Gradient accumulation
- Commonly used optimizers and LR schedulers
- Easy to create new models
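For orientation, here is a minimal plain-PyTorch sketch of what fp16 mixed-precision training combined with gradient accumulation looks like when written by hand; Uni-Core wraps this kind of loop (plus distributed training, checkpoint management, and logging) for you. The model, data, and hyper-parameters below are placeholders, not Uni-Core APIs.

```python
# Illustrative sketch only: a hand-written fp16 + gradient-accumulation loop.
# The model, optimizer, and data below are placeholders, not Uni-Core APIs.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(128, 2).cuda()                 # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                   # fp16 loss scaling
accumulate_steps = 4                                   # gradient accumulation

batches = [(torch.randn(32, 128), torch.randint(0, 2, (32,)))
           for _ in range(8)]                          # placeholder data

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    x, y = x.cuda(), y.cuda()
    with torch.cuda.amp.autocast():                    # fp16 forward pass
        loss = F.cross_entropy(model(x), y)
    # divide the loss so gradients average correctly over micro-batches
    scaler.scale(loss / accumulate_steps).backward()
    if (step + 1) % accumulate_steps == 0:
        scaler.step(optimizer)                         # unscale + optimizer step
        scaler.update()
        optimizer.zero_grad()
```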
Installation
Build from source
You can use `python setup.py install` or `pip install .` to build Uni-Core from source. The CUDA version in the build environment should match the one PyTorch was built with (a quick way to check is sketched below).
You can also use `python setup.py install --disable-cuda-ext` to disable the CUDA extension operators when CUDA is not available.
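As a sanity check (an illustrative snippet, not part of Uni-Core), you can print the CUDA release your PyTorch build expects and compare it with the CUDA toolkit in your build environment:

```python
# Print the CUDA release PyTorch was built with; compare it with the
# toolkit reported by `nvcc --version` in your build environment.
import torch

print("PyTorch version:", torch.__version__)
print("Built with CUDA:", torch.version.cuda)   # e.g. '11.3'
print("CUDA available:", torch.cuda.is_available())
```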
Use pre-compiled python wheels
We also provide pre-compiled wheels built by GitHub Actions. You can download them from the Release page. Make sure the wheel matches your Python version, PyTorch version, and CUDA version. For example, for PyTorch 1.12.1, Python 3.7, and CUDA 11.3, you can install `unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl`.
Docker image
We also provide a Docker image. You can pull it with `docker pull dptechnology/unicore:0.0.1-pytorch1.11.0-cuda11.3`. To use GPUs within Docker, you need to install nvidia-docker-2 first.
Example
To build a model, you can refer to `example/bert`.
Related projects
Acknowledgement
The main framework is from facebookresearch/fairseq.
The fused kernels are from guolinke/fused_ops.
The Dockerfile is from guolinke/pytorch-docker.
License
This project is licensed under the terms of the MIT license. See LICENSE for additional details.