SwiftTransformer

SwiftTransformer is a tiny yet powerful implementation of the inference infrastructure for transformer model families. It aims to provide an easy-to-use framework in which researchers can try out their ideas and iterate quickly. It also supports popular features such as model/pipeline parallelism, FlashAttention, Continuous Batching, and PagedAttention, and should work as a solid foundation for researchers to build their prototypes. Currently, DistServe and FastServe use SwiftTransformer as their execution backend.

It has the following advantages:

Build

NOTE: For users who want to run LLM inference off-the-shelf, please refer to other high-level LLM serving systems written in Python based on SwiftTransformer (like DistServe and FastServe). They all contain detailed documentation about environment setup.

If you want to build your own project on top of SwiftTransformer, follow these steps:

# setup and activate the conda environment
conda env create -f environment.yml && conda activate SwiftTransformer

# build SwiftTransformer
cmake -B build && cmake --build build -j$(nproc)

If everything works fine, you should see libst_pybinding.so under the SwiftTransformer/build/lib directory. You can load this dynamic library in your Python project.
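As a quick sanity check after the build, a small shell helper along these lines can confirm that the library was produced. This is only a sketch: the function name check_pybinding is hypothetical, and it assumes the build directory layout described above (lib/libst_pybinding.so under the build directory).

```shell
# Hypothetical helper (not part of SwiftTransformer): report whether the
# Python binding library exists under <build_dir>/lib.
# Usage: check_pybinding [build_dir]   (build_dir defaults to "build")
check_pybinding() {
    build_dir="${1:-build}"
    lib="$build_dir/lib/libst_pybinding.so"
    if [ -f "$lib" ]; then
        echo "found: $lib"
    else
        # Print the diagnostic to stderr and signal failure to the caller.
        echo "missing: $lib" >&2
        return 1
    fi
}
```

Calling check_pybinding build from the repository root returns a non-zero status if the library is missing, which makes it easy to chain after the cmake commands.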

Run

We provide a simple example to run the OPT-1.3B model. Again, if you want to run LLM inference off-the-shelf, please see DistServe and FastServe.

Testing

We provide unit tests that check the correctness of the individual components of the model. To run the tests, compile the project and then execute bin/unittest_XXX from the build directory.
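To run every test binary in one go, a loop like the following may be convenient. It is a minimal sketch, not a script shipped with the project: the function name run_unittests is hypothetical, and it assumes only that the test executables are named unittest_* and live in the bin directory of the build tree, as stated above.

```shell
# Hypothetical helper (not part of SwiftTransformer): run every unittest_*
# executable found in <build_dir>/bin.
# Returns non-zero if any test binary fails or if none are found.
# Usage: run_unittests [build_dir]   (build_dir defaults to "build")
run_unittests() {
    build_dir="${1:-build}"
    failed=0
    found=0
    for t in "$build_dir"/bin/unittest_*; do
        # Skip the unexpanded glob pattern and anything that is not an
        # executable regular file.
        [ -f "$t" ] && [ -x "$t" ] || continue
        found=1
        echo "== $t"
        # Keep going after a failure so that all tests get a chance to run.
        "$t" || failed=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no unittest_* binaries under $build_dir/bin" >&2
        return 1
    fi
    return "$failed"
}
```

Running run_unittests build from the repository root executes each test in turn and reports overall failure through its exit status.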

Development

Code Structure

Currently, the code is organized as follows:

src
├── csrc
│   ├── kernel
│   ├── layer
│   ├── model
│   ├── pybinding.cc
│   └── util
├── examples
│   ├── benchmark_all_input_same.cc
│   ├── CMakeLists.txt
│   ├── lib
│   └── run_gpt.cc
└── unittest
    ├── kernel
    ├── layer
    ├── model
    ├── unittest_torch_utils.h
    ├── unittest_utils.h
    └── util

The csrc folder contains the core implementation of the model, including every kernel, layer and model.

The unittest folder contains unit tests for the components in csrc. The kernel, layer, model, and util folders under the unittest folder contain the tests for the corresponding components in csrc. For example, src/unittest/layer/attention.cc contains the unit test for the Attention layer, which is implemented in src/csrc/layer/attention.cc.

Note for VS Code users: If you encounter the error "#include errors detected. Please update your includePath.", you may need to update the include path in .vscode/c_cpp_properties.json.

Design Philosophy