GLake: optimizing GPU memory management and IO transmission

Latest News

Introduction

AI large-model training and inference increasingly face the challenges of the memory wall and the IO transmission wall: the growth of GPU memory capacity and IO bandwidth cannot keep up with the growth of AI model sizes.

To address these challenges, GLake is an acceleration library and a set of related utilities that work at the bottom layer (GPU virtual and physical memory management) and the system layer (multi-GPU, multi-path, and multi-tasking) to optimize GPU memory and IO.

GLake enables AI training, inference (including converting large models to TensorRT or ONNX Runtime on NVIDIA A10/3090) and DevOps (such as notebooks) to fully utilize the underlying hardware resources, improving training throughput by up to 4 times, saving inference memory by up to 3 times, and accelerating IO transmission by 3 to 12 times.

The simplest way to use GLake is to replace the underlying library (e.g., libcuda.so or PyTorch's libc10_cuda.so); a more graceful way is to follow the detailed steps.
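As an illustration of the pluggable route, PyTorch 2.0+ ships a pluggable-allocator API that can load a custom CUDA allocator from a shared object at process start, without replacing libc10_cuda.so. The sketch below uses that generic PyTorch API; the shared-object path and the exported symbol names are placeholders, not GLake's actual entry points:

```python
# Hedged sketch: loading a custom CUDA allocator through PyTorch's
# pluggable-allocator API (PyTorch >= 2.0). The .so path and the two
# exported C symbols are placeholders, NOT GLake's real entry points.
import torch

allocator = torch.cuda.memory.CUDAPluggableAllocator(
    "/path/to/libcustom_allocator.so",  # placeholder shared object
    "custom_malloc",  # placeholder: void* custom_malloc(size_t size, int device, cudaStream_t stream)
    "custom_free",    # placeholder: void custom_free(void* ptr, size_t size, int device, cudaStream_t stream)
)

# Must be called before the first CUDA allocation in the process.
torch.cuda.memory.change_current_allocator(allocator)

x = torch.empty(1024, device="cuda")  # now served by the custom allocator
```

Replacing the library wholesale achieves the same effect without touching user code, at the cost of being tied to a specific PyTorch build.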

Motivation

Architecture

GLake is designed with a layered architecture. Current tests and verification focus on PyTorch and NVIDIA GPUs; support for more devices is in progress:

<div align="center"> <img src="docs/figures/glake_arch_en.png" alt="Editor" width="700"> </div>

Features

Quick Results

  1. GLake reduces memory fragmentation by up to 27%, saves 25 GB of GPU memory, and increases the training throughput of a 10B model by up to nearly 4 times.
  2. For inference, GLake eliminates duplicate memory across processes and models, reducing memory usage by up to 3 times.
  3. GLake accelerates CPU-GPU IO transmission by 3 times.

Examples

GMLake tutorial
Multi-path tutorial

How it works

<div align="center"> <img src="docs/figures/gmlake.png" alt="Editor" width="500"> </div>
<div align="center"> <img src="docs/figures/multi_path_view.png" alt="Editor" width="700"> </div>
<div align="center"> <img src="docs/figures/dedup1.png" alt="Editor" width="500"> </div>
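To make the fragmentation problem concrete, the sketch below (plain PyTorch, not GLake's API) allocates mixed-size tensors and then frees the large ones; the gap between reserved and allocated memory is the cached, potentially fragmented footprint that GMLake stitches back into usable contiguous space via virtual-memory mapping:

```python
# Plain-PyTorch sketch (not GLake's API) showing the memory the caching
# allocator holds on to after mixed-size allocations are freed.
import torch

def mib(nbytes: int) -> float:
    return nbytes / (1 << 20)

large = [torch.empty(256 << 20, dtype=torch.uint8, device="cuda") for _ in range(8)]  # 8 x 256 MiB
small = [torch.empty(2 << 20, dtype=torch.uint8, device="cuda") for _ in range(8)]    # 8 x 2 MiB
del large  # freed blocks return to the allocator's cache, not to the driver
torch.cuda.synchronize()

print(f"allocated: {mib(torch.cuda.memory_allocated()):7.1f} MiB")  # ~16 MiB still live
print(f"reserved:  {mib(torch.cuda.memory_reserved()):7.1f} MiB")   # ~2 GiB still reserved
# The reserved-minus-allocated gap is cached memory that may be too
# fragmented to serve a future request of a different size; GMLake
# stitches such free physical chunks into one contiguous virtual range.
```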

Roadmap

We are working on a few interesting features. Questions, suggestions, and contributions are welcome.

Community

WeChat: TBD

DingTalk: TBD