Habana Gaudi2 kernel for llm.c

This repository provides TPC kernels for llm.c on Gaudi2.

TPC Kernels Overview

The Tensor Processor Core™ (TPC) is a fully programmable VLIW4 (very long instruction word) processor designed to execute non-linear deep learning operators. It is embedded in Habana’s Gaudi deep learning accelerator: the Gaudi SoC contains multiple TPC cores operating in parallel, each running a single thread. The TPC has a wide single instruction, multiple data (SIMD) vector unit that supports 2048-bit operations on data types such as float, bfloat16, INT16, INT32 and INT8. In each cycle, the TPC’s arithmetic logic unit (ALU) can execute up to 64 float or INT32 operations, 128 INT16 operations, or 256 INT8 operations. The TPC targets workloads that do not map to the Matrix Multiplication Engine (MME); such operators can be implemented as TPC kernels.

Install Habanatools For Ubuntu

To retrieve the package, visit Habana Vault, click Artifact, find habanatools, and download the latest release package for Ubuntu 22.04. Packages for other operating systems are available there as well.

  sudo dpkg -i ./habanatools_1.16.0-526_amd64.deb

llm.c Example of Layer Norm

 /usr/bin/tpc-clang layernorm_fwd.c -c -x c++ -o layernorm_fwd.o

The compilation produces an ELF file named ‘layernorm_fwd.o’. To extract the raw binary from the ELF file, use the following command:

 objcopy -O binary --only-section=.text layernorm_fwd.o layernorm_fwd.bin 

Alternatively, build with CMake as shown in the template examples below.

For other operating systems, refer to the TPC Tools Installation Guide. If you get an error such as a missing libTpcElfReader.so, make sure the /usr/lib/habanatools path is included in the LD_LIBRARY_PATH environment variable.

The template examples show how to create and build custom kernels, which can later be used in TensorFlow (TF) and PyTorch (PT) custom ops. The template is organized into TPC kernels (kernels/), glue code (src/) and unit tests (tests/).

### Build TPC + Glue code using CMake for Gaudi2

Make sure the Habana tools are installed: check that /usr/bin/tpc-clang exists and that CMake is up to date. You can download the latest CMake from https://cmake.org/download/

Clone the repository

 git clone https://github.com/abhilash1910/llm.tpc.git

In the terminal, make sure you are in the project root directory, then create a directory called build and change into it:

mkdir build
cd build

Then run the following commands:

cmake ..
make

After the build, you can find libcustom_tpc_perf_lib.so in the build/src directory; this is your custom kernel library. For more details about writing TPC kernels, refer to the TPC User Guide.