Triton Tutorials

For users working with the "Tensor in, Tensor out" approach to Deep Learning Inference, getting started with Triton can raise many questions. The goal of this repository is to familiarize users with Triton's features and provide guides and examples to ease migration. For a feature-by-feature explanation, refer to the Triton Inference Server documentation.
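As a rough illustration of that "Tensor in, Tensor out" flow, the sketch below sends a NumPy tensor to a running Triton server with the Python HTTP client and reads a tensor back. The model name and tensor names ("simple_model", "INPUT0", "OUTPUT0") are placeholder assumptions; substitute the names from your model's config.pbtxt.

```python
# Minimal "tensor in, tensor out" sketch using the Triton Python HTTP client.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the input tensor from a NumPy array.
input_data = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(input_data.shape), "FP32")
infer_input.set_data_from_numpy(input_data)

# Request a named output and run inference.
response = client.infer(
    model_name="simple_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)

# The result comes back as a NumPy array -- tensors in, tensors out.
print(response.as_numpy("OUTPUT0"))
```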

Getting Started Checklist

- Overview Video
- Conceptual Guide: Deploying Models

Quick Deploy

The focus of these examples is to demonstrate deployment for models trained with various frameworks. They are quick demonstrations that assume the user is already somewhat familiar with Triton; a minimal export sketch follows the list below.

Deploy a ...

- PyTorch Model
- TensorFlow Model
- ONNX Model
- TensorRT Accelerated Model
- vLLM Model
- OpenVINO Model
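Most of these deployments start by exporting the trained model into Triton's model repository layout (`<model_repository>/<model_name>/<version>/`). The sketch below shows this for a PyTorch model; the repository path and model name ("model_repository", "resnet50_torch") are placeholder assumptions, and each Quick Deploy guide covers the framework-specific details and the accompanying config.pbtxt.

```python
# Sketch of preparing a PyTorch model for Triton's model repository layout:
#   model_repository/<model_name>/<version>/model.pt
import os

import torch
import torchvision.models as models

model = models.resnet50(weights="DEFAULT").eval()

# TorchScript the model so Triton's PyTorch (libtorch) backend can load it.
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))

version_dir = "model_repository/resnet50_torch/1"  # placeholder names
os.makedirs(version_dir, exist_ok=True)
traced.save(os.path.join(version_dir, "model.pt"))

# Add a config.pbtxt next to the version directory, then point
# `tritonserver --model-repository=model_repository` at the repository.
```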

LLM Tutorials

The table below lists some popular models that are covered in our tutorials; a minimal client sketch follows the table.

| Example Models | Tutorial Link |
| --- | --- |
| Llama-2-7B | TensorRT-LLM Tutorial |
| Persimmon-8B | HuggingFace Transformers Tutorial |
| Falcon-7B | HuggingFace Transformers Tutorial |
| LLaVA-v1.5-7B | TensorRT-LLM Tutorial |

Note: This is not an exhaustive list of what Triton supports, just what is included in the tutorials.
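Once a model from one of the LLM tutorials above is deployed, a request typically resembles the hedged sketch below, which calls Triton's HTTP generate endpoint. The model name ("vllm_model") and the request/response fields ("text_input", "max_tokens", "text_output") depend on the specific tutorial's model configuration, so treat them as placeholder assumptions.

```python
# Hedged sketch: query an LLM deployed by one of the tutorials above via
# Triton's HTTP generate endpoint. Names and fields are placeholders that
# must match the deployed model's configuration.
import requests

url = "http://localhost:8000/v2/models/vllm_model/generate"
payload = {
    "text_input": "What is Triton Inference Server?",
    "max_tokens": 64,  # field name varies by backend/model config
}

response = requests.post(url, json=payload)
response.raise_for_status()

# The generated text is commonly returned in a "text_output" field.
print(response.json()["text_output"])
```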

What does this repository contain?

This repository contains the following resources:

Navigating Triton Inference Server Resources

The Triton Inference Server GitHub organization contains multiple repositories housing different features of the Triton Inference Server. The following is not a complete description of every repository, but a short guide to build an intuitive understanding.

Adding Requests

To request an example, open an issue and specify the details. Want to make a contribution? Open a pull request and tag an Admin.