# llm-inference-solutions

A collection of available inference and serving solutions for LLMs.

| Name | Org | Description |
|------|-----|-------------|
| vLLM | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs |
| Text Generation Inference | Hugging Face 🤗 | Large Language Model Text Generation Inference |
| llm-engine | Scale AI | Scale LLM Engine public repository |
| DeepSpeed | Microsoft | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| OpenLLM | BentoML | Operating LLMs in production |
| LMDeploy | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| FlexFlow | CMU, Stanford, UCSD | A distributed deep learning framework |
| CTranslate2 | OpenNMT | Fast inference engine for Transformer models |
| FastChat | lm-sys | An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena |
| Triton Inference Server | NVIDIA | An optimized cloud and edge inferencing solution |
| Lepton.AI | lepton.ai | A Pythonic framework to simplify AI service building |
| ScaleLLM | Vectorch | A high-performance inference system for large language models, designed for production environments |
| LoRAX | Predibase | Serve hundreds of fine-tuned LLMs in production for the cost of one |
| TensorRT-LLM | NVIDIA | An easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines |
| mistral.rs | mistral.rs | Blazingly fast LLM inference |
| NanoFlow | NanoFlow | A throughput-oriented high-performance serving framework for LLMs |
| LMCache | LMCache | Fast and cost-efficient inference |
| LitServe | Lightning AI | Lightning-fast serving engine for AI models; flexible, easy, enterprise-scale |
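
To give a rough sense of what using one of these engines looks like, here is a minimal sketch of offline batch inference with vLLM, following its quickstart API. The model name is a placeholder; substitute any model vLLM supports.

```python
# Minimal offline-inference sketch using vLLM's quickstart API.
# "facebook/opt-125m" is a placeholder model chosen for its small size.
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "An LLM inference engine is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # downloads weights on first run
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt:     {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}")
```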
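Several of the listed engines (vLLM, LMDeploy, LoRAX, and ScaleLLM, among others) also expose an OpenAI-compatible HTTP endpoint, which keeps client code portable across them. Below is a sketch assuming an engine is already serving a model locally; the host, port, and model name are assumptions.

```python
# Hypothetical local setup: an engine (e.g. started with `vllm serve <model>`)
# listening on localhost:8000 with an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.completions.create(
    model="facebook/opt-125m",  # placeholder; must match the served model
    prompt="Name three LLM serving engines.",
    max_tokens=64,
)
print(resp.choices[0].text)
```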