Awesome

Materials for learning SGLang

Blog

LMSYS Org

[2024-09-04] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision

[2024-07-25] Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM)

[2024-02-05] Fast JSON Decoding for Local LLMs with Compressed Finite State Machine

[2024-01-17] Fast and Expressive LLM Inference with RadixAttention and SGLang

AMD

[2024-11-13] SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD GPUs

Slides

GPU MODE

[2024-11-10] SGLang Performance Optimization

The first LMSYS online meetup: Efficient LLM Deployment and Serving

[2024-10-16] SGLang Overview & CPU Overhead Hiding

[2024-10-16] Faster Constrained Decoding

[2024-10-16] SGLang DeepSeek MLA

[2024-10-16] Universal LLM deployment and low-latency serving in MLC LLM

[2024-10-16] XGrammar: Flexible And Efficient Structured Generation Engine for Large Language Models

[2024-10-16] Review of the first LMSYS online meetup: Efficient LLM Deployment and Serving

AMD Advancing AI 2024

[2024-10-10] Efficient LLM Inference with SGLang

SGLang Biweekly Meeting

[2024-11-16] SGLang Router and Side-Channel KV Cache Attack

[2024-11-02] Quantization on AMD

[2024-10-05] SGLang Double Sparsity

[2024-09-21] SGLang DeepSeek MLA

Other

SGLang v0.2: Faster Interface and Runtime for LLM Inference

Videos

You are welcome to follow our YouTube channel.

GPU MODE

[2024-11-10] SGLang Performance Optimization

The first LMSYS online meetup: Efficient LLM Deployment and Serving

[2024-10-16] The First SGLang Online Meetup

SGLang Biweekly Meeting

[2024-11-16] SGLang Developer Sync 20241116

[2024-11-03] SGLang Developer Sync 20241103

[2024-10-19] SGLang Developer Sync 20241019

[2024-10-05] SGLang Developer Sync 20241005

[2024-09-21] SGLang Developer Sync 20240921

Paper

[NeurIPS 24] SGLang: Efficient Execution of Structured Language Model Programs

Documentation

SGLang Documentation