
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

Authors: Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath

Affiliation: Argonne National Laboratory

Matrix of Evaluated Frameworks and Hardware:

| Framework / Hardware | NVIDIA A100 | NVIDIA H100 | NVIDIA GH200 | AMD MI250 | AMD MI300X | Intel Max1550 | Habana Gaudi2 | Sambanova SN40L |
|----------------------|-------------|-------------|--------------|-----------|------------|---------------|---------------|-----------------|
| vLLM                 | Yes         | Yes         | Yes          | Yes       | Yes        | Yes           | No            | N/A             |
| llama.cpp            | Yes         | Yes         | Yes          | Yes       | Yes        | Yes           | N/A           | N/A             |
| TensorRT-LLM         | Yes         | Yes         | Yes          | N/A       | N/A        | N/A           | N/A           | N/A             |
| DeepSpeed-MII        | Yes         | No          | No           | No        | No         | No            | Yes           | N/A             |
| Sambaflow            | N/A         | N/A         | N/A          | N/A       | N/A        | N/A           | N/A           | Yes             |
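
For convenience, here is a minimal, illustrative Python sketch of the same framework/hardware matrix, useful for programmatically filtering which platform and framework combinations were benchmarked. The values are copied from the table above; the variable and helper names (SUPPORT, evaluated_on) are hypothetical and not part of the benchmark suite.

HARDWARE = ["NVIDIA A100", "NVIDIA H100", "NVIDIA GH200", "AMD MI250",
            "AMD MI300X", "Intel Max1550", "Habana Gaudi2", "Sambanova SN40L"]

# Per-framework support status, in the same column order as HARDWARE.
SUPPORT = {
    "vLLM":          ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No",  "N/A"],
    "llama.cpp":     ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "N/A", "N/A"],
    "TensorRT-LLM":  ["Yes", "Yes", "Yes", "N/A", "N/A", "N/A", "N/A", "N/A"],
    "DeepSpeed-MII": ["Yes", "No",  "No",  "No",  "No",  "No",  "Yes", "N/A"],
    "Sambaflow":     ["N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "Yes"],
}

def evaluated_on(framework: str) -> list[str]:
    """Return the hardware platforms on which the given framework was benchmarked."""
    return [hw for hw, status in zip(HARDWARE, SUPPORT[framework]) if status == "Yes"]

if __name__ == "__main__":
    print(evaluated_on("TensorRT-LLM"))
    # ['NVIDIA A100', 'NVIDIA H100', 'NVIDIA GH200']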

Key Insights

Cite this work:

@misc{chittyvenkata2024llminferencebenchinferencebenchmarkinglarge,
     title={LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators}, 
     author={Krishna Teja Chitty-Venkata and Siddhisanket Raskar and Bharat Kale and Farah Ferdaus and Aditya Tanikanti and Ken Raffenetti and Valerie Taylor and Murali Emani and Venkatram Vishwanath},
     year={2024},
     eprint={2411.00136},
     archivePrefix={arXiv},
     primaryClass={cs.LG},
     url={https://arxiv.org/abs/2411.00136}, 
}
Acknowledgements

This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.