LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Authors: Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath
Affiliation: Argonne National Laboratory
Matrix of Evaluated Frameworks and Hardware:
| Framework / Hardware | NVIDIA A100 | NVIDIA H100 | NVIDIA GH200 | AMD MI250 | AMD MI300X | Intel Max1550 | Habana Gaudi2 | SambaNova SN40L |
|---|---|---|---|---|---|---|---|---|
| vLLM | Yes | Yes | Yes | Yes | Yes | Yes | No | N/A |
| llama.cpp | Yes | Yes | Yes | Yes | Yes | Yes | N/A | N/A |
| TensorRT-LLM | Yes | Yes | Yes | N/A | N/A | N/A | N/A | N/A |
| DeepSpeed-MII | Yes | No | No | No | No | No | Yes | N/A |
| SambaFlow | N/A | N/A | N/A | N/A | N/A | N/A | N/A | Yes |
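For a sense of what an inference throughput measurement looks like with one of the evaluated frameworks, here is a minimal sketch using vLLM's offline Python API. This is an illustrative example only, not the benchmark suite's actual harness; the model name, prompt batch, and sampling settings are placeholder assumptions.

```python
# Minimal sketch of an offline throughput measurement with vLLM.
# Illustrative only: the model, prompts, and sampling settings below
# are placeholder assumptions, not the benchmark's configuration.
import time

from vllm import LLM, SamplingParams

prompts = ["Explain the transformer architecture."] * 32  # toy batch
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # assumed model checkpoint

start = time.perf_counter()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start

# Count generated tokens across the batch and report tokens/second.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.2f} s "
      f"({generated / elapsed:.1f} tokens/s)")
```

The same wall-clock-over-generated-tokens measurement pattern applies to the other frameworks in the matrix, with the framework-specific generate call swapped in.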
Key Insights
Cite this work:
@misc{chittyvenkata2024llminferencebenchinferencebenchmarkinglarge,
  title={LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators},
  author={Krishna Teja Chitty-Venkata and Siddhisanket Raskar and Bharat Kale and Farah Ferdaus and Aditya Tanikanti and Ken Raffenetti and Valerie Taylor and Murali Emani and Venkatram Vishwanath},
  year={2024},
  eprint={2411.00136},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2411.00136},
}
Acknowledgements
This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory, and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.