> [!IMPORTANT]
> `bigdl-llm` has now become `ipex-llm` (see the migration guide here); you may find the original `BigDL` project here.


# 💫 Intel® LLM Library for PyTorch*

<p>< <b>English</b> | <a href='./README.zh-CN.md'>中文</a> ></p>

IPEX-LLM is an LLM acceleration library for Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max), NPU and CPU<sup>[1]</sup>.
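As a quick illustration, `ipex-llm` provides a Hugging Face `transformers`-style API. Below is a minimal sketch that loads a model with INT4 optimization and runs it on an Intel GPU; the model id is an arbitrary example, and details may vary across versions (see the Install and Code Examples sections for authoritative usage):

```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM  # transformers-style API
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # arbitrary example model

# load_in_4bit=True applies low-bit (INT4) optimization while loading
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")  # move to Intel GPU; omit this line to run on CPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```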


## Latest Update 🔥

<details><summary>More updates</summary> <br/> </details>

## `ipex-llm` Demo

See demos of running local LLMs on Intel Core Ultra iGPU, Intel Core Ultra NPU, single-card Arc GPU, or multi-card Arc GPUs using `ipex-llm` below.

<table width="100%"> <tr> <td align="center" colspan="1"><strong>Intel Core Ultra (Series 1) iGPU</strong></td> <td align="center" colspan="1"><strong>Intel Core Ultra (Series 2) NPU</strong></td> <td align="center" colspan="1"><strong>Intel Arc dGPU</strong></td> <td align="center" colspan="1"><strong>2-Card Intel Arc dGPUs</strong></td> </tr> <tr> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/mtl_mistral-7B_q4_k_m_ollama.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/mtl_mistral-7B_q4_k_m_ollama.gif" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/npu_llama3.2-3B.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/npu_llama3.2-3B.gif" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/arc_llama3-8B_fp8_textwebui.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/arc_llama3-8B_fp8_textwebui.gif" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/2arc_qwen1.5-32B_fp6_fastchat.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/2arc_qwen1.5-32B_fp6_fastchat.gif" width=100%; /> </a> </td> </tr> <tr> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/ollama_quickstart.md">Ollama <br> (Mistral-7B Q4_K) </a> </td> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/npu_quickstart.md">HuggingFace <br> (Llama3.2-3B SYM_INT4)</a> </td> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/webui_quickstart.md">TextGeneration-WebUI <br> (Llama3-8B FP8) </a> </td> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/fastchat_quickstart.md">FastChat <br> (QWen1.5-32B FP6)</a> </td> </tr> </table> <!-- See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html), [*local RAG using LangChain-Chatchat*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html), [*llama.cpp*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html) and [*Ollama*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html) *(on either Intel Core Ultra laptop or Arc GPU)* with `ipex-llm` below. 
<table width="100%"> <tr> <td align="center" colspan="2"><strong>Intel Core Ultra Laptop</strong></td> <td align="center" colspan="2"><strong>Intel Arc GPU</strong></td> </tr> <tr> <td> <video src="https://private-user-images.githubusercontent.com/1931082/319632616-895d56cd-e74b-4da1-b4d1-2157df341424.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTIyNDE4MjUsIm5iZiI6MTcxMjI0MTUyNSwicGF0aCI6Ii8xOTMxMDgyLzMxOTYzMjYxNi04OTVkNTZjZC1lNzRiLTRkYTEtYjRkMS0yMTU3ZGYzNDE0MjQubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDRUMTQzODQ1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Y2JmYzkxYWFhMGYyN2MxYTkxOTI3MGQ2NTFkZDY4ZjFjYjg3NmZhY2VkMzVhZTU2OGEyYjhjNzI5YTFhOGNhNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Ga8mmCAO62DFCNzU1fdoyC_4MzqhDHzjZedzmi_2L-I" width=100% controls /> </td> <td> <video src="https://private-user-images.githubusercontent.com/1931082/319625142-68da379e-59c6-4308-88e8-c17e40baba7b.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTIyNDA2MzQsIm5iZiI6MTcxMjI0MDMzNCwicGF0aCI6Ii8xOTMxMDgyLzMxOTYyNTE0Mi02OGRhMzc5ZS01OWM2LTQzMDgtODhlOC1jMTdlNDBiYWJhN2IubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDRUMTQxODU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NzYwOWI4MmQxZjFhMjJlNGNhZTA3MGUyZDE4OTA0N2Q2YjQ4NTcwN2M2MTY1ODAwZmE3OTIzOWI0Y2U3YzYwNyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.g0bYAj3J8IJci7pLzoJI6QDalyzXzMYtQkDY7aqZMc4" width=100% controls /> </td> <td> <video src="https://private-user-images.githubusercontent.com/1931082/319625685-ff13b099-bcda-48f1-b11b-05421e7d386d.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTIyNDA4MTcsIm5iZiI6MTcxMjI0MDUxNywicGF0aCI6Ii8xOTMxMDgyLzMxOTYyNTY4NS1mZjEzYjA5OS1iY2RhLTQ4ZjEtYjExYi0wNTQyMWU3ZDM4NmQubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDRUMTQyMTU3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MWQ3MmEwZGRkNGVlY2RkNjAzMTliODM1NDEzODU3NWQ0ZGE4MjYyOGEyZjdkMjBiZjI0MjllYTU4ODQ4YzM0NCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.OFxex8Yj6WyqJKMi6B1Q19KkmbYqYCg1rD49wUwxdXQ" width=100% controls /> </td> <td> <video 
src="https://private-user-images.githubusercontent.com/1931082/325939544-2fc0ad5e-9ac7-4f95-b7b9-7885a8738443.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTQxMjYwODAsIm5iZiI6MTcxNDEyNTc4MCwicGF0aCI6Ii8xOTMxMDgyLzMyNTkzOTU0NC0yZmMwYWQ1ZS05YWM3LTRmOTUtYjdiOS03ODg1YTg3Mzg0NDMubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQyNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MjZUMTAwMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YjZlZDE4YjFjZWJkMzQ4NmY3ZjNlMmRiYWUzMDYxMTI3YzcxYjRiYjgwNmE2NDliMjMwOTI0NWJhMDQ1NDY1YyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.WfA2qwr8EP9W7a3oOYcKqaqsEKDlAkF254zbmn9dVv0" width=100% controls /> </td> </tr> <tr> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html">Text-Generation-WebUI</a> </td> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html">Local RAG using LangChain-Chatchat</a> </td> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html">llama.cpp</a> </td> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html">Ollama</a> </td> </tr> </table> -->

## `ipex-llm` Performance

See the **Token Generation Speed** on Intel Core Ultra and Intel Arc GPU below<sup>[1]</sup> (and refer to [2][3][4] for more details).

<table width="100%"> <tr> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/MTL_perf.jpg" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/MTL_perf.jpg" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/Arc_perf.jpg" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/Arc_perf.jpg" width=100%; /> </a> </td> </tr> </table>

You may follow the Benchmarking Guide to run the `ipex-llm` performance benchmark yourself.
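For a rough idea of what such a benchmark measures, here is a hypothetical timing sketch, not the official harness (the Benchmarking Guide reports first-token and next-token latency separately, while this simply averages over all generated tokens):

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt: str, new_tokens: int = 64) -> float:
    """Hypothetical helper: time `new_tokens` of generation and average."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    model.generate(inputs.input_ids, max_new_tokens=4)  # warm-up run
    if model.device.type == "xpu":
        torch.xpu.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    out = model.generate(inputs.input_ids, max_new_tokens=new_tokens)
    if model.device.type == "xpu":
        torch.xpu.synchronize()
    elapsed = time.perf_counter() - start
    # Note: this lumps prefill and decode together; a proper benchmark
    # would separate them.
    return (out.shape[1] - inputs.input_ids.shape[1]) / elapsed
```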

## Model Accuracy

Please see the perplexity results below (tested on the Wikitext dataset using the script here).

| Perplexity | sym_int4 | q4_k | fp6 | fp8_e5m2 | fp8_e4m3 | fp16 |
|---|---|---|---|---|---|---|
| Llama-2-7B-chat-hf | 6.364 | 6.218 | 6.092 | 6.180 | 6.098 | 6.096 |
| Mistral-7B-Instruct-v0.2 | 5.365 | 5.320 | 5.270 | 5.273 | 5.246 | 5.244 |
| Baichuan2-7B-chat | 6.734 | 6.727 | 6.527 | 6.539 | 6.488 | 6.508 |
| Qwen1.5-7B-chat | 8.865 | 8.816 | 8.557 | 8.846 | 8.530 | 8.607 |
| Llama-3.1-8B-Instruct | 6.705 | 6.566 | 6.338 | 6.383 | 6.325 | 6.267 |
| gemma-2-9b-it | 7.541 | 7.412 | 7.269 | 7.380 | 7.268 | 7.270 |
| Baichuan2-13B-Chat | 6.313 | 6.160 | 6.070 | 6.145 | 6.086 | 6.031 |
| Llama-2-13b-chat-hf | 5.449 | 5.422 | 5.341 | 5.384 | 5.332 | 5.329 |
| Qwen1.5-14B-Chat | 7.529 | 7.520 | 7.367 | 7.504 | 7.297 | 7.334 |
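For context on how numbers like these are produced, below is a simplified perplexity sketch. It assumes the column headers above (e.g. `sym_int4`, `fp8_e5m2`) are valid `load_in_low_bit` format strings; the actual script referenced above may differ in windowing and tokenization details:

```python
import math
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example model from the table
# load_in_low_bit selects the quantization format being evaluated
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_low_bit="sym_int4")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def perplexity(text: str, window: int = 1024) -> float:
    """Mean negative log-likelihood over non-overlapping windows, exponentiated."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    total_nll, n_tokens = 0.0, 0
    for i in range(0, ids.shape[1] - 1, window):
        chunk = ids[:, i : i + window + 1]  # +1 so every token has a target
        if chunk.shape[1] < 2:
            break
        with torch.inference_mode():
            # labels == input_ids: the model shifts internally, returns mean NLL
            loss = model(chunk, labels=chunk).loss
        total_nll += loss.item() * (chunk.shape[1] - 1)
        n_tokens += chunk.shape[1] - 1
    return math.exp(total_nll / n_tokens)
```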

## `ipex-llm` Quickstart

### Docker

### Use

### Applications

### Install

### Code Examples

### API Doc

### FAQ

## Verified Models

Over 70 models have been optimized/verified on `ipex-llm`, including LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM and more; see the list below.

| Model | CPU Example | GPU Example | NPU Example |
|---|---|---|---|
| LLaMA | link1, link2 | link | |
| LLaMA 2 | link1, link2 | link | Python link, C++ link |
| LLaMA 3 | link | link | Python link, C++ link |
| LLaMA 3.1 | link | link | |
| LLaMA 3.2 | | link | Python link, C++ link |
| LLaMA 3.2-Vision | | link | |
| ChatGLM | link | | |
| ChatGLM2 | link | link | |
| ChatGLM3 | link | link | |
| GLM-4 | link | link | |
| GLM-4V | link | link | |
| GLM-Edge | | link | Python link |
| Mistral | link | link | |
| Mixtral | link | link | |
| Falcon | link | link | |
| MPT | link | link | |
| Dolly-v1 | link | link | |
| Dolly-v2 | link | link | |
| Replit Code | link | link | |
| RedPajama | link1, link2 | | |
| Phoenix | link1, link2 | | |
| StarCoder | link1, link2 | link | |
| Baichuan | link | link | |
| Baichuan2 | link | link | Python link |
| InternLM | link | link | |
| InternVL2 | | link | |
| Qwen | link | link | |
| Qwen1.5 | link | link | |
| Qwen2 | link | link | Python link, C++ link |
| Qwen2.5 | | link | Python link, C++ link |
| Qwen-VL | link | link | |
| Qwen2-VL | | link | |
| Qwen2-Audio | | link | |
| Aquila | link | link | |
| Aquila2 | link | link | |
| MOSS | link | | |
| Whisper | link | link | |
| Phi-1_5 | link | link | |
| Flan-t5 | link | link | |
| LLaVA | link | link | |
| CodeLlama | link | link | |
| Skywork | link | | |
| InternLM-XComposer | link | | |
| WizardCoder-Python | link | | |
| CodeShell | link | | |
| Fuyu | link | | |
| Distil-Whisper | link | link | |
| Yi | link | link | |
| BlueLM | link | link | |
| Mamba | link | link | |
| SOLAR | link | link | |
| Phixtral | link | link | |
| InternLM2 | link | link | |
| RWKV4 | | link | |
| RWKV5 | | link | |
| Bark | link | link | |
| SpeechT5 | | link | |
| DeepSeek-MoE | link | | |
| Ziya-Coding-34B-v1.0 | link | | |
| Phi-2 | link | link | |
| Phi-3 | link | link | |
| Phi-3-vision | link | link | |
| Yuan2 | link | link | |
| Gemma | link | link | |
| Gemma2 | | link | |
| DeciLM-7B | link | link | |
| Deepseek | link | link | |
| StableLM | link | link | |
| CodeGemma | link | link | |
| Command-R/cohere | link | link | |
| CodeGeeX2 | link | link | |
| MiniCPM | link | link | Python link, C++ link |
| MiniCPM3 | | link | |
| MiniCPM-V | | link | |
| MiniCPM-V-2 | link | link | |
| MiniCPM-Llama3-V-2_5 | | link | Python link |
| MiniCPM-V-2_6 | link | link | Python link |
| StableDiffusion | | link | |
| Bce-Embedding-Base-V1 | | | Python link |
| Speech_Paraformer-Large | | | Python link |
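Once a verified model has been converted, the low-bit weights can be persisted so the conversion is not repeated on every start. A minimal sketch, assuming `ipex-llm`'s `save_low_bit`/`load_low_bit` helpers (the model id and path are arbitrary examples):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# One-time conversion: download, quantize to INT4, save the low-bit weights.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",  # example id from the verified-model list
    load_in_4bit=True,
    trust_remote_code=True,
)
model.save_low_bit("./qwen2-7b-int4")

# Later (or in another process): reload directly from the low-bit checkpoint,
# skipping the original download and quantization step.
model = AutoModelForCausalLM.load_low_bit("./qwen2-7b-int4", trust_remote_code=True)
```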

## Get Support

## Footnotes

1. Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.