Awesome

Awesome Local AI

If you tried Jan Desktop and liked it, please also check out the following awesome collection of open source and/or local AI tools and solutions.

Your contributions are always welcome!

Lists

awesome-local-llms - Table of open-source local LLM inference projects with their GitHub metrics.
llama-police - A list of Open Source LLM Tools from Chip Huyen

Inference Engine

Repository	Description	Supported model formats	CPU/GPU Support	UI	language	Platform Type
llama.cpp	- Inference of LLaMA model in pure C/C++	GGML/GGUF	Both	❌	C/C++	Text-Gen
Cortex	- Multi-engine engine embeddable in your apps. Uses llama.cpp and more	Both	Both	❌	Text-Gen
ollama	- CLI and local server. Uses llama.cpp	Both	Both	❌	Text-Gen
koboldcpp	- A simple one-file way to run various GGML models with KoboldAI's UI	GGML	Both	✅	C/C++	Text-Gen
LoLLMS	- Lord of Large Language Models Web User Interface.	Nearly ALL	Both	✅	Python	Text-Gen
ExLlama	- A more memory-efficient rewrite of the HF transformers implementation of Llama	AutoGPTQ/GPTQ	GPU	✅	Python/C++	Text-Gen
vLLM	- vLLM is a fast and easy-to-use library for LLM inference and serving.	GGML/GGUF	Both	❌	Python	Text-Gen
SGLang	- 3-5x higher throughput than vLLM (Control flow, RadixAttention, KV cache reuse)	Safetensor / AWQ / GPTQ	GPU	❌	Python	Text-Gen
LmDeploy	- LMDeploy is a toolkit for compressing, deploying, and serving LLMs.	Pytorch / Turbomind	Both	❌	Python/C++	Text-Gen
Tensorrt-llm	- Inference efficiently on NVIDIA GPUs	Python / C++ runtimes	Both	❌	Python/C++	Text-Gen
CTransformers	- Python bindings for the Transformer models implemented in C/C++ using GGML library	GGML/GPTQ	Both	❌	C/C++	Text-Gen
llama-cpp-python	- Python bindings for llama.cpp	GGUF	Both	❌	Python	Text-Gen
llama2.rs	- A fast llama2 decoder in pure Rust	GPTQ	CPU	❌	Rust	Text-Gen
ExLlamaV2	- A fast inference library for running LLMs locally on modern consumer-class GPUs	GPTQ/EXL2	GPU	❌	Python/C++	Text-Gen
LoRAX	- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs	Safetensor / AWQ / GPTQ	GPU	❌	Python/Rust	Text-Gen
text-generation-inference	- Inference serving toolbox with optimized kernels for each LLM architecture	Safetensors / AWQ / GPTQ	Both	❌	Python/Rust	Text-Gen

Inference UI

oobabooga - A Gradio web UI for Large Language Models.
LM Studio - Discover, download, and run local LLMs.
LocalAI - LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing.
FireworksAI - Experience the world's fastest LLM inference platform deploy your own at no additional cost.
faradav - Chat with AI Characters Offline, Runs locally, Zero-configuration.
GPT4All - A free-to-use, locally running, privacy-aware chatbot.
LLMFarm - llama and other large language models on iOS and MacOS offline using GGML library.
LlamaChat - LlamaChat allows you to chat with LLaMa, Alpaca and GPT4All models1 all running locally on your Mac.
LLM as a Chatbot Service - LLM as a Chatbot Service.
FuLLMetalAi - Fullmetal.Ai is a distributed network of self-hosted Large Language Models (LLMs).
Automatic1111 - Stable Diffusion web UI.
ComfyUI - A powerful and modular stable diffusion GUI with a graph/nodes interface.
Wordflow - Run, share, and discover AI prompts in your browsers
petals - Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
ChatUI - Open source codebase powering the HuggingChat app.
AI-Mask - Browser extension to provide model inference to web apps. Backed by web-llm and transformers.js
everything-rag - Interact with (virtually) any LLM on Hugging Face Hub with an asy-to-use, 100% local Gradio chatbot.
LmScript - UI for SGLang and Outlines
Taskyon - Vue3 based Chat UI, integratable in webpages. Focused on "local first" principle. Any OpenAI API compatible endpoint.
QA-Pilot - An interactive chat app that leverages Ollama(or openAI) models for rapid understanding and navigation of GitHub code repository or compressed file resources
HammerAI - Simple character-chat interface to run LLMs on Windows, Mac, and Linux. Uses Ollama under the hood and is offline, free to chat, and requires zero configuration.
GPTLocalhost - A local Word Add-in for you to use local LLM servers in Microsoft Word. Alternative to "Copilot in Word" and much more affordable.

Platforms / full solutions

H2OAI - H2OGPT The fastest, most accurate AI Cloud Platform.
BentoML - BentoML is a framework for building reliable, scalable, and cost-efficient AI applications.
Predibase - Serverless LoRA Fine-Tuning and Serving for LLMs.

Developer tools

Jan Framework - At its core, Jan is a cross-platform, local-first and AI native application framework that can be used to build anything.
Pinecone - Long-Term Memory for AI.
PoplarML - PoplarML enables the deployment of production-ready, scalable ML systems with minimal engineering effort.
Datature - The All-in-One Platform to Build and Deploy Vision AI.
One AI - MAKING GENERATIVE AI BUSINESS-READY.
Gooey.AI - Create Your Own No Code AI Workflows.
Mixo.io - AI website builder.
Safurai - AI Code Assistant that saves you time in changing, optimizing, and searching code.
GitFluence - The AI-driven solution that helps you quickly find the right command. Get started with Git Command Generator today and save time.
Haystack - A framework for building NLP applications (e.g. agents, semantic search, question-answering) with language models.
LangChain - A framework for developing applications powered by language models.
gpt4all - A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.
LMQL - LMQL is a query language for large language models.
LlamaIndex - A data framework for building LLM applications over external data.
Phoenix - Open-source tool for ML observability that runs in your notebook environment, by Arize. Monitor and fine tune LLM, CV and tabular models.
trypromptly - Create AI Apps & Chatbots in Minutes.
BentoML - BentoML is the platform for software engineers to build AI products.
LiteLLM - Call all LLM APIs using the OpenAI format.
Tune Studio - Playground for software developers to finetune and deploy large language models.
Langfuse - Open-source LLM monitoring platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications. #opensource
Shell-Pilot - Interact with LLM using Ollama models(or openAI, mistralAI)via pure shell scripts on your Linux(or MacOS) system, enhancing intelligent system management without any dependencies
code-collator: Creates a single markdown file that describes your entire codebase to language models.

Agents

SuperAGI - Opensource AGI Infrastructure.
Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
BabyAGI - Baby AGI is an autonomous AI agent developed using Python that operates through OpenAI and Pinecone APIs.
AgentGPT -Assemble, configure, and deploy autonomous AI Agents in your browser.
HyperWrite - HyperWrite helps you work smarter, faster, and with ease.
AI Agents - AI Agent that Power Up Your Productivity.
AgentRunner.ai - Leverage the power of GPT-4 to create and train fully autonomous AI agents.
GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
GPT Prompt Engineer - Automated prompt engineering. It generates, tests, and ranks prompts to find the best ones.
MetaGPT - The Multi-Agent Framework: Given one line requirement, return PRD, design, tasks, repo.
Open Interpreter - Let language models run code. Have your agent write and execute code.
CrewAI - Cutting-edge framework for orchestrating role-playing, autonomous AI agents.

Training

FastChat - An open platform for training, serving, and evaluating large language models.
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
BMTrain - Efficient Training for Big Models.
Alpa - Alpa is a system for training and serving large-scale neural networks.
Megatron-LM - Ongoing research training transformer models at scale.
Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
Nanotron - Minimalistic large language model 3D-parallelism training.
TRL - Language model alignment with reinforcement learning.
PEFT - Parameter efficient fine-tuning (LoRA, DoRA, model merger and more)

LLM Leaderboard

Open LLM Leaderboard - aims to track, rank and evaluate LLMs and chatbots as they are released.
Chatbot Arena Leaderboard - a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
AlpacaEval Leaderboard - An Automatic Evaluator for Instruction-following Language Models.
LLM-Leaderboard-streamlit - A joint community effort to create one central leaderboard for LLMs.
lmsys.org - Benchmarking LLMs in the Wild with Elo Ratings.

Research

Attention Is All You Need (2017): Presents the original transformer model. it helps with sequence-to-sequence tasks, such as machine translation. [Paper]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Helps with language modeling and prediction tasks. [Paper]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022): Mechanism to improve transformers. [paper]
Improving Language Understanding by Generative Pre-Training (2019): Paper is authored by OpenAI on GPT. [paper]
Cramming: Training a Language Model on a Single GPU in One Day (2022): Paper focus on a way too increase the performance by using minimum computing power. [paper]
LaMDA: Language Models for Dialog Applications (2022): LaMDA is a family of Transformer-based neural language models by Google. [paper]
Training language models to follow instructions with human feedback (2022): Use human feedback to align LLMs. [paper]
TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) [paper]
Fast Distributed Inference Serving for Large Language Models (arXiv'23) [paper]
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv'23) [paper]
Accelerating LLM Inference with Staged Speculative Decoding (arXiv'23) [paper]
ZeRO: Memory optimizations Toward Training Trillion Parameter Models (SC'20) [paper]
TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition 2023 [Paper]

Awesome

Awesome Local AI

Lists

Inference Engine

Inference UI

Platforms / full solutions

Developer tools

User Tools

Agents

Training

LLM Leaderboard

Research

Community