Home

Awesome

<div align="center"> <h1>πŸ—£οΈ Large Language Model Course</h1> <p align="center"> 🐦 <a href="https://twitter.com/maximelabonne">Follow me on X</a> β€’ πŸ€— <a href="https://huggingface.co/mlabonne">Hugging Face</a> β€’ πŸ’» <a href="https://mlabonne.github.io/blog">Blog</a> β€’ πŸ“™ <a href="https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python">Hands-on GNN</a> </p> </div> <br/>

The LLM course is divided into three parts:

  1. 🧩 LLM Fundamentals covers essential knowledge about mathematics, Python, and neural networks.
  2. πŸ§‘β€πŸ”¬ The LLM Scientist focuses on building the best possible LLMs using the latest techniques.
  3. πŸ‘· The LLM Engineer focuses on creating LLM-based applications and deploying them.

For an interactive version of this course, I created two LLM assistants that will answer questions and test your knowledge in a personalized way:

πŸ“ Notebooks

A list of notebooks and articles related to large language models.

Tools

NotebookDescriptionNotebook
🧐 LLM AutoEvalAutomatically evaluate your LLMs using RunPod<a href="https://colab.research.google.com/drive/1Igs3WZuXAIv9X0vwqiE90QlEPys8e8Oa?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
πŸ₯± LazyMergekitEasily merge models using MergeKit in one click.<a href="https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
🦎 LazyAxolotlFine-tune models in the cloud using Axolotl in one click.<a href="https://colab.research.google.com/drive/1TsDKNo2riwVmU55gjuBgB1AXVtRRfRHW?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
⚑ AutoQuantQuantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click.<a href="https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
🌳 Model Family TreeVisualize the family tree of merged models.<a href="https://colab.research.google.com/drive/1s2eQlolcI1VGgDhqWIANfkfKvcKrMyNr?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
πŸš€ ZeroSpaceAutomatically create a Gradio chat interface using a free ZeroGPU.<a href="https://colab.research.google.com/drive/1LcVUW5wsJTO2NGmozjji5CkC--646LgC"><img src="img/colab.svg" alt="Open In Colab"></a>

Fine-tuning

NotebookDescriptionArticleNotebook
Fine-tune Llama 2 with QLoRAStep-by-step guide to supervised fine-tune Llama 2 in Google Colab.Article<a href="https://colab.research.google.com/drive/1PEQyJO1-f6j0S_XJ8DV50NkpzasXkrzd?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Fine-tune CodeLlama using AxolotlEnd-to-end guide to the state-of-the-art tool for fine-tuning.Article<a href="https://colab.research.google.com/drive/1Xu0BrCB7IShwSWKVcfAfhehwjDrDMH5m?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Fine-tune Mistral-7b with QLoRASupervised fine-tune Mistral-7b in a free-tier Google Colab with TRL.<a href="https://colab.research.google.com/drive/1o_w0KastmEJNVwT5GoqMCciH-18ca5WS?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Fine-tune Mistral-7b with DPOBoost the performance of supervised fine-tuned models with DPO.Article<a href="https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Fine-tune Llama 3 with ORPOCheaper and faster fine-tuning in a single stage with ORPO.Article<a href="https://colab.research.google.com/drive/1eHNWg9gnaXErdAa8_mcvjMupbSS6rDvi"><img src="img/colab.svg" alt="Open In Colab"></a>
Fine-tune Llama 3.1 with UnslothUltra-efficient supervised fine-tuning in Google Colab.Article<a href="https://colab.research.google.com/drive/164cg_O7SV7G8kZr_JXqLd6VC7pd86-1Z?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>

Quantization

NotebookDescriptionArticleNotebook
Introduction to QuantizationLarge language model optimization using 8-bit quantization.Article<a href="https://colab.research.google.com/drive/1DPr4mUQ92Cc-xf4GgAaB6dFcFnWIvqYi?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
4-bit Quantization using GPTQQuantize your own open-source LLMs to run them on consumer hardware.Article<a href="https://colab.research.google.com/drive/1lSvVDaRgqQp_mWK_jC9gydz6_-y6Aq4A?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Quantization with GGUF and llama.cppQuantize Llama 2 models with llama.cpp and upload GGUF versions to the HF Hub.Article<a href="https://colab.research.google.com/drive/1pL8k7m04mgE5jo2NrjGi8atB0j_37aDD?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
ExLlamaV2: The Fastest Library to RunΒ LLMsQuantize and run EXL2Β models and upload them to the HF Hub.Article<a href="https://colab.research.google.com/drive/1yrq4XBlxiA0fALtMoT2dwiACVc77PHou?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>

Other

NotebookDescriptionArticleNotebook
Decoding Strategies in Large Language ModelsA guide to text generation from beam search to nucleus samplingArticle<a href="https://colab.research.google.com/drive/19CJlOS5lI29g-B3dziNn93Enez1yiHk2?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Improve ChatGPT with Knowledge GraphsAugment ChatGPT's answers with knowledge graphs.Article<a href="https://colab.research.google.com/drive/1mwhOSw9Y9bgEaIFKT4CLi0n18pXRM4cj?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Merge LLMs with MergeKitCreate your own models easily, no GPU required!Article<a href="https://colab.research.google.com/drive/1_JS7JKJAQozD48-LhYdegcuuZ2ddgXfr?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Create MoEs with MergeKitCombine multiple experts into a single frankenMoEArticle<a href="https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>
Uncensor any LLM with abliterationFine-tuning without retrainingArticle<a href="https://colab.research.google.com/drive/1VYm3hOcvCpbGiqKZb141gJwjdmmCcVpR?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a>

🧩 LLM Fundamentals

This section introduces essential knowledge about mathematics, Python, and neural networks. You might not want to start here but refer to it as needed.

<details> <summary>Toggle section</summary>

1. Mathematics for Machine Learning

Before mastering machine learning, it is important to understand the fundamental mathematical concepts that power these algorithms.

πŸ“š Resources:


2. Python for Machine Learning

Python is a powerful and flexible programming language that's particularly good for machine learning, thanks to its readability, consistency, and robust ecosystem of data science libraries.

πŸ“š Resources:


3. Neural Networks

Neural networks are a fundamental part of many machine learning models, particularly in the realm of deep learning. To utilize them effectively, a comprehensive understanding of their design and mechanics is essential.

πŸ“š Resources:


4. Natural Language Processing (NLP)

NLP is a fascinating branch of artificial intelligence that bridges the gap between human language and machine understanding. From simple text processing to understanding linguistic nuances, NLP plays a crucial role in many applications like translation, sentiment analysis, chatbots, and much more.

πŸ“š Resources:

</details>

πŸ§‘β€πŸ”¬ The LLM Scientist

This section of the course focuses on learning how to build the best possible LLMs using the latest techniques.

1. The LLM architecture

While an in-depth knowledge about the Transformer architecture is not required, it is important to have a good understanding of its inputs (tokens) and outputs (logits). The vanilla attention mechanism is another crucial component to master, as improved versions of it are introduced later on.

πŸ“š References:


2. Building an instruction dataset

While it's easy to find raw data from Wikipedia and other websites, it's difficult to collect pairs of instructions and answers in the wild. Like in traditional machine learning, the quality of the dataset will directly influence the quality of the model, which is why it might be the most important component in the fine-tuning process.

πŸ“š References:


3. Pre-training models

Pre-training is a very long and costly process, which is why this is not the focus of this course. It's good to have some level of understanding of what happens during pre-training, but hands-on experience is not required.

πŸ“š References:


4. Supervised Fine-Tuning

Pre-trained models are only trained on a next-token prediction task, which is why they're not helpful assistants. SFT allows you to tweak them to respond to instructions. Moreover, it allows you to fine-tune your model on any data (private, not seen by GPT-4, etc.) and use it without having to pay for an API like OpenAI's.

πŸ“š References:


5. Preference Alignment

After supervised fine-tuning, RLHF is a step used to align the LLM's answers with human expectations. The idea is to learn preferences from human (or artificial) feedback, which can be used to reduce biases, censor models, or make them act in a more useful way. It is more complex than SFT and often seen as optional.

πŸ“š References:


6. Evaluation

Evaluating LLMs is an undervalued part of the pipeline, which is time-consuming and moderately reliable. Your downstream task should dictate what you want to evaluate, but always remember Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."

πŸ“š References:


7. Quantization

Quantization is the process of converting the weights (and activations) of a model using a lower precision. For example, weights stored using 16 bits can be converted into a 4-bit representation. This technique has become increasingly important to reduce the computational and memory costs associated with LLMs.

πŸ“š References:


8. New Trends

πŸ“š References:

πŸ‘· The LLM Engineer

This section of the course focuses on learning how to build LLM-powered applications that can be used in production, with a focus on augmenting models and deploying them.

1. Running LLMs

Running LLMs can be difficult due to high hardware requirements. Depending on your use case, you might want to simply consume a model through an API (like GPT-4) or run it locally. In any case, additional prompting and guidance techniques can improve and constrain the output for your applications.

πŸ“š References:


2. Building a Vector Storage

Creating a vector storage is the first step to build a Retrieval Augmented Generation (RAG) pipeline. Documents are loaded, split, and relevant chunks are used to produce vector representations (embeddings) that are stored for future use during inference.

πŸ“š References:


3. Retrieval Augmented Generation

With RAG, LLMs retrieves contextual documents from a database to improve the accuracy of their answers. RAG is a popular way of augmenting the model's knowledge without any fine-tuning.

πŸ“š References:


4. Advanced RAG

Real-life applications can require complex pipelines, including SQL or graph databases, as well as automatically selecting relevant tools and APIs. These advanced techniques can improve a baseline solution and provide additional features.

πŸ“š References:


5. Inference optimization

Text generation is a costly process that requires expensive hardware. In addition to quantization, various techniques have been proposed to maximize throughput and reduce inference costs.

πŸ“š References:


6. Deploying LLMs

Deploying LLMs at scale is an engineering feat that can require multiple clusters of GPUs. In other scenarios, demos and local apps can be achieved with a much lower complexity.

πŸ“š References:


7. Securing LLMs

In addition to traditional security problems associated with software, LLMs have unique weaknesses due to the way they are trained and prompted.

πŸ“š References:


Acknowledgements

This roadmap was inspired by the excellent DevOps Roadmap from Milan Milanović and Romano Roth.

Special thanks to:

Disclaimer: I am not affiliated with any sources listed here.


<p align="center"> <a href="https://star-history.com/#mlabonne/llm-course&Date"> <img src="https://api.star-history.com/svg?repos=mlabonne/llm-course&type=Date" alt="Star History Chart"> </a> </p>