Home

Awesome

🚀 RAG on Windows using TensorRT-LLM and LlamaIndex 🦙

<p align="center"> <img src="https://gitlab-master.nvidia.com/winai/trt-llm-rag-windows/-/raw/main/media/rag-demo.gif" align="center"> </p>

ChatRTX is a demo app that lets you personalize a GPT large language model (LLM) connected to your own content—docs, notes, photos. Leveraging retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration, you can query a custom chatbot to quickly get contextually relevant answers. This app also lets you give query through your voice. As it all runs locally on your Windows RTX PC, you’ll get fast and secure results. ChatRTX supports various file formats, including text, pdf, doc/docx, xml, png, jpg, bmp. Simply point the application at the folder containing your files and it'll load them into the library in a matter of seconds.

The AI models that are supported in this app:

The pipeline incorporates the above AI models, TensorRT-LLM, LlamaIndex and the FAISS vector search library. In the sample application here, we have a dataset consisting of recent articles sourced from NVIDIA Gefore News.

What is RAG? 🔍

Retrieval-augmented generation (RAG) for large language models (LLMs) that seeks to enhance prediction accuracy by connecting the LLM to your data during inference. This approach constructs a comprehensive prompt enriched with context, historical data, and recent or relevant knowledge.

Repository details

Getting Started

Hardware requirement

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.