continuous-eval examples

This repo contains end-to-end examples of GenAI/LLM applications and evaluation pipelines set up using continuous-eval.

Check out the continuous-eval repo and documentation for more information.

Examples

| Example Name | App Framework | Eval Framework | Description |
|---|---|---|---|
| Simple RAG | LangChain | continuous-eval | Simple QA chatbot over select Paul Graham essays |
| Complex RAG | LangChain | continuous-eval | Complex QA chatbot over select Paul Graham essays |
| Simple Tools | LlamaIndex | continuous-eval | Math question solver using simple tools |
| Context Augmentation Agent | LlamaIndex | continuous-eval | QA over Uber financial dataset using agents |
| Sentiment Classification | LlamaIndex | continuous-eval | Single-label classification of sentence sentiment |
| Simple RAG | Haystack | continuous-eval | Simple QA chatbot over select Paul Graham essays |

Installation

To run the examples, you need Python 3.11 (recommended) and Poetry installed. Then clone this repo and install the dependencies:

```shell
git clone https://github.com/relari-ai/examples.git && cd examples
poetry env use 3.11
poetry install --with haystack --with langchain --with llama-index
```

Note that the `--with` flags are optional and only needed if you want to run the examples for the respective frameworks.
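For instance, to install only the base dependencies plus the group needed for the LangChain examples:

```shell
# Install the base dependencies and only the LangChain extras group
poetry install --with langchain
```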

Get started

Each example is in a subfolder: `examples/<FRAMEWORK>/<APP_NAME>/`.

Some examples have just one script to execute (e.g. Haystack's Simple RAG), while others have several.
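A run typically looks like the following (the path and script name below are illustrative; check each example's subfolder for its actual entry script):

```shell
# From the repo root, run an example script inside the Poetry environment.
# The exact path varies per example.
poetry run python examples/<FRAMEWORK>/<APP_NAME>/app.py
```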

Where applicable, the source data for the application (documents and embeddings in a Chroma vectorstore) and for evaluation (a golden dataset) is also provided. Note that the evaluation golden dataset always consists of two files.

Adjust the metrics and tests in `pipeline.py` to experiment with different evaluation setups.
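To make concrete what such metrics compute against a golden dataset, here is a standalone sketch of a token-overlap F1 score. This is not continuous-eval's implementation or API, just an illustration of the kind of comparison an answer-correctness metric performs between a generated answer and a golden reference:

```python
from collections import Counter


def token_f1(answer: str, reference: str) -> float:
    """F1 overlap between the tokens of a generated answer and a golden reference.

    Illustrative only: real evaluation metrics (like those in continuous-eval)
    are more sophisticated, but the precision/recall structure is similar.
    """
    ans_tokens = answer.lower().split()
    ref_tokens = reference.lower().split()
    if not ans_tokens or not ref_tokens:
        return 0.0
    # Count tokens that appear in both, respecting multiplicity.
    overlap = sum((Counter(ans_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ans_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("Paul Graham wrote many essays", "Paul Graham wrote essays"))
```

A metric like this returns 1.0 for an exact token match and 0.0 for no overlap, with partial credit in between.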