Verba

The Golden RAGtriever - Community Edition ✨


Welcome to Verba: The Golden RAGtriever, a community-driven open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, explore your datasets and extract insights with ease, either locally with Ollama and HuggingFace or through LLM providers such as Anthropic, Cohere, and OpenAI. This project is built with and for the community; please be aware that it might not be maintained with the same urgency as other Weaviate production applications. Feel free to contribute to the project and help us make Verba even better! <3

pip install goldenverba

Demo of Verba

What Is Verba?

Verba is a fully customizable personal assistant that uses Retrieval-Augmented Generation (RAG) to query and interact with your data, either locally or deployed in the cloud. Resolve questions about your documents, cross-reference multiple data points, or gain insights from existing knowledge bases. Verba combines state-of-the-art RAG techniques with Weaviate's context-aware database. Choose between different RAG frameworks, data types, chunking and retrieval techniques, and LLM providers based on your individual use case.

Open Source Spirit

Weaviate is proud to offer this open-source project for the community. While we strive to address issues as fast as we can, please understand that it may not be maintained with the same rigor as production software. We welcome and encourage community contributions to help keep it running smoothly. Your support in fixing open issues quickly is greatly appreciated.

Watch our newest Verba video here:

VIDEO LINK

Feature Lists

| 🤖 Model Support | Implemented | Description |
| --- | --- | --- |
| Ollama (e.g. Llama3) | ✅ | Local Embedding and Generation Models powered by Ollama |
| HuggingFace (e.g. MiniLMEmbedder) | ✅ | Local Embedding Models powered by HuggingFace |
| Cohere (e.g. Command R+) | ✅ | Embedding and Generation Models by Cohere |
| Anthropic (e.g. Claude Sonnet) | ✅ | Generation Models by Anthropic |
| OpenAI (e.g. GPT4) | ✅ | Embedding and Generation Models by OpenAI |
| Groq (e.g. Llama3) | ✅ | Generation Models by Groq (LPU inference) |
| Upstage (e.g. Solar) | ✅ | Embedding and Generation Models by Upstage |

| 🤖 Embedding Support | Implemented | Description |
| --- | --- | --- |
| Weaviate | ✅ | Embedding Models powered by Weaviate |
| Ollama | ✅ | Local Embedding Models powered by Ollama |
| SentenceTransformers | ✅ | Embedding Models powered by HuggingFace |
| Cohere | ✅ | Embedding Models by Cohere |
| VoyageAI | ✅ | Embedding Models by VoyageAI |
| OpenAI | ✅ | Embedding Models by OpenAI |
| Upstage | ✅ | Embedding Models by Upstage |

| 📁 Data Support | Implemented | Description |
| --- | --- | --- |
| UnstructuredIO | ✅ | Import Data through Unstructured |
| Firecrawl | ✅ | Scrape and Crawl URLs through Firecrawl |
| UpstageDocumentParse | ✅ | Parse Documents through Upstage Document AI |
| PDF Ingestion | ✅ | Import PDFs into Verba |
| GitHub & GitLab | ✅ | Import Files from GitHub and GitLab |
| CSV/XLSX Ingestion | ✅ | Import Table Data into Verba |
| .DOCX | ✅ | Import .docx files |
| Multi-Modal (using AssemblyAI) | ✅ | Import and Transcribe Audio through AssemblyAI |

| ✨ RAG Features | Implemented | Description |
| --- | --- | --- |
| Hybrid Search | ✅ | Semantic Search combined with Keyword Search |
| Autocomplete Suggestion | ✅ | Verba suggests autocompletion |
| Filtering | ✅ | Apply Filters (e.g. documents, document types etc.) before performing RAG |
| Customizable Metadata | ✅ | Free control over Metadata |
| Async Ingestion | ✅ | Ingest data asynchronously to speed up the process |
| Advanced Querying | planned ⏱️ | Task Delegation Based on LLM Evaluation |
| Reranking | planned ⏱️ | Rerank results based on context for improved results |
| RAG Evaluation | planned ⏱️ | Interface for Evaluating RAG pipelines |
| Agentic RAG | out of scope ❌ | Agentic RAG pipelines |
| Graph RAG | out of scope ❌ | Graph-based RAG pipelines |

| 🗡️ Chunking Techniques | Implemented | Description |
| --- | --- | --- |
| Token | ✅ | Chunk by Token powered by spaCy |
| Sentence | ✅ | Chunk by Sentence powered by spaCy |
| Semantic | ✅ | Chunk and group by semantic sentence similarity |
| Recursive | ✅ | Recursively chunk data based on rules |
| HTML | ✅ | Chunk HTML files |
| Markdown | ✅ | Chunk Markdown files |
| Code | ✅ | Chunk Code files |
| JSON | ✅ | Chunk JSON files |

| 🆒 Cool Bonus | Implemented | Description |
| --- | --- | --- |
| Docker Support | ✅ | Verba is deployable via Docker |
| Customizable Frontend | ✅ | Verba's frontend is fully customizable |
| Vector Viewer | ✅ | Visualize your data in 3D |
| Multi-User Collaboration | out of scope ❌ | Multi-User Collaboration in Verba |

| 🤝 RAG Libraries | Implemented | Description |
| --- | --- | --- |
| LangChain | ✅ | Implement LangChain RAG pipelines |
| Haystack | planned ⏱️ | Implement Haystack RAG pipelines |
| LlamaIndex | planned ⏱️ | Implement LlamaIndex RAG pipelines |

Is something missing? Feel free to open a new issue or discussion with your idea!

Showcase of Verba


Getting Started with Verba

You have three deployment options for Verba:

1. Install via pip

pip install goldenverba

2. Build from source

git clone https://github.com/weaviate/Verba
pip install -e .

3. Use Docker

git clone https://github.com/weaviate/Verba
docker compose --env-file <your-env-file> up -d --build

Prerequisites: If you're not using Docker, ensure that you have Python >=3.10.0,<3.13.0 installed on your system.

If you're unfamiliar with Python and virtual environments, please read the Python tutorial guidelines.

API Keys and Environment Variables

You can set all API keys in the Verba frontend, but to make your life easier, you can also prepare a .env file in which Verba will automatically look for the keys. Create a .env file in the directory you want to start Verba from. You can find an .env.example file in the goldenverba directory.

Make sure to only set environment variables you intend to use; environment variables with missing or incorrect values may lead to errors.

Below is a comprehensive list of the API keys and variables you may require:

| Environment Variable | Value | Description |
| --- | --- | --- |
| WEAVIATE_URL_VERBA | URL to your hosted Weaviate Cluster | Connect to your WCS Cluster |
| WEAVIATE_API_KEY_VERBA | API Credentials to your hosted Weaviate Cluster | Connect to your WCS Cluster |
| ANTHROPIC_API_KEY | Your Anthropic API Key | Get Access to Anthropic Models |
| OPENAI_API_KEY | Your OpenAI Key | Get Access to OpenAI Models |
| OPENAI_BASE_URL | URL to an OpenAI-compatible instance | Use a custom endpoint or proxy for OpenAI Models |
| COHERE_API_KEY | Your API Key | Get Access to Cohere Models |
| GROQ_API_KEY | Your Groq API Key | Get Access to Groq Models |
| OLLAMA_URL | URL to your Ollama instance (e.g. http://localhost:11434) | Get Access to Ollama Models |
| UNSTRUCTURED_API_KEY | Your API Key | Get Access to Unstructured Data Ingestion |
| UNSTRUCTURED_API_URL | URL to Unstructured Instance | Get Access to Unstructured Data Ingestion |
| ASSEMBLYAI_API_KEY | Your API Key | Get Access to AssemblyAI Data Ingestion |
| GITHUB_TOKEN | Your GitHub Token | Get Access to Data Ingestion via GitHub |
| GITLAB_TOKEN | Your GitLab Token | Get Access to Data Ingestion via GitLab |
| FIRECRAWL_API_KEY | Your Firecrawl API Key | Get Access to Data Ingestion via Firecrawl |
| VOYAGE_API_KEY | Your VoyageAI API Key | Get Access to Embedding Models via VoyageAI |
| EMBEDDING_SERVICE_URL | URL to your Embedding Service Instance | Get Access to Embedding Models via Weaviate Embedding Service |
| EMBEDDING_SERVICE_KEY | Your Embedding Service Key | Get Access to Embedding Models via Weaviate Embedding Service |
| UPSTAGE_API_KEY | Your Upstage API Key | Get Access to Upstage Models |
| UPSTAGE_BASE_URL | URL to an Upstage-compatible instance | Use a custom endpoint for Upstage Models |
| DEFAULT_DEPLOYMENT | Local, Weaviate, Custom, or Docker | Set the default deployment mode |
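For example, a minimal .env for an OpenAI-backed setup on a Weaviate Cloud cluster might look like this (all values below are placeholders):

# .env — replace the placeholder values with your own
WEAVIATE_URL_VERBA=https://your-cluster.weaviate.network
WEAVIATE_API_KEY_VERBA=your-weaviate-api-key
OPENAI_API_KEY=sk-your-openai-key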

API Keys in Verba

Weaviate

Verba provides flexibility in connecting to Weaviate instances based on your needs. You have three options:

  1. Local Deployment: Use Weaviate Embedded, which runs locally on your device (not supported on Windows; choose the Docker or Cloud deployment instead)
  2. Docker Deployment: Choose this option when you're running Verba's Dockerfile.
  3. Cloud Deployment: Use an existing Weaviate instance hosted on WCD to run Verba

💻 Weaviate Embedded

Embedded Weaviate is a deployment model that runs a Weaviate instance from your application code rather than from a stand-alone Weaviate server installation. When you run Verba in Local Deployment, it will set up and manage Embedded Weaviate in the background. Please note that Weaviate Embedded is not supported on Windows and is in experimental mode, which can bring unexpected errors; we recommend using the Docker or Cloud deployment instead. You can read more about Weaviate Embedded here.

🌩️ Weaviate Cloud Deployment (WCD)

If you prefer a cloud-based solution, Weaviate Cloud (WCD) offers a scalable, managed environment. Learn how to set up a cloud cluster and get the API keys by following the Weaviate Cluster Setup Guide.

🐳 Docker Deployment

Another local alternative is deploying Weaviate using Docker. For more details, follow the How to install Verba with Docker section.

Deployment in Verba

⚙️ Custom Weaviate Deployment

If you're hosting Weaviate yourself, you can use the Custom deployment option in Verba. This will allow you to specify the URL, PORT, and API key of your Weaviate instance.

Ollama

Verba supports Ollama models. Download and install Ollama on your device (https://ollama.com/download). Make sure to install your preferred LLM using ollama run <model>.

Tested with llama3, llama3:70b and mistral. The bigger models generally perform better but need more computational power.

Make sure the Ollama server runs in the background, and don't ingest documents with different Ollama models, since their vector dimensions can vary and will lead to errors.

You can verify that Ollama is running and your model is available with the following command:

ollama run llama3
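As a quick sketch, a typical local setup (assuming Ollama's default port and the llama3 model) could look like this:

ollama pull llama3                      # download the model
ollama run llama3                       # verify the model runs

# in your .env
OLLAMA_URL=http://localhost:11434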

Unstructured

Verba supports importing documents through Unstructured IO (e.g. plain text, .pdf, .csv, and more). To use it, you need the UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL environment variables. You can get them from Unstructured.

UNSTRUCTURED_API_URL is set to https://api.unstructuredapp.io/general/v0/general by default
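For example (the API key is a placeholder; the URL only needs to be changed if you're running your own Unstructured instance):

# in your .env
UNSTRUCTURED_API_KEY=your-unstructured-api-key
UNSTRUCTURED_API_URL=https://api.unstructuredapp.io/general/v0/general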

AssemblyAI

Verba supports importing documents through AssemblyAI (audio files, or audio extracted from video files). To use it, you need the ASSEMBLYAI_API_KEY environment variable. You can get one from AssemblyAI.

OpenAI

Verba supports OpenAI models such as Ada, GPT-3, and GPT-4. To use them, you need to specify the OPENAI_API_KEY environment variable. You can get one from OpenAI.

You can also add an OPENAI_BASE_URL to use proxies such as LiteLLM (https://github.com/BerriAI/litellm):

OPENAI_BASE_URL=YOUR-OPENAI_BASE_URL
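For example, assuming a LiteLLM proxy running locally on its default port (4000), your .env could contain:

# in your .env — both values are placeholders for your own setup
OPENAI_API_KEY=sk-your-key-or-proxy-key
OPENAI_BASE_URL=http://localhost:4000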

HuggingFace

If you want to use the HuggingFace features, make sure to install the correct Verba package. It will install the packages required to run local embedding models. Please note that Verba will automatically download embedding models when they are first used.

pip install goldenverba[huggingface]

or

pip install '.[huggingface]'

If you're using Docker, modify the Dockerfile accordingly.
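For example, you could change the install line in the Dockerfile to pull in the HuggingFace extras (a sketch; the exact line in your Dockerfile may differ):

RUN pip install -e '.[huggingface]'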

Groq

To use Groq LPUs as the generation engine, you need to get an API key from Groq.

Although you can provide it in the graphical interface once Verba is up, it is recommended to specify it as the GROQ_API_KEY environment variable before you launch the application. This lets you choose the generation model from an up-to-date list of available models.
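For example, before launching (the key is a placeholder; you can also put it in your .env instead):

export GROQ_API_KEY=your-groq-api-key
verba start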

How to deploy with pip

Prerequisites: Python >=3.10.0,<3.13.0

  1. (Very Important) Initialize a new Python environment
python3 -m virtualenv venv
source venv/bin/activate
  2. Install Verba
pip install goldenverba
  3. Launch Verba
verba start

You can specify the --port and --host via flags
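For example, to serve Verba on a different port and listen on all interfaces (the values here are just examples):

verba start --port 9000 --host 0.0.0.0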

  4. Access Verba
Visit localhost:8000
  5. (Optional) Create a .env file and add environment variables

How to build from Source

  1. Clone the Verba repository
git clone https://github.com/weaviate/Verba.git
  2. Initialize a new Python environment
python3 -m virtualenv venv
source venv/bin/activate
  3. Install Verba
pip install -e .
  4. Launch Verba
verba start

You can specify the --port and --host via flags

  5. Access Verba
Visit localhost:8000
  6. (Optional) Create a .env file and add environment variables

How to install Verba with Docker

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. To get started with deploying Verba using Docker, follow the steps below. If you need more detailed instructions on Docker usage, check out the Docker Curriculum.

You can use docker pull semitechnologies/verba to pull the latest Verba Docker Image.

If you want to build the image yourself, you can do so by cloning the Verba repository and running docker build -t verba . inside the Verba directory.

  1. Clone the Verba repository
Ensure you have Git installed on your system. Then, open a terminal or command prompt and run the following command to clone the Verba repository:
git clone https://github.com/weaviate/Verba.git
  2. Set the necessary environment variables
Make sure to set your required environment variables in the .env file. You can read more about how to set them up in the API Keys section.

  3. Adjust the docker-compose file
You can use the docker-compose.yml to add required environment variables under the verba service, and you can also adjust the Weaviate Docker settings to enable Authentication or change other settings of your database instance. You can read more about the Weaviate configuration in our docker-compose documentation. You can also uncomment the ollama service to use Ollama within the same Docker Compose setup.

Please make sure to only add environment variables that you really need.
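As a sketch, passing an API key through to the verba service could look like this in docker-compose.yml (OPENAI_API_KEY is just an example variable; the service name matches the repository's compose file):

# docker-compose.yml fragment: add variables under the verba service
services:
  verba:
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}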

  4. Deploy using Docker
With Docker installed and the Verba repository cloned, navigate to the directory containing the Docker Compose file in your terminal or command prompt. Run the following command to start the Verba application in detached mode, which allows it to run in the background:

docker compose up -d


If your .env file is in the goldenverba directory, you can pass it explicitly and rebuild the image:

docker compose --env-file goldenverba/.env up -d --build

This command will download the necessary Docker images, create containers, and start Verba. Remember, Docker must be installed on your system to use this method. For installation instructions and more details about Docker, visit the official Docker documentation.

  5. Access Verba
Visit localhost:8000 once the containers are running.

If you want your Docker instance to install a specific version of Verba, you can edit the Dockerfile and change the installation line.

RUN pip install -e '.'
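For example, to pin a published release instead of installing from the local source (replace <version> with the release you want):

RUN pip install goldenverba==<version>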

Verba Walkthrough

Select your Deployment

The first screen you'll see is the deployment screen. Here you can select between Local, Docker, Weaviate Cloud, or Custom deployment. The Local deployment uses Weaviate Embedded under the hood, which initializes a Weaviate instance behind the scenes. The Docker deployment uses a separate Weaviate instance running inside the same Docker network. The Weaviate Cloud deployment uses a Weaviate instance hosted on Weaviate Cloud Services (WCS). The Custom deployment lets you specify your own Weaviate instance URL, PORT, and API key.

You can skip this part by setting the DEFAULT_DEPLOYMENT environment variable to Local, Docker, Weaviate, or Custom.
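For example, to always boot straight into the Docker deployment, add this to your .env:

DEFAULT_DEPLOYMENT=Docker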

Import Your Data

The first thing you need to do is add your data. Click on Import Data and select the Add Files, Add Directory, or Add URL tab. There you can add all the files you want to ingest, and configure each file individually by selecting it and clicking on the Overview or Configure tab.

Demo of Verba

Query Your Data

With your data imported, you can use the Chat page to ask any related questions. You will receive chunks that are semantically relevant to your question, along with an answer generated by your chosen model. You can configure the RAG pipeline under the Config tab.

Demo of Verba

Open Source Contribution

Your contributions are always welcome! Feel free to contribute ideas, feedback, or create issues and bug reports if you find any! Before contributing, please read the Contribution Guide. Visit our Weaviate Community Forum if you need any help!

Project Architecture

You can learn more about Verba's architecture and implementation in its technical documentation and frontend documentation. It's recommended to have a look at them before making any contributions.

Known Issues

FAQ