Local Multimodal AI Chat

Getting Started

You can follow my YouTube video on setting up the repository on Linux or Windows.

To get started with Local Multimodal AI Chat, clone the repository and follow these simple steps:

Easiest and Preferred Method: Docker Compose

  1. Set the model save path: edit line 21 of the docker-compose.yml file

  2. Enter command in terminal: docker compose up

    Note: If you don't have a GPU, you can remove the deploy section from the Docker Compose file.

  3. Optional:

    • Check the config.yaml file and change accordingly to your needs.
    • Place your user_image.png and/or bot_image.png inside the chat_icons folder and remove the old ones.
  4. Open the app: open http://0.0.0.0:8501 in your browser

  5. Pull models: Go to https://ollama.com/library and choose the models you want to use. Enter /pull MODEL_NAME in the chat bar. You need one embedding model, e.g. nomic-embed-text, to embed PDF files (change the embedding model in the config if you choose another). You also need a model that understands images, e.g. llava.

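The relevant parts of the Compose file from steps 1 and 2 can be sketched roughly as follows. This is a hypothetical outline, not the actual file: the service name, image, volume path, and line numbers are assumptions, so check your local docker-compose.yml before editing.

```yaml
# Hypothetical sketch of the relevant docker-compose.yml parts (names and paths assumed).
services:
  ollama:
    image: ollama/ollama
    volumes:
      - /path/on/host/models:/root/.ollama   # model save path (the value to set, around line 21)
    deploy:                                  # remove this whole section if you have no GPU
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Once the app is running, pulling the two model types from step 5 would then look like entering `/pull nomic-embed-text` and `/pull llava` in the chat bar.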

Recommendation for Windows

Running Ollama in a Docker container results in very slow model loading times, because system calls are translated between two kernels. Installing Ollama natively on Windows works best here.

  1. Install Ollama desktop

  2. Change the Docker Compose file: delete docker-compose.yml and rename docker-compose_without_ollama.yml to docker-compose.yml

  3. Change the Ollama base URL in config.yaml: use line 4 of the config.yaml file and remove line 3

  4. Enter command in terminal: docker compose up

  5. Open the app: open http://0.0.0.0:8501 in your browser

  6. Pull models: Go to https://ollama.com/library and choose the models you want to use. Enter /pull MODEL_NAME in the chat bar. You need one embedding model, e.g. nomic-embed-text, to embed PDF files (change the embedding model in the config if you choose another). You also need a model that understands images, e.g. llava.

  7. Optional:

    • Check the config.yaml file and change accordingly to your needs.
    • Place your user_image.png and/or bot_image.png inside the chat_icons folder and remove the old ones.
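The base-URL switch in step 3 can be sketched like this. The key name and exact URLs are assumptions; check lines 3 and 4 of your own config.yaml for the real values.

```yaml
# Hypothetical sketch of the relevant config.yaml lines (key name and URLs assumed).
# ollama_base_url: http://ollama:11434              # line 3: Ollama running inside Docker -- remove this
ollama_base_url: http://host.docker.internal:11434  # line 4: Ollama installed locally on the Windows host
```

With Ollama installed natively, the containers reach the host's Ollama server through Docker's host gateway instead of a sibling container.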

Complete Manual Install

  1. Install Ollama

  2. Create a Virtual Environment: I am using Python 3.10.12

  3. Install Requirements:

    • pip install --upgrade pip
    • pip install -r requirements.txt
    • pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
  4. Enter commands in terminal:

    1. python3 database_operations.py (this initializes the SQLite database for the chat sessions)
    2. streamlit run app.py
  5. Pull models: Go to https://ollama.com/library and choose the models you want to use. Enter /pull MODEL_NAME in the chat bar. You need one embedding model, e.g. nomic-embed-text, to embed PDF files, and one model that understands images, e.g. llava.

  6. Optional:

    • Check the config.yaml file and change accordingly to your needs.
    • Place your user_image.png and/or bot_image.png inside the chat_icons folder and remove the old ones.
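As a rough illustration of what the database initialization in step 4 does, the sketch below creates a SQLite database with a table for chat messages. The file name, table name, and columns here are assumptions for illustration only; the real schema lives in database_operations.py.

```python
import sqlite3

def init_db(db_path: str = "chat_sessions.db") -> None:
    """Create the chat-messages table if it does not exist yet.
    Hypothetical schema; the actual one is defined in database_operations.py."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS messages (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                sender TEXT NOT NULL,
                content TEXT NOT NULL,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
            """
        )
        conn.commit()
```

Running the script once before `streamlit run app.py` ensures the app finds its tables on first launch.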

Overview

Local Multimodal AI Chat is a multimodal chat application that integrates various AI models to manage audio, images, and PDFs seamlessly within a single interface. This application is ideal for those passionate about AI and software development, offering a comprehensive solution that employs Whisper AI for audio processing, LLaVA for image management, and Chroma DB for handling PDFs.

The application has been enhanced with the Ollama server and the OpenAI API, boosting its functionality and performance. You can find a detailed tutorial on the development of this repository on my YouTube channel. While significant advancements have been made, the project is still open to further development and refinement.

I welcome contributions of all forms. Whether you’re introducing new features, optimizing the code, or correcting bugs, your participation is valued. This project thrives on community collaboration and aims to serve as a robust resource for those interested in the practical application of multimodal AI technologies.

Features

Changelog

16.09.2024:

<details> <summary>Click to see more!</summary>

24.08.2024:

17.02.2024:

10.02.2024:

09.02.2024:

16.01.2024:

12.01.2024:

</details>

Possible Improvements

Contact Information

If you're interested in working with me, feel free to contact me via email. Before contacting me about errors you're encountering, make sure to check the GitHub issues first: https://github.com/Leon-Sander/Local-Multimodal-AI-Chat/issues?q=