<p align="center"> <a href="https://radicalbit.ai/"> <img src="docs/static/img/radicalbit.svg" width="100"> </a> </p> <br />
Radicalbit AI Monitoring
👋 Welcome!
The Radicalbit AI Monitoring Platform provides a comprehensive solution for monitoring your Machine Learning and Large Language models in production.
🤔 Why Monitor AI Models?
While models often perform well during development and validation, their effectiveness can degrade over time in production because of factors such as data shifts or concept drift. The Radicalbit AI Monitoring Platform helps you proactively identify and address potential performance issues.
🗝️ Key Functionalities
The platform provides extensive monitoring capabilities to help keep your AI models performing well in production. It analyzes both your reference dataset (used for pre-production validation) and the current datasets, letting you keep track of:
- Data Quality
- Model Quality
- Model Drift
🏗️ Repository Structure
This repository contains all the files and projects needed to run the Radicalbit AI Monitoring Platform.
🚀 Installation using Docker Compose
This repository provides a Docker Compose file for running the platform locally with a K3s cluster. This setup allows you to deploy Spark jobs.
To start the platform, run:
docker compose up
If the UI is needed:
docker compose --profile ui up
To initialize the platform with demo models, run:
docker compose --profile ui --profile init-data up
Once all containers are up and running, open http://localhost:5173 to try out the app.
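If you would rather script the wait than refresh the browser, a small polling loop is one option. The sketch below assumes curl is installed; the helper name wait_for_url and its default retry values are hypothetical and not part of this repository.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"   # assumed default: 30 tries
  delay="${3:-2}"       # assumed default: 2 seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each try.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example:
# wait_for_url http://localhost:5173 && echo "UI is reachable"
```

The function returns a non-zero status when the URL never responds, so it can gate later steps in a script.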
Interacting with K3s cluster
The compose file includes a k9s container that can be used to monitor the K3s cluster.
docker compose up k9s -d && docker attach radicalbit-ai-monitoring-k9s-1
Other tools
To connect to the K3s cluster from your local machine (for example with Lens or kubectl), you need a modified copy of ./docker/k3s_data/kubeconfig/kubeconfig.yaml, which is generated automatically once docker compose is up and running.
Copy that file, replace https://k3s:6443 with https://127.0.0.1:6443 in the copy, and use the new file to interact with the cluster from your local machine.
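The copy-and-replace step described above can be scripted. The sketch below assumes a POSIX shell with sed available; the function name make_local_kubeconfig and the output path kubeconfig.local.yaml are hypothetical choices, not part of this repository.

```shell
# Hypothetical helper: write a copy of a kubeconfig that points at localhost.
# Usage: make_local_kubeconfig SRC DEST
make_local_kubeconfig() {
  src="$1"
  dest="$2"
  # Swap the in-network hostname for the loopback address.
  # Output goes to DEST, so the generated source file is left untouched.
  sed 's#https://k3s:6443#https://127.0.0.1:6443#g' "$src" > "$dest"
}

# Example (the destination path is an assumption):
# make_local_kubeconfig ./docker/k3s_data/kubeconfig/kubeconfig.yaml ./kubeconfig.local.yaml
# KUBECONFIG=./kubeconfig.local.yaml kubectl get nodes
```

Writing to a separate file keeps the generated kubeconfig intact for the containers that rely on the k3s hostname.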
Real AWS
To use real AWS instead of MinIO, modify the environment variables of the api container: set real values for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, and S3_BUCKET_NAME, and remove S3_ENDPOINT_URL.
Teardown
To completely clean up the environment, run:
docker compose --profile ui --profile k9s --profile init-data down -v --remove-orphans
To remove everything including container images:
docker compose --profile ui --profile k9s --profile init-data down -v --remove-orphans --rmi all
Spark tuning
We use Spark jobs to calculate metrics: if you need to tune Spark configuration in order to optimize performance for large files or accelerate computations, please refer to the corresponding section of this README file.
📖 Documentation
You can find the following documentation:
- An extensive step-by-step guide to install the development/testing version of the platform, followed by all key concepts and a hands-on guide on how to use the GUI.
- A practical guide that walks users through monitoring an AI solution on the platform.
- A detailed explanation of the three main model sections.
- An exhaustive description of all classes implemented inside the Python SDK.
- A list of all available metrics and charts.
- A page related to the architecture of the platform.
- A community support page.
🤝 Community
Please join us on our Discord server to discuss the platform, share ideas, and help shape its future! Get help from experts and fellow users.
📦 Functionalities & Roadmap
We've released the first few dashboards, covering Classification (both Binary and Multiclass) and Regression models for tabular data. Over the coming weeks, we will be adding the following functionalities to the platform:
- Batch workloads
  - Binary Classification (Tabular Data)
  - Multiclass Classification (Tabular Data)
  - Regression (Tabular Data)
  - LLMs (Data Quality)
  - LLMs (Model Quality)
  - Computer Vision (Images)
  - Clustering (Tabular Data)
- Real-Time workloads
  - Binary Classification
  - Multiclass Classification
  - Regression
  - Computer Vision
  - Clustering
We Value Your Privacy
We collect anonymous usage data to improve our software. This information helps us understand how the software is used and identify areas for improvement. No personally identifiable information is collected.
The first time you start using the platform, you will be explicitly asked whether you want to opt in to or opt out of this anonymous usage data collection.