MLWatcher

MLWatcher is a Python agent that records a wide variety of time-series metrics from your running ML classification algorithm.
It enables you to monitor in real time:

The statistics derived from the data are:

Some additional data are derived from the predict_proba_matrix and monitored as continuous values:

MLWatcher's minimal input is the predict_proba_matrix of your algorithm (for each row in the batch of data, the probability of each class). label_matrix and input_matrix are optional. For binary classification (only 2 classes), a threshold value can be provided to monitor the label-related and prediction-related metrics.
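To make the role of the threshold concrete, here is a minimal sketch of how a binary predict_proba_matrix can be turned into predicted labels. The helper name `binary_predictions`, the choice of column 1 as the positive class, and the `>=` comparison are assumptions for illustration; how MLWatcher applies the threshold internally is not described here.

```python
def binary_predictions(predict_proba_matrix, threshold=0.5):
    """Return 1 where the positive-class probability reaches the threshold, else 0."""
    # Assumption: column 1 holds the probability of the positive class.
    return [1 if row[1] >= threshold else 0 for row in predict_proba_matrix]


# One row per sample in the batch, one column per class (2 classes here).
predict_proba_matrix = [
    [0.90, 0.10],
    [0.35, 0.65],
    [0.55, 0.45],
]

default_preds = binary_predictions(predict_proba_matrix)        # threshold 0.5
lenient_preds = binary_predictions(predict_proba_matrix, 0.40)  # lower threshold
```

With the lower threshold, the third sample switches to the positive class, which is why prediction-related metrics depend on the threshold you provide.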

MLWatcher use cases

Monitoring your Machine Learning metrics can be used to achieve multiple goals:

Example of concept drift for MNIST dataset where the input pixels values get suddenly inverted. An anomaly in the distribution of the features is raised:

Example of how the model predictions metrics change when a new set of input data comes into production:

Example of putting into production a weakly trained model (trained with a highly unbalanced training set) and how this affects the stability of the predictions distribution for production:

Example of monitoring the accuracy metric for multiple concurrent algorithms:

The size of each buffer of data is also monitored, so it is important to correlate the computed metrics with the sample size (i.e. the sample size is not always statistically significant).
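As a rough illustration of why the sample size matters, the sketch below estimates the 95% margin of error of an accuracy value using the normal approximation to the binomial distribution. The function `accuracy_margin` is a hypothetical helper, not part of MLWatcher.

```python
import math


def accuracy_margin(accuracy, sample_size, z=1.96):
    """Approximate margin of error of an accuracy measured on sample_size points.

    Uses the normal approximation; z=1.96 corresponds to a 95% interval.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / sample_size)


# Same observed accuracy, very different uncertainty.
small_buffer = accuracy_margin(0.90, 25)
large_buffer = accuracy_margin(0.90, 500)
```

An accuracy of 0.90 measured on 25 points is uncertain by roughly 0.12, while the same value on 500 points is uncertain by roughly 0.03, so a drop seen on a small buffer may not be meaningful.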

Getting Started

0- Install the libs in requirements.txt.

python -m pip install -r /path/to/requirements.txt

1- Add the MLWatcher folder to the same folder as your algorithm script.

2- Customize the technical parameters in the file conf.py (rotating log specs, filenames, token if applicable, etc).

3- Import the MLWatcher agent in your script:

from MLWatcher.agent import MonitoringAgent

4- Instantiate a MonitoringAgent object, and run the server side of the agent:

agent = MonitoringAgent(frequency=5, max_buffer_size=500, n_classes=10, agent_id='1', server_IP='127.0.0.1', server_port=8000)

frequency : (int) Time in minutes to collect data. Frequency of monitoring
max_buffer_size : (int) Upper limit of number of inputs in buffer. Sampling of incoming data is done if limit is reached
n_classes : (int) Number of classes for classification. Must be equal to the number of columns of your predict_proba matrix
agent_id : (string) ID. Used in case of multiple agent monitors (default '1')
server_IP : (string) IP of the server ('127.0.0.1' if local server)
server_port : (int) Port of the server (default 8000)

For a LOCAL server: the local server listens on the port defined above, on the localhost interface (127.0.0.1).

agent.run_local_server()

For a REMOTE server: the hosted server listens on a defined port, either on the localhost interface (--listen localhost) or on all interfaces (--listen all). Recommended:

python /path/to/server.py --listen all --port 8000 --n_sockets 5

See --help for server.py options.

5- Monitor the running ML process for each batch of data

agent.collect_data(
    predict_proba_matrix = <your pred_proba matrix>,  # mandatory
    input_matrix = <your feature matrix>,  # optional
    label_matrix = <your label matrix>  # optional
)

6- If TOKEN is set to None in conf.py, you can analyze the data stored locally in the PROD folder with the provided Jupyter notebook (ANALYTICS folder).

7- For advanced analytics of the metrics and anomaly detection in your data, the agent output is compatible with the Anodot REST API when a valid TOKEN is used.

You can use the Anodot API script as follows:

python anodot_api.py --input <path/to/PROD/XXX_MLmetrics.json> --token <TOKEN>

Prerequisites

The agent is written entirely in Python 3.X. It was tested with Python >= 3.5.

The input formats for the agent collector are:
predictions (mandatory): predict_proba_matrix of shape (batch_size x n_classes)
labels (optional): label_matrix, a binary matrix of shape (batch_size x n_classes) or an int matrix of shape (batch_size x 1)
features (optional): input_matrix of shape (batch_size x n_features)
n_classes must be >= 2
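The shape requirements above can be checked before handing a batch to the agent. The following is a minimal sketch using plain nested lists; `check_batch` is a hypothetical helper written for this example and is not part of MLWatcher, which may perform its own validation.

```python
def check_batch(predict_proba_matrix, n_classes, label_matrix=None, input_matrix=None):
    """Return a list of shape problems found in one batch (empty if none)."""
    problems = []
    if n_classes < 2:
        problems.append("n_classes must be >= 2")

    batch_size = len(predict_proba_matrix)
    if any(len(row) != n_classes for row in predict_proba_matrix):
        problems.append("predict_proba_matrix must have n_classes columns")

    if label_matrix is not None:
        if len(label_matrix) != batch_size:
            problems.append("label_matrix must have batch_size rows")
        # Either one int label per row, or one binary column per class.
        elif any(len(row) not in (1, n_classes) for row in label_matrix):
            problems.append("label_matrix rows must have 1 or n_classes columns")

    if input_matrix is not None:
        if len(input_matrix) != batch_size:
            problems.append("input_matrix must have batch_size rows")
        # Every row must have the same number of features.
        elif len({len(row) for row in input_matrix}) > 1:
            problems.append("input_matrix rows must all have n_features columns")

    return problems


# A valid batch of 2 samples, 3 classes, int labels and 4 features.
valid = check_batch(
    predict_proba_matrix=[[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    n_classes=3,
    label_matrix=[[1], [0]],
    input_matrix=[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]],
)

# A batch whose probability matrix has the wrong number of columns.
invalid = check_batch(predict_proba_matrix=[[0.4, 0.6]], n_classes=3)
```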

Installing

See the Getting Started section. You can also look at and run the MNIST example provided in the EXAMPLE folder (requires TensorFlow).

Deployment and technical features

The agent structure is as follows:

If a problem occurs, the agent skips the data and logs the event accordingly. The agent is a lightweight collector that stores up to max_buffer_size datapoints every period. Above this limit, sampling is done using a 'Reservoir sampling' algorithm so that the sampled data remains statistically representative.
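For readers unfamiliar with reservoir sampling, the sketch below shows the classic variant (often called Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. The `Reservoir` class is written for this example and is an assumption about the general technique; the agent's actual implementation may differ.

```python
import random


class Reservoir:
    """Keep a uniform random sample of at most max_size items from a stream."""

    def __init__(self, max_size, seed=None):
        self.max_size = max_size
        self.items = []
        self.seen = 0  # total number of items offered so far
        self._rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.max_size:
            # Buffer not full yet: keep everything.
            self.items.append(item)
            return
        # Draw a slot in [0, seen). The new item is kept with probability
        # max_size / seen, replacing a uniformly chosen stored item.
        slot = self._rng.randrange(self.seen)
        if slot < self.max_size:
            self.items[slot] = item


# Offer 100 datapoints to a buffer limited to 5.
reservoir = Reservoir(max_size=5, seed=0)
for datapoint in range(100):
    reservoir.add(datapoint)
```

After the loop, the buffer still holds exactly 5 datapoints, and each of the 100 offered datapoints had the same chance of being among them.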

To tackle bottleneck issues, you can adjust the number of threads that the server runs in parallel to match the volume of batches you want to monitor concurrently. You can also adjust the max_buffer_size and frequency parameters according to your data volume. For Anodot usage, the Anodot API defines a limit of 2000 metric-datapoints per second. Please make sure that your volume stays below this limit, otherwise some monitored data will be lost (no storage case). Before going to production, a test phase for integrating the agent and server with your production algorithm is highly recommended.

Contributing

This agent was developed by Anodot to help the data science community monitor in real time the performance, the anomalies and the lifecycle of running ML algorithms.
Please also refer to Google's paper 'The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction' for a global view of good practices in production ML algorithm design and monitoring.

Versioning

v1.0

Authors

License

MIT License

Copyright (c) 2019 Anodot

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Acknowledgments

Anodot Team
Glenda