Home

Awesome

mtad-gat

Implementation of MTAD-GAT: Multivariate Time-series Anomaly Detection via Graph Attention Network

https://arxiv.org/pdf/2009.02040.pdf

Notes:

Future experiments:

Training steps:

  1. Download ServerMachineDataset from https://github.com/NetManAIOps/OmniAnomaly Put ServerMachineDataset here at the root of the

  2. Prepare tfrecords train and test data

python prepare_data.py --files_path=ServerMachineDataset/train --tfrecords_file=gs://anomaly_detection/mtad_tf/data/train/{}.tfrecords python prepare_data.py --files_path=ServerMachineDatasettest --label_path=ServerMachineDataset/test_label --tfrecords_file=gs://anomaly_detection/mtad_tf/data/test/{}.tfrecords

  1. Decide the mode of the model. Train model for each machine dataset, 28 of them. Farecasting seems 30k steps works. RECONSTRUCTING and BOTH - it took 300k. Models sometimes deviate to NaN during backpropagation, so I just restart it.

See losses in samples folder. Use provuded jupyternotebook, but adjust field positions, paths.

for c in {1..100}; do echo $c; time python training.py --action=TRAIN --train_file=gs://anomaly_detection/mtad_gat/data/train/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_gat/output/machine-1-1-fore --run_mode=FORECASTING --num_train_steps=30000; done

for c in {1..100}; do echo $c; time python training.py --action=TRAIN --train_file=gs://anomaly_detection/mtad_gat/data/train/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_gat/output/machine-1-1-reco --run_mode=RECONSTRACTING --num_train_steps=300000; done

for c in {1..100}; do echo $c; time python training.py --action=TRAIN --train_file=gs://anomaly_detection/mtad_gat/data/train/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_gat/output/machine-1-1-both --run_mode=BOTH --num_train_steps=300000; done

  1. Produce scores file - inference_score.csv. Use this file to calculate calculate threshold using POT

See graphs in samples folder

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_gat/output/machine-1-1-fore --prediction_task=inference_score --run_mode=FORECASTING

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_gat/output/machine-1-1-fore --prediction_task=inference_score --run_mode=RECONSTRACTING

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_gat/output/machine-1-1-fore --prediction_task=inference_score --run_mode=BOTH

  1. Use my another repository EVT_POT to calculate threshold. inference_score.csv is an input for that

  2. Adjust and use jupyter notebook to grath inference_score.csv, initial threshold, POT and Best threshold. Adjust those values in the notebook

Anomaly Evaluation

I do not use Estimator EVAL to do this since I use an adjustment procedure. I use this same simple omnianomaly approach. If we predicted anomaly and it is within some anomaly segment, whole segment becomes correctly predicted. I do not use a prediction latency (delta) value as in some other papers. Also, I think practically, if we flagged anomaly earlier compare to actual anomaly label, it could be considered hit maybe using some delta as well. I did not do this way here. Please see samples folder for evaluation metrics for machine-1-1-{fore|reco|both}.results in samples folder

You must provide your own different threshold values

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_tf/output/machine-1-1-fore --threshold=1.5547085 --prediction_task=EVALUATE --run_mode=FORECASTING

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_tf/output/machine-1-1-reco --threshold=1.5547085 --prediction_task=EVALUATE --run_mode=RECONSTRACTING

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_tf/output/machine-1-1-both --threshold=1.5547085 --prediction_task=EVALUATE --run_mode=BOTH

Anomaly Prediction

This command creates Anomaly.csv in the local folder. You must provide your threshold values

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_tf/output/machine-1-1-fore --threshold=1.5547085 --run_mode=FORECASTING

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_tf/output/machine-1-1-fore --threshold=1.5547085 --run_mode=RECONSTRACTING

python training.py --action=PREDICT --test_file=gs://anomaly_detection/mtad_gat/data/test/machine-1-1.tfrecords --output_dir=gs://anomaly_detection/mtad_tf/output/machine-1-1-fore --threshold=1.5547085 --run_mode=BOTH