Home

Awesome

Benchmark networks of the AAAI 2021 paper Scalable Verification of Quantized Neural Networks

Link to the arXiv version

When using or adapting the code and benchmarks, please cite

@inproceedings{henzinger2021scalable,
  title={Scalable Verification of Quantized Neural Networks},
  author={Henzinger, Thomas A and Lechner, Mathias and {\v{Z}}ikeli{\'c}, {\DJ}or{\dj}e},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={5},
  pages={3787--3795},
  year={2021}
}

Setup

Installing boolector

git clone https://github.com/boolector/boolector
cd boolector
./contrib/setup-cadical.sh
./contrib/setup-btor2tools.sh
./configure.sh --python --py3
cd build
make

Running the Bit-vector SMT encodings on the benchmarks

# Checking MNIST test set sample 1 with epsilon=1
python3 check_mnist_robustness.py --sample_id 1 --eps 1
# Checking MNIST test set sample 115 with epsilon=2
python3 check_mnist_robustness.py --sample_id 115 --eps 2
# Checking MNIST test set sample 200 with epsilon=3
python3 check_mnist_robustness.py --sample_id 200 --eps 3 --dataset fashion-mnist

Using the benchmarks for other method

The file iterate_samples.py shows how to load and use the networks. It also iterates over the exact samples of the benchmark and output the corresponding epsilons to use.

python3 iterate_samples.py --dataset fashion-mnist

When implementing some verification method of the networks of these benchmarks, make sure they exactly match the bit-exact quantized rounding semantics in each layer. The files quantization_layers.py and quantization_utils.py should serve as a good starting point.

Benchmark description

The benchmark consists of robustness verification queries.

Benchmark 1MNIST
Network size784-64-10
Input quantization8-bit
Network quanization (weights and activations)6-bit
Samples400 (100 per epsilon value)
Epsilons (L-infinity norm)1,2,3,4
Weightsweights/mnist_mlp.h5
Benchmark 2Fashion-MNIST
Network size784-64-10
Input quantization8-bit
Network quanization (weights and activations)6-bit
Samples300 (50/100 per epsilon value)
Epsilons (L-infinity norm)1,2,3,4
Weightsweights/fashion-mnist_mlp.h5

Quantitative results from the paper

MNIST samples 0-99

TypeTest sample ID
Misclassified (1)18
Timeout (0)-
Vulnerable (0)-
Robust (99)Remaining

MNIST samples 100-199

TypeTest sample ID
Misclassified (1)149
Timeout (5)104, 119, 175, 193, 195
Vulnerable (3)115, 151, 158
Robust (91)Remaining

MNIST samples 200-299

TypeTest sample ID
Misclassified (4)217, 241, 247, 259
Timeout (25)204, 210, 211, 218, 221, 224, 227, 232, 233, 234, 235, 243, 244, 250, 251, 255, 257, 264, 266, 273, 274, 275, 289, 290, 299
Vulnerable (1)282
Robust (70)Remaining

MNIST samples 300-399

TypeTest sample ID
Misclassified (3)321, 340, 381
Timeout (43)300, 301, 303, 307, 308, 322, 324, 325, 326, 328, 329, 335, 336, 337, 339, 341, 344, 345, 349, 350, 352, 354, 357, 358, 359, 362, 366, 368, 370, 372, 373, 377, 376, 379, 383, 385, 386, 388, 389, 391, 393, 394, 397
Vulnerable (1)320
Robust (53)Remaining

Fashion-MNIST samples 0-99

TypeTest sample ID
Misclassified (13)12, 17, 23, 25, 26, 29, 40, 48, 51, 66, 68, 89, 98
Timeout (11)4, 21, 28, 42, 43, 44, 49, 57, 67, 73, 91
Vulnerable (0)-
Robust (76)Remaining

Fashion-MNIST samples 100-199

TypeTest sample ID
Misclassified (10)107, 127, 145, 147, 150, 153, 163, 183, 192, 193
Timeout (17)101, 117, 122, 129, 136, 149, 151, 157, 166, 170, 172, 181, 188, 191, 195, 197, 198
Vulnerable (1)135
Robust (72)Remaining

Fashion-MNIST samples 200-249

TypeTest sample ID
Misclassified (7)222, 226, 239, 241, 244, 247, 249
Timeout (16)203, 205, 202, 213, 217, 219, 221, 224, 228, 230, 235, 238, 243, 245, 246, 248
Vulnerable (1)227
Robust (26)Remaining

Fashion-MNIST samples 300-349

TypeTest sample ID
Misclassified (6)316, 322, 324, 325, 332, 344
Timeout (26)300, 301, 302, 304, 312, 313, 309, 308, 311, 315, 318, 319, 320, 321, 323, 326, 329, 331, 333, 334, 336, 337, 338, 341, 342, 348
Vulnerable (0)-
Robust (18)Remaining

Using other SMT solvers

Each benchmark SMT formula can be exported and then checked by a different SMT solver than boolector

For instance, CVC4 (note: this will not terminate)

python3 check_mnist_robustness.py --sample_id 115 --eps 2 --export
./cvc4-1.8-x86_64-linux-opt  --lang smt smt2/mnist_0115.smt2