Adversarial Frontier Stitching
This is a TensorFlow implementation of "Adversarial Frontier Stitching for Remote Neural Network Watermarking" by Erwan Le Merrer, Patrick Perez and Gilles Trédan.
What is adversarial frontier stitching?
Adversarial frontier stitching is an algorithm for injecting a watermark into a pretrained neural network. It first generates a set of data points, called the key set, which acts as the watermark. It does so by applying a transformation, using the "fast gradient sign" method, to correctly classified samples. Transformed inputs that are still classified correctly are called false adversaries; those that are now misclassified are called true adversaries. The combination of true and false adversaries forms the key. Next, the pretrained model is fine-tuned on the key until the true adversaries are classified correctly again; the model is now watermarked. If a model's accuracy on the key is above a predefined threshold, we have verified that it carries our watermark.
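The key-generation step can be pictured with the following minimal sketch, assuming a Keras classifier with softmax outputs and batched tensors x (inputs) and y (integer labels); the helper names fgsm_perturb and split_adversaries are illustrative, not this repo's API:

```python
import tensorflow as tf

# Assumes the model outputs softmax probabilities (not raw logits).
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm_perturb(model, x, y, eps):
    """Perturb x with the fast gradient sign method."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    # Step of size eps in the direction that increases the loss;
    # optionally clip the result to the valid input range afterwards.
    return x + eps * tf.sign(grad)

def split_adversaries(model, x, y, eps):
    """Return (true adversaries, false adversaries) for a batch."""
    x_adv = fgsm_perturb(model, x, y, eps)
    pred = tf.argmax(model(x_adv), axis=-1)
    still_correct = pred == tf.cast(y, pred.dtype)
    # True adversaries: now misclassified.
    # False adversaries: still classified correctly despite the perturbation.
    return (tf.boolean_mask(x_adv, ~still_correct),
            tf.boolean_mask(x_adv, still_correct))
```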
How to use
A simple example can be found in `example.ipynb` or `example.py`.
- Call `gen_adversaries(model, l, dataset, eps)` to generate your true and false adversary sets, which will act as your watermark (see the end-to-end sketch after this list), where:
  - `model` is your pretrained model.
  - `l` is the total number of generated adversaries; the true and false adversary sets will each contain `l / 2` samples.
  - `dataset` is the TensorFlow dataset used for training.
  - `eps` is the strength of the perturbation applied to training samples to generate the adversaries; it is the step size of the "fast gradient sign" method.
- Train your model on the concatenation of the training dataset and the true and false adversaries until the true adversaries are predicted correctly again. The model is then watermarked.
- Use `verify(model, key_set, threshold=0.05)` on a model to test whether it was watermarked by us, where:
  - `model` is the model to test.
  - `key_set` is a TensorFlow dataset containing the concatenation of the true and false adversary sets.
  - `threshold` is the p-value: a predefined hyperparameter between zero and one that roughly controls how many correct predictions on the `key_set` the model needs before we claim it carries our watermark. A lower threshold gives more certainty that the model was watermarked by us, but also makes it easier for third parties to remove the watermark. Defaults to 0.05, the value used in the paper.
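Putting the steps together, an end-to-end sketch might look like this. It assumes (not confirmed by the repo) that `gen_adversaries` returns two unbatched `tf.data.Dataset` objects of (input, label) pairs and that `model` is a compiled `tf.keras.Model`; the values for `l`, `eps`, batch size, and epochs are placeholders, so consult `example.py` for the repo's actual flow:

```python
import tensorflow as tf

# model: a compiled, pretrained tf.keras.Model
# train_ds: an unbatched tf.data.Dataset of (input, label) training pairs
true_advs, false_advs = gen_adversaries(model, l=100, dataset=train_ds, eps=0.25)
key_set = true_advs.concatenate(false_advs)

# Embed the watermark: fine-tune on the training data plus the key until
# the true adversaries are classified correctly again (epochs illustrative).
model.fit(train_ds.concatenate(key_set).batch(64), epochs=5)

# Later, check whether a (possibly re-distributed) model still answers the
# key well enough to be claimed as ours.
assert verify(model, key_set, threshold=0.05)
```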
Contribute
Show your support by giving the project a ⭐. Pull requests are always welcome.
License
This project is licensed under the MIT License - see the LICENSE.md file for details.