# ViP - Vision transformer with differential Privacy: A PyTorch Implementation
This is a PyTorch implementation of the paper:
[ViP: A Differentially Private Foundation Model for Computer Vision](https://arxiv.org/abs/2306.08842)
Yaodong Yu (UC Berkeley, Meta AI), Maziar Sanjabi (Meta AI), Yi Ma (UC Berkeley), Kamalika Chaudhuri (Meta AI), and Chuan Guo (Meta AI).
## Setup
- This repo is a modification of the MAE repo. Installation and preparation follow that repo.
- For differentially private training, we utilize the opacus and functorch libraries (a minimal Opacus usage sketch is shown after this list).
- To ensure our model is compatible with DP training, we use `timm==0.6.12`.
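Since DP training in this repo builds on Opacus, the snippet below is a minimal sketch of how an Opacus-wrapped DP-SGD training step typically looks. The model, data, and hyperparameter values are placeholders rather than the settings used in the paper; please refer to the training scripts in this repo for the actual setup.

```python
# Minimal sketch (not the repo's actual training code): wrapping a model,
# optimizer, and data loader with Opacus so each update follows DP-SGD.
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(768, 10)                       # stand-in for the ViT/MAE model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 768), torch.randint(0, 10, (64,))),
    batch_size=16,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,   # scale of Gaussian noise added to clipped gradients
    max_grad_norm=1.0,      # per-sample gradient clipping threshold
)

for x, y in data_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()        # Opacus clips per-sample gradients and adds noise here
```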
## How to pre-train differentially private transformers (ViP) with self-supervised learning?
The figure below presents an overview of the training pipeline for our proposed recipe for DP foundation vision models (ViP):
<p align="center"> <img src="figs/ViP-training.png" width="800"> </p>
<p align="center">In Step 1, we first pre-train a MAE model on synthetic images with standard optimizers (e.g., SGD, AdamW). We denote this model by (Syn)-ViP. In Step 2, we use the MAE model pre-trained on synthetic images as initialization, and then apply differentially private optimizers (e.g., DP-SGD, DP-AdamW) to train a ViP model that satisfies (ϵ, δ)-DP.</p>
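For reference, the standard definition of (ϵ, δ)-DP (not specific to this repo): a randomized mechanism $M$ satisfies (ϵ, δ)-DP if, for all pairs of datasets $D$ and $D'$ differing in a single example and all measurable output sets $S$,

$$\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S] + \delta .$$

Smaller values of ϵ and δ correspond to a stronger privacy guarantee.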
### Differentially Private Pre-training ViP
The differentially private (DP) pre-training instructions are in [PRETRAIN.md](PRETRAIN.md).
### DP Pre-trained ViP checkpoints
The following table provides the pre-trained checkpoints used in the paper:
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom"></th>
<th valign="bottom">ViP-Syn-Base (Encoder & Decoder)</th>
<th valign="bottom">ViP-Base (ViT Encoder)</th>
<!-- TABLE BODY -->
<tr><td align="left">pre-trained checkpoint</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/vip/ckpt-vip-syn-base.pth">download link</a></td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/vip/ckpt-vip-laion233m-base.pth">download link</a></td>
</tr>
</tbody></table>

- To load the ViP-Syn-Base checkpoint (MAE encoder & decoder), please refer to the `main_pretrain_vip.py` script.
- To load the ViP-Base checkpoint (MAE ViT encoder), please refer to the `main_linprobe.py` script. A rough loading sketch is shown after this list.
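As a rough illustration (not a substitute for the scripts above), loading the ViP-Base encoder checkpoint into a `timm` ViT-Base might look like the following. The `'model'` key and the exact `timm` model name are assumptions on our part; see `main_linprobe.py` for the authoritative loading logic.

```python
# Hedged sketch: load a downloaded ViP-Base (ViT encoder) checkpoint into a timm ViT-Base.
# The checkpoint layout ('model' key, matching parameter names) is an assumption;
# refer to main_linprobe.py for the exact loading code used in this repo.
import torch
import timm

ckpt_path = "ckpt-vip-laion233m-base.pth"          # downloaded from the table above
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)   # some checkpoints nest weights under 'model'

model = timm.create_model("vit_base_patch16_224", num_classes=0)   # encoder only, no head
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```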
## Evaluations of DP Pre-trained ViP using Linear Probing (LP) and Fine-tuning (FT)
For instructions on linear probing and fine-tuning, please refer to [EVAL_LP_FT.md](EVAL_LP_FT.md). A simplified linear-probing sketch is shown below.
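To make the distinction concrete: linear probing freezes the pre-trained encoder and trains only a linear classifier on its features, while fine-tuning updates all weights. The sketch below illustrates a single linear-probing step with a `timm` ViT encoder and placeholder data; it is not the evaluation code in EVAL_LP_FT.md.

```python
# Illustrative linear probe: freeze the encoder, train only a linear head.
# Placeholder data and hyperparameters; see EVAL_LP_FT.md for the actual protocol.
import torch
import timm

encoder = timm.create_model("vit_base_patch16_224", num_classes=0)  # feature extractor
for p in encoder.parameters():
    p.requires_grad = False                   # linear probing: encoder stays frozen
encoder.eval()

head = torch.nn.Linear(encoder.num_features, 1000)    # linear classifier on top
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

images = torch.randn(8, 3, 224, 224)          # placeholder batch
labels = torch.randint(0, 1000, (8,))

with torch.no_grad():
    feats = encoder(images)                   # pooled ViT features, shape [8, 768]
logits = head(feats)
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()                               # gradients flow only into the head
optimizer.step()
```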
## Reference
For technical details and full experimental results, please see the paper. If you find our work helpful, please consider citing it:
@Article{ViP2023,
  author  = {Yaodong Yu and Maziar Sanjabi and Yi Ma and Kamalika Chaudhuri and Chuan Guo},
  journal = {arXiv preprint arXiv:2306.08842},
  title   = {ViP: A Differentially Private Foundation Model for Computer Vision},
  year    = {2023},
}
## Code Acknowledgements
The majority of ViP-MAE is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Opacus is licensed under Apache 2.0, while Shaders21k is also licensed under CC-BY-NC. Note that due to the non-commercial nature of the CC-BY-NC license, this code is not ready for production use.