FoldMark: Protecting Protein Generative Models with Watermarking

In this GitHub repository, we apply FoldMark to FrameFlow as an example.

Installation

# Conda environment with dependencies.
conda env create -f foldmark.yml

# Activate environment
conda activate fm

# torch-scatter must be installed manually from the wheel index matching torch 2.0.0 + CUDA 11.7.
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

# Install local package.
# Current directory should be FoldMark/
pip install -e .
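
Because torch-scatter wheels are built per torch and CUDA version, a quick import check after installation can catch a mismatch early. The following is a minimal sketch, not part of the repository: the function name `check_foldmark_env` is hypothetical, and it assumes the `fm` environment is active so that `python` resolves to its interpreter.

```shell
# Hypothetical helper (not part of FoldMark): verify that torch and
# torch-scatter import together in the active environment.
check_foldmark_env() {
    # Prints the torch version and the CUDA version torch was built against.
    # The wheel index in the pip command above targets torch 2.0.0 with CUDA 11.7.
    python -c 'import torch, torch_scatter; print("torch", torch.__version__, "cuda", torch.version.cuda)'
}
```

Run `check_foldmark_env` after `conda activate fm`. An import error at this point usually means the torch-scatter wheel does not match the installed torch build.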

Wandb

Our training relies on logging with Wandb. Create a Wandb account, log in, and authorize Wandb here.
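
Authorization is typically done from the command line with `wandb login`, which asks for the API key of your Wandb account. The helper below is a minimal sketch under the assumption that the `wandb` CLI is installed by `foldmark.yml`; the function name `authorize_wandb` is hypothetical.

```shell
# Hypothetical helper (not part of FoldMark): authorize Wandb on this machine.
authorize_wandb() {
    # Assumption: the wandb CLI ships with the fm conda environment.
    if ! command -v wandb >/dev/null 2>&1; then
        echo "wandb CLI not found; run 'conda activate fm' first" >&2
        return 1
    fi
    # Interactive step: prompts for the API key tied to your account.
    wandb login
}
```

Call `authorize_wandb` once per machine before starting a training run.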

Data

Download the preprocessed SCOPe dataset (~280MB), hosted on Dropbox: link.

Other datasets can in principle be preprocessed for training with the data/process_pdb_files.py script. However, we do not currently support them.

# Expand tar file.
tar -xvzf preprocessed_scope.tar.gz
rm preprocessed_scope.tar.gz

Your directory should now look like this:

├── analysis
├── build
├── configs
├── data
├── experiments
├── media
├── models
├── openfold
├── preprocessed
└── weights
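
To confirm the layout before training, the directory names listed above can be checked with a short shell function. This is a sketch rather than a script shipped with the repository: `check_foldmark_layout` is a hypothetical name, and it only tests that each top-level directory exists, not what it contains.

```shell
# Hypothetical helper (not part of FoldMark): report which of the expected
# top-level directories are missing under the given root (default: current dir).
check_foldmark_layout() {
    root="${1:-.}"
    missing=0
    for d in analysis build configs data experiments media models openfold preprocessed weights; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $d"
            missing=1
        fi
    done
    # Exit status 0 means every listed directory was found.
    return "$missing"
}
```

Run `check_foldmark_layout` from inside FoldMark/. A line such as `missing: preprocessed` suggests the dataset archive was not extracted in the repository root.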

Pretrain

python -W ignore experiments/pretrain.py

Finetune

python -W ignore experiments/finetune.py
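
The two training commands can also be chained so that a failure in the first stops the second. The sketch below assumes finetuning is meant to run after pretraining has finished; the README lists the stages in that order but does not state the dependency. `run_foldmark_stages` is a hypothetical name, and the function must be called from FoldMark/ so the relative script paths resolve.

```shell
# Hypothetical helper (not part of FoldMark): run pretraining, then finetuning.
run_foldmark_stages() {
    for stage in pretrain finetune; do
        echo "Running experiments/${stage}.py"
        # Stop at the first failing stage so finetuning never starts
        # after a failed pretraining run.
        python -W ignore "experiments/${stage}.py" || return 1
    done
}
```

Invoke it as `run_foldmark_stages` with the `fm` environment active.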

Acknowledgements

We thank the authors of the open-source code from WaDiff, AquaLoRA, and OpenFold.

Reference

@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}