Awesome
GFlowNets for Biological Sequence Design
This repo contains code for the paper Biological Sequence Design with GFlowNets.
The code has been extracted from an internal repository of the Mila Molecule Discovery project with some changes, so the hyperparameters might vary. Original commits are lost here, but the credit goes to @MJ10 and @bengioe. There are some stability issues in training for the GFP task with this repo.
Setup
The code has been tested with Python 3.7 with CUDA 10.2 and CUDNN 8.0.
- Install design-bench from our fork
MJ10/design-bench
. This fork only changes some dependencies and resolves some minor changes to make it compatible with our code. To install clone the repo and runpip install -e .
in the directory where the repo is cloned. - Instal the clamp-common-eval library from MJ10/clamp-gen-data. This library handles the loading of the AMP data as well as oracles. To install clone the repo and run
pip install -r requirements.txt && pip install -e .
in the directory where the repo is cloned. - Run
pip install -e requirements.txt
in this directory to install the remaining packages.
Running the code
run_amp.py
, run_gfp.py
, and run_tfbind.py
are the entry points for the experiments.
Example:
python run_tfbind.py --gen_do_explicit_Z 1 --acq_fn ucb --gen_num_iterations 2500 --gen_reward_exp 8 --gen_data_sample_per_step 8 --proxy_num_iterations 10000 --gen_Z_learning_rate 1e-1 --gen_learning_rate 1e-3
Please reach out to Moksh Jain, mokshjn00@gmail.com for any issues, comments, questions or suggestions.