Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
Implementation for our paper, submitted to NeurIPS 2021 (also check this high-level blog post).
This is a minimal working version of the code used for the paper, extracted from the internal repository of the Mila Molecule Discovery project. The original commit history was not preserved, but credit for this code goes to @bengioe, @MJ10 and @MKorablyov (see paper).
Note: for more modern implementations of GFlowNet, check out recursionpharma/gflownet, saleml/gfn, and alexhernandezgarcia/gflownet.
Grid experiments
Requirements for base experiments:
torch numpy scipy tqdm
Additional requirements for active learning experiments:
botorch gpytorch
Molecule experiments
Additional requirements:
pandas rdkit torch_geometric h5py ray
plus a few biochemistry programs (see mols/Programs/README).
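A quick way to see which of the Python packages listed above are importable in the current environment is a small check like the one below. This is a sketch, not part of the repository. The grouping and the helper name `missing_modules` are hypothetical. It assumes each package's import name matches the name listed here, which holds for all of them, including rdkit when installed via rdkit-pypi.

```python
import importlib.util

# Hypothetical grouping of the requirements listed in this README.
REQUIREMENTS = {
    "grid (base)": ["torch", "numpy", "scipy", "tqdm"],
    "grid (active learning)": ["botorch", "gpytorch"],
    "molecules": ["pandas", "rdkit", "torch_geometric", "h5py", "ray"],
}


def missing_modules(names):
    """Return the names whose top-level module cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for group, names in REQUIREMENTS.items():
        missing = missing_modules(names)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

This only checks that each module can be located. It does not import anything, so it will not catch compiled extensions (such as those used by torch_geometric) that fail to load against your PyTorch or CUDA build.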
For rdkit in particular, we found it easier to install through (mini)conda, but rdkit-pypi also works with pip in a vanilla Python virtual environment. torch_geometric has non-trivial installation instructions; follow its official installation guide for your PyTorch and CUDA versions.
If you have CUDA 10.1 configured, you can run pip install -r requirements.txt. Otherwise, edit requirements.txt to match your CUDA version by replacing cu101 with cuXXX, where XXX is your CUDA version (e.g. cu102 for CUDA 10.2).
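The cu101 to cuXXX substitution described above can be scripted. The sketch below is hypothetical (the function names `cuda_tag` and `retarget_requirements` are illustrative, not from the repository). It assumes the tag is formed by dropping the dot from the major.minor CUDA version (10.1 gives cu101, 11.3 gives cu113) and simply replaces every literal occurrence of cu101 in the file.

```python
import re
from pathlib import Path


def cuda_tag(version: str) -> str:
    """Turn a CUDA version such as '11.3' (or '11.3.1') into a wheel tag like 'cu113'."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA version: {version!r}")
    major, minor = match.groups()
    return f"cu{major}{minor}"


def retarget_requirements(text: str, version: str, old_tag: str = "cu101") -> str:
    """Replace every occurrence of old_tag with the tag for the given CUDA version."""
    return text.replace(old_tag, cuda_tag(version))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python retarget.py 11.3
    # Rewrites requirements.txt in the current directory in place.
    path = Path("requirements.txt")
    path.write_text(retarget_requirements(path.read_text(), sys.argv[1]))
```

Keep a copy of the original requirements.txt (or rely on version control) before rewriting it in place.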
The 300k-molecule dataset is stored compressed to save space. To decompress it, run cd mols/data/; gunzip docked_mols.h5.gz.
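On systems without gunzip (for example, a plain Windows shell), the same decompression can be done with Python's standard library. This is a sketch; the helper name `gunzip` is hypothetical, and the path in the usage line mirrors the command above and assumes it is run from the repository root.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path) -> Path:
    """Decompress src (e.g. docked_mols.h5.gz) next to itself and return the output path.

    Unlike the gunzip command, this keeps the original .gz file.
    """
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dst = src.with_suffix("")  # docked_mols.h5.gz -> docked_mols.h5
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Stream in chunks so the whole dataset is never held in memory.
        shutil.copyfileobj(f_in, f_out)
    return dst


if __name__ == "__main__":
    print(gunzip(Path("mols/data/docked_mols.h5.gz")))
```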