


Code repository from the publication "Prediction of designer-recombinases for DNA editing with generative deep learning"

RecGen is a conditional variational autoencoder for generating tyrosine site-specific recombinases that are selective for a defined DNA target site. This repository contains the code used to train the RecGen models.

You can find the publication here and the recombinase sequences here


Example Data:


The application has been tested on Arch Linux (kernel 5.16.5.arch1-1) with Python 3.9.9, pytorch-gpu 1.10.1, pandas 1.4.0, and numpy 1.22.1. The models were trained on an NVIDIA GeForce RTX 3060.


I recommend installing PyTorch with conda, which should take no more than a few minutes:

conda create -n "pytorch" python=3.9
conda activate pytorch
conda install -c conda-forge pytorch-gpu
conda install -c anaconda pandas
conda install -c anaconda numpy
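
After installation, a quick check can confirm that the packages import correctly and whether PyTorch can see a GPU. The snippet below is a minimal sketch and not part of the repository; the helper name check_environment is hypothetical, and it assumes the conda package pytorch-gpu is imported under the module name torch.

```python
import importlib


def check_environment(modules=("torch", "pandas", "numpy")):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most packages expose __version__; fall back to a placeholder.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    try:
        import torch
        # True only if a CUDA-capable GPU and a matching driver are found.
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```

Run it inside the activated "pytorch" environment. If CUDA is reported as unavailable, the GPU build or the driver is the likely place to look.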

To download the repository for use:

git clone https://github.com/ltschmitt/RecGen

Usage Demo:

Leave-one-out cross-validation:

python vae_train_loocv.py -i example_input/training_data_masked.csv

Expected output in output_loocv/:

Prediction of novel recombinases:

python vae_train_save.py -i example_input/training_data_masked.csv
python vae_load_predict.py -m saved_models -t example_input/predict_ts.csv -d example_input/training_data_masked.csv 

Expected output in saved_models/:

Expected output in output_prediction/:

None of these steps is computationally demanding; each should finish within a few minutes.
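
The demo commands above can also be driven from a single Python script, which may be convenient for repeated runs. This is a minimal sketch under stated assumptions: the function names demo_commands and run_demo are hypothetical, the script is assumed to run from the repository root, and only the script names, flags, and paths shown in this README are used. The commands are listed in the order they appear here.

```python
import subprocess
import sys

DATA = "example_input/training_data_masked.csv"
TARGETS = "example_input/predict_ts.csv"


def demo_commands(python=sys.executable):
    """Build the three demo command lines in README order."""
    return [
        [python, "vae_train_loocv.py", "-i", DATA],
        [python, "vae_train_save.py", "-i", DATA],
        [python, "vae_load_predict.py", "-m", "saved_models",
         "-t", TARGETS, "-d", DATA],
    ]


def run_demo(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in demo_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence if a script exits with an error.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_demo(dry_run=True)
```

With dry_run=True the script only prints the commands; set it to False to actually train and predict.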

Further Usage:

To run the application on custom data, I recommend using the --help flag on each script to see how its parameters can be adapted to your needs.
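
To read the options of all three scripts in one pass, their help text can be collected as follows. This is a minimal sketch; the helper names help_command and collect_help are hypothetical, and it assumes the scripts are in the current directory and accept --help as stated above.

```python
import subprocess
import sys

SCRIPTS = ("vae_train_loocv.py", "vae_train_save.py", "vae_load_predict.py")


def help_command(script, python=sys.executable):
    """Build the command line that prints a script's help text."""
    return [python, script, "--help"]


def collect_help(scripts=SCRIPTS):
    """Return {script name: captured help text} for each script."""
    texts = {}
    for script in scripts:
        result = subprocess.run(help_command(script),
                                capture_output=True, text=True)
        # Keep stderr too, so a failing script shows its error message.
        texts[script] = result.stdout + result.stderr
    return texts


if __name__ == "__main__":
    for script, text in collect_help().items():
        print(f"===== {script} =====")
        print(text)
```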