Pythia
Structure-based self-supervised learning enables ultrafast prediction of stability changes upon mutations
Prerequisites
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -r requirements.txt
Usage
To use Pythia, run it from the command line with the following options:
Basic Usage
cd pythia
python masked_ddg_scan.py
By default, this processes the files in ../s669_AF_PDBs/ using cuda:0 (GPU 0) if available.
Command Line Options
- --input_dir: Specifies the directory containing the PDB files. Default is ../s669_AF_PDBs/. Example:
python masked_ddg_scan.py --input_dir "/path/to/directory/"
- --pdb_filename: To process a single PDB file instead of a directory, specify its path with this option. Example:
python masked_ddg_scan.py --pdb_filename "/path/to/file.pdb"
- --check_plddt: Filters PDB files by their pLDDT value. Files with a pLDDT below the cutoff set by --plddt_cutoff are skipped. Example:
python masked_ddg_scan.py --check_plddt
- --plddt_cutoff: Specifies the pLDDT cutoff used when --check_plddt is set. Default is 95. Example:
python masked_ddg_scan.py --check_plddt --plddt_cutoff 90
- --n_jobs: Sets the number of parallel jobs to run. Default is 2. Example:
python masked_ddg_scan.py --n_jobs 4
- --device: Specifies the device used for computation. Default is cuda:0 (GPU 0). To use the CPU or another GPU, set it here. Valid values include cuda:0, cuda:1, ... for GPUs, or cpu for the CPU. Example:
python masked_ddg_scan.py --device cpu
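The README does not state how --check_plddt aggregates pLDDT over a structure. As a hedged sketch for previewing which files a given cutoff might keep, the code below assumes AlphaFold-style PDB files, which store per-residue pLDDT in the B-factor column, and averages that column over CA atoms. The helpers mean_plddt and preview_plddt_filter are hypothetical and not part of Pythia, and the mean-over-CA rule is an assumption that may differ from the script's own check.

```python
from pathlib import Path


def mean_plddt(pdb_path):
    """Mean B-factor over CA atoms (assumed to hold per-residue pLDDT, as in AlphaFold output)."""
    values = []
    with open(pdb_path) as handle:
        for line in handle:
            # PDB fixed columns (1-based): atom name is 13-16, B-factor is 61-66.
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                values.append(float(line[60:66]))
    if not values:
        raise ValueError(f"no CA atoms found in {pdb_path}")
    return sum(values) / len(values)


def preview_plddt_filter(input_dir, cutoff=95.0):
    """Split *.pdb files into (kept, skipped) name lists; files below the cutoff are skipped."""
    kept, skipped = [], []
    for path in sorted(Path(input_dir).glob("*.pdb")):
        (kept if mean_plddt(path) >= cutoff else skipped).append(path.name)
    return kept, skipped
```

For example, preview_plddt_filter("../s669_AF_PDBs/", cutoff=90) returns the names of files whose mean CA pLDDT is at least 90, followed by those below it.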
Examples
- Process all PDB files in /path/to/directory/ on the first GPU, filtering by pLDDT with a cutoff of 90:
python masked_ddg_scan.py --input_dir "/path/to/directory/" --check_plddt --plddt_cutoff 90 --device cuda:0
- Process the single PDB file /path/to/file.pdb on the CPU:
python masked_ddg_scan.py --pdb_filename "/path/to/file.pdb" --device cpu
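To scan several directories in one go, a small driver can assemble the documented options into a command line. This is a sketch: build_scan_command and run_scans are hypothetical helpers, the folder paths are placeholders, and running from the pythia/ directory (the workdir default) follows the Basic Usage step. Rejecting input_dir and pdb_filename together is this sketch's own choice, not documented behavior.

```python
import subprocess
import sys


def build_scan_command(input_dir=None, pdb_filename=None, check_plddt=False,
                       plddt_cutoff=None, n_jobs=None, device=None):
    """Assemble a masked_ddg_scan.py command from the options listed in this README."""
    if input_dir and pdb_filename:
        raise ValueError("pass either input_dir or pdb_filename, not both")
    # sys.executable runs the scan with the same Python interpreter as this driver.
    cmd = [sys.executable, "masked_ddg_scan.py"]
    if input_dir:
        cmd += ["--input_dir", input_dir]
    if pdb_filename:
        cmd += ["--pdb_filename", pdb_filename]
    if check_plddt:
        cmd.append("--check_plddt")
    if plddt_cutoff is not None:
        # The cutoff only takes effect together with --check_plddt.
        cmd += ["--plddt_cutoff", str(plddt_cutoff)]
    if n_jobs is not None:
        cmd += ["--n_jobs", str(n_jobs)]
    if device:
        cmd += ["--device", device]
    return cmd


def run_scans(folders, workdir="pythia", **options):
    """Run one scan per folder; check=True raises CalledProcessError on a nonzero exit."""
    for folder in folders:
        subprocess.run(build_scan_command(input_dir=folder, **options),
                       check=True, cwd=workdir)
```

For instance, run_scans(["/path/to/a/", "/path/to/b/"], check_plddt=True, plddt_cutoff=90, device="cuda:0") runs the scans one after another and stops at the first failure.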
Predictions and labels are included for the Megascale dataset, S2648, and S669.
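Once the provided predictions and labels are loaded into two parallel lists (the file layout is not described here, so loading is left out), their agreement can be checked with Pearson and Spearman correlation, which are commonly reported for stability-change prediction. The sketch below is plain Python with hypothetical helper names; it is not necessarily the evaluation used for the reported results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Averaging tied ranks keeps Spearman well defined when several labels share the same measured value.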
Train
- Download the preprocessed training files for the CATH dataset or the BioA dataset from Google Drive, then submit the training job with SLURM:
sbatch train_model.sh