Clairvoyante-pt

PyTorch version of Clairvoyante.

The main file is clairvoyante/clairvoyante_v3_pytorch.py, which contains the code for the PyTorch model. It has the same APIs as the TensorFlow Clairvoyante model at https://github.com/aquaskyline/Clairvoyante/blob/rbDev/clairvoyante/clairvoyante_v3.py.

The code builds Clairvoyante with 3 convolutional layers, 2 hidden fully connected layers, and 4 output layers. It specifies the parameters for these layers and initialises the network's weights using He initialization.
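In PyTorch, He initialization is provided by nn.init.kaiming_normal_. A minimal sketch of applying it to a convolutional and a fully connected layer (the layer shapes below are illustrative, not the actual Clairvoyante dimensions):

```python
import torch.nn as nn

# Illustrative layers; the real Clairvoyante shapes differ.
conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3)
fc = nn.Linear(in_features=32, out_features=48)

# He (Kaiming) initialization for the weights of ReLU layers,
# with biases zeroed out.
for layer in (conv, fc):
    nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
    nn.init.zeros_(layer.bias)
```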

PyTorch uses the NCHW format for tensor dimensions, so all input tensors must be permuted before they can be used by the code.
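For example, a tensor stored in NHWC order (batch, height, width, channel) can be rearranged into NCHW with permute; the concrete shape below is illustrative only:

```python
import torch

# An NHWC tensor: (batch, height, width, channel). The sizes are examples.
nhwc = torch.randn(2, 33, 4, 7)

# PyTorch conv layers expect NCHW, so move the channel axis to position 1.
# .contiguous() materialises the new memory layout.
nchw = nhwc.permute(0, 3, 1, 2).contiguous()
print(nchw.shape)  # torch.Size([2, 7, 33, 4])
```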

Dependencies

Install the following on top of the dependencies and folders listed in https://github.com/aquaskyline/Clairvoyante:

pip install torch torchvision

How to use the module

Initialise the model in the run function of train.py and callVar.py using

{module name}.Net()

Add

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if torch.cuda.device_count() > 0:
    m.to(device)

to the run function in train.py and callVar.py, after initialising the model, to use one or more GPUs.
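Put together, the device setup can be sketched as below. The Net class here is a toy stand-in for clairvoyante_v3_pytorch.Net, and wrapping the model in torch.nn.DataParallel is one common way to use several GPUs at once (an assumption about how the parallelism is implemented, not taken from the source):

```python
import torch
import torch.nn as nn

# Toy stand-in for clairvoyante_v3_pytorch.Net.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

m = Net()

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if torch.cuda.device_count() > 1:
    # Replicate the model across all visible GPUs for data parallelism.
    m = nn.DataParallel(m)
m = m.to(device)

out = m(torch.randn(3, 4).to(device))
print(out.shape)  # torch.Size([3, 2])
```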

GPU

Use the CUDA_VISIBLE_DEVICES environment variable to specify which GPUs to use. This can be done with the command export CUDA_VISIBLE_DEVICES="$i", where $i is an integer starting from 0 that identifies the GPU to be used. The code supports GPU parallelism. If no GPUs are specified, the CPU is used instead.
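For example, to expose two GPUs to the program for a parallel run (the device IDs here are examples; adjust them to your machine):

```shell
# Make only GPUs 0 and 1 visible to PyTorch; inside the program they
# appear in that order as cuda:0 and cuda:1.
export CUDA_VISIBLE_DEVICES="0,1"
```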

Folder Structure and Program Descriptions

clairvoyante/: Contains the PyTorch model.
    clairvoyante_v3_pytorch.py: PyTorch model of Clairvoyante.
    clairvoyante_v3_pytorch_test.py: Unit test cases to test the PyTorch model's loss function.
correctVCFs/: Contains the VCFs produced by TF Clairvoyante and the training and testing data sets.
    basic_luo_chr21.vcf: VCF produced by CallVar using the model produced by demoRun.py.
    correct_21.vcf: chr21.vcf from the testingData folder.
    luo_bam_21.vcf: VCF produced by CallVarBam using fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e-3.epoch500.
    luo_tensor_can_21.vcf: VCF produced by CallVar using fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e3.epoch500.
    ngmlr1_chr19.vcf: VCF produced by CallVarBam using fullv3-ont-ngmlr-hg001-hg19.
evalResults/: Each folder contains the results of a different vcfeval run. The results are in summary/summary.txt in each folder.
    TrainBamCPU_chr21/: Comparison between VCFs made by train.py and CallVarBam.py and correct_21.vcf. (Used in presentation)
    basicLuo_correct/: Comparison between VCFs made by train.py and correct_21.vcf. (Used in presentation)
    correct_bam/: Comparison between VCFs made by fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e3.epoch500 using CallVarBam.py and correct_21.vcf.
    luo_correct/: Comparison between VCFs made by fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e3.epoch500 using CallVar.py and correct_21.vcf.
    ngmlr1_chr19/: Comparison between VCFs made by fullv3-ont-ngmlr-hg001-hg19 using CallVarBam.py and /nas7/yswong/base/hg19_chr19.vcf.gz. (Used in presentation)
    trainAll2_chr19/: Second comparison between VCFs produced by CallVarBam using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX 980. (Used in presentation)
    trainAll3_chr19/: Comparison between VCFs produced by CallVarBam.py using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX Titan and GTX 1080 Ti with a training batch size of 5000.
    trainAll4_chr19/: Comparison between VCFs produced by CallVarBam.py using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX Titan and GTX 1080 Ti with a training batch size of 10000.
    trainAll_correct/: Comparison between VCFs produced by CallVarBam.py using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX 980.
pytorchModels/: Each folder is a training experiment. Each folder contains the output of the training run, and some also contain the model parameters stored in a txt file. All models use /nas7/yswong/trainingData/tensor_all.bin for training.
    trainAll/: Model produced by training using the GTX 980.
    trainAll2/: Model produced after training a second time using the GTX 980.
    trainAll3_5000PGPU/: Model produced after training using the GTX 1080 Ti and GTX Titan with a training batch size of 5000.
    trainAll4_10000PGPU/: Model produced after training using the GTX 1080 Ti and GTX Titan with a training batch size of 10000.
    trainAll5_1080Ti/: Output produced after training using the GTX 1080 Ti.
    trainAll6_Titan/: Output produced after training using the GTX Titan.
    trainAll7_2_1080_Ti/: Output produced after training using two GTX 1080 Ti cards.