DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications
Methodology of DeepMod
DeepMod is a computational tool that takes long-read signals as input and outputs a modification summary for each genomic position in a reference genome, together with a modification prediction for each base in a long read. The modification prediction model in DeepMod is a trained bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) units. An LSTM RNN is a class of artificial neural network for modeling sequential behavior, in which the LSTM units mitigate the vanishing gradient problem. To detect DNA modifications, the normalized signals of events in a long read are rescaled to lie between -5 and 5, and the signal mean, standard deviation, and number of signals, together with base information (collectively called the 7-feature description), are obtained for each event as input to an LSTM unit with 100 hidden nodes. The RNN in DeepMod has 3 hidden layers. The predicted modification summary for each position is generated in BED format, reporting how many reads cover the genomic position, how many mapped bases in long reads were predicted to be modified, and the percentage of covered bases predicted to be modified. Modification prediction by DeepMod is strand-sensitive and single-nucleotide based.
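The per-event features and the per-position summary described above can be illustrated with a minimal sketch. This is not DeepMod's actual code: the function names are hypothetical, the 7 features are assumed to be mean, standard deviation, and signal count plus a 4-value one-hot encoding of the base, the rescaling step is approximated by simple clipping to [-5, 5], and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

BASES = "ACGT"   # assumed order of the one-hot base encoding
LIMIT = 5.0      # signals are kept within [-5, 5], as stated in the text


def event_features(base, signals):
    """Return an assumed 7-value description of one event.

    Layout (an assumption): [mean, std, count, A, C, G, T].
    Clipping stands in for the rescaling step, which may differ in DeepMod.
    """
    clipped = [max(-LIMIT, min(LIMIT, s)) for s in signals]
    one_hot = [1.0 if base == b else 0.0 for b in BASES]
    return [mean(clipped), pstdev(clipped), float(len(clipped))] + one_hot


def summarize_position(calls):
    """Summarize one genomic position on one strand.

    `calls` holds one boolean per covering read (True = predicted modified).
    Returns (coverage, modified count, modified percentage).
    """
    coverage = len(calls)
    modified = sum(calls)
    percentage = round(100 * modified / coverage) if coverage else 0
    return coverage, modified, percentage
```

For example, `summarize_position([True, False, True, True])` yields a coverage of 4, 3 modified calls, and a percentage of 75, which correspond to the three quantities reported per position in the BED output.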
Inputs of DeepMod
The input of DeepMod is Nanopore long-read data together with a reference genome.
Please note that the default model is trained on Metrichor basecalled data. While it has reasonable performance on Albacore v1 basecalled data, it should not be used on Albacore v2 data (which requires a different set of models) or on any Guppy basecalled data, due to the differences in basecalling approaches. We tested a newly trained model on move-table basecalled data with the guppy branch of this repository and observed good performance. Therefore, in 2022, we released DeepMod2, which supports two modes: a Guppy basecalling mode and a Tombo-resquiggled mode. Please note that models are flowcell specific; our models were originally trained on R9 data, so R10 flowcell users had to wait for the R10 model. (UPDATE: in October 2022 we released R10 models, so DeepMod2 is now the only open-source software that handles R10 flowcells.) If you use DeepMod in your research, please be mindful that different basecallers and flowcells can generate very different signal properties, so the correct model (rather than the default model) in DeepMod/DeepMod2 needs to be used for your specific data set.
System Requirements
Hardware requirements
DeepMod is built on a deep learning framework and needs access to the raw data of Nanopore sequencing. It therefore needs enough RAM to support the deep learning framework and enough disk space for the raw sequencing data. A GPU can substantially speed up the detection process. For optimal performance, we recommend a computer with:
- RAM: 20+ GB per thread
- GPU or CPU with 8+ cores
- HDD, or preferably SSD. The space required depends on the size of the raw data (30X E. coli data may need 10+ GB, while 30X human data may need 10+ TB)
Software requirements
The development version of DeepMod has been tested on the Linux operating system (CentOS 7.0), on both CPU and GPU machines.
Future improvement
DeepMod now supports basecalled data with either event tables or move tables. However, because many users were unsure of the difference between the main branch and the guppy branch, in 2022 we created a separate DeepMod2 repository solely for move tables generated by the Guppy basecaller; therefore, please do NOT use DeepMod for Guppy-basecalled data! DeepMod does not support multi-fast5 files. As a workaround, one can use the API at https://github.com/nanoporetech/ont_fast5_api to convert multi-fast5 files to single fast5 files, and then re-basecall them to obtain the event information that DeepMod needs as input. We are working on improving DeepMod to support multi-fast5 directly.
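The multi-fast5 conversion step might be scripted as in the sketch below, which wraps the `multi_to_single_fast5` command installed with ont_fast5_api. The helper names and directory paths are hypothetical, and the flags shown should be checked against the installed ont_fast5_api version before use. The sketch covers only the conversion; re-basecalling to obtain event tables is a separate step.

```python
import shlex
import subprocess


def multi_to_single_command(input_dir, output_dir, threads=1, recursive=True):
    """Build the multi_to_single_fast5 command line (ont_fast5_api CLI)."""
    cmd = [
        "multi_to_single_fast5",
        "--input_path", str(input_dir),   # directory holding multi-fast5 files
        "--save_path", str(output_dir),   # directory for single fast5 output
        "--threads", str(threads),
    ]
    if recursive:
        cmd.append("--recursive")         # also search subdirectories
    return cmd


def convert(input_dir, output_dir, threads=1, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    cmd = multi_to_single_command(input_dir, output_dir, threads)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)   # raises if the conversion fails
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; replace with your own directories.
    convert("multi_fast5/", "single_fast5/", threads=4, dry_run=True)
```

Running with `dry_run=True` only prints the command, which makes it easy to check the paths before starting a long conversion.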
Installation
Please refer to Installation for how to install DeepMod.
Usage
Please refer to Usage for how to use DeepMod.
Examples and Reproducibility of our analysis.
Please refer to Examples and Reproducibility for examples of how to run DeepMod.
Revision History
For release history, please visit here. For details, please go here.
Contact
If you have any questions, issues, or bug reports, please post them on GitHub, where they will also be helpful to other users.
Reference
Please cite the publication below if you use our tool:
Q. Liu, L. Fang, G. Yu, D. Wang, C. Xiao, K. Wang. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat. Commun. 10 (2019). Online at https://www.nature.com/articles/s41467-019-10168-2.