Awesome
SCODE
SCODE : an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation.
Reference
https://academic.oup.com/bioinformatics/article/33/15/2314/3100331
Requirements
SCODE is written with R, and use MASS library to calculate pseudo inverse matrix.
Download
git clone https://github.com/hmatsu1226/SCODE
cd SCODE
Or download from "Download ZIP" button and unzip it.
Running SCODE
Optimize linear ODE and infer regulatory network from time course data.
Usage
Rscript SCODE.R <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I>
- Input_file1 : G x C matrix of expression data
- Input_file2 : Time point data (e.g. pseudo-time data)
- Output_dir : Result files are outputted in this directory
- G : The number of transcription factors
- D : The number of z
- C : The number of cells
- I : The number of iterations of optimization
Example of running SCODE
Rscript SCODE.R data/exp_train.txt data/time_train.txt out 100 4 356 100
Format of Input_file1
The Input_file1 is the G x C matrix of expression data (separated with 'TAB'). Each row corresponds to each gene, and each column corresponds to each cell.
Example of Input_file1
1.24 1.21 1.28 ...
0.0 0.19 0.0 ...
.
.
.
Format of Input_file2
The Input_file2 contains the time point data (pseudo-time) of each cell.
- Col1 : Information of a cell (e.g. index of a cell, experimental time point)
- Col2 : Time parameter (e.g. pseudo-time) (normalized from 0.0 to 1.0)
Example of Input_file2
0 0.065
0 0.037
0 0.007
.
.
.
72 0.873
72 0.964
Output files of SCODE
SCODE outputs some files as below, and the files are named to correspond with the names of the variables in the paper.
A.txt
G x G matrix, which corresponds to infered regulatory network. Aij represents the regulatory relationship from TF j to TF i.
B.txt
D x D diagonal matrix, which corresponds to the optimized parameters of ODE of z.
W.txt
G x D matrix, which corresponds to W of linear regression.
RSS.txt
The residual sum of squares of linear regression.
Running SCODE several times
We recommend runnning SCODE several times and averaging the result (A) to obtain reliable relationships.
ruby run_R.rb <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I> <R>
- R : The number of traials
- Output_dir : Result files of each trial is outputted in the directory
The averaged A (meanA.txt) is outputted in the Output_dir.
<br> <br>SCODE implemented by Julia
Requirements
SCODE.jl is written with Julia(Version 0.5.0), and use DataFrames package. The runtimes of SCODE.jl is smaller than that of SCODE.R
Running SCODE.jl
julia SCODE.jl <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I>
Running SCODE.jl several times
ruby run_julia.rb <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I> <R>
<br>
<br>
# Downstream analysis
Calculation of RSS (RSS.R)
To choose appropriate size of z, we recommend to calculate RSS of independent test data.
Usage
Rscript RSS.R <Input_file1> <Input_file2> <Input_dir> <Output_file> <G> <D> <C>
- Input_file1 : G x C matrix of expression data
- Input_file2 : Time point data (e.g. pseudo-time data)
- Input_dir : The directory that W.txt and B.txt are saved (Output_dir of SCODE)
- Output_file : RSS for this data
- G : The number of transcription factors
- D : The number of z
- C : The number of cells
Example of running RSS.R
Rscript RSS.R data/exp_test.txt data/time_test.txt out out/RSS_test.txt 100 4 100
Reconstruction of expression dynamics (Reconstruct_dynamics.R)
Calculate the dynamics from optimized linear ODE.
Usage
Rscript Reconstruct_dynamics.R <Input_file1> <Input_file2> <Output_file> <G>
- Input_file1 : Initial value of x
- Input_file2 : A.txt
- Output_file : (G+1) x 101 matrix of reconstructed expression data
- G : The number of transcription factors
Example of running Reconstruct_dynamics.R
Rscript Reconstruct_dynamics.R data/init.txt out/A.txt out/dynamics.txt 100
Format of Input_file1
The Input_file1 is the initial values of x (separated with 'TAB'). Each row corresponds to each gene.
- Col1 : Index of a gene
- Col2 : Initial value
Example of Input_file1
0 1.253
1 1.266
2 1.548
.
.
.
Format of Output_file
The Output_file is the (G+1) x 101 matrix of reconstructed expression dynamics (separated with 'TAB'). The first column corresponds to time parameter (from 0.0 to 1.0 with 0.01 interval). Each row corresponds to each gene, and each column corresponds to each time point.
Example of Output_file
0 0.01 0.02 ...
1.253 1.241 1.233 ...
1.266 1.053 0.937 ...
.
.
Dataset
We validated SCODE with three time couse scRNA-Seq data. We extracted top 100 variable TFs.
data
scRNA-Seq data derived from PrE cells differentiated from mES cells (in preparation).
data2
scRNA-Seq data obtained to examine direct reprogramming from MEF cells to myocytes. Treutlein, Barbara, et al. "Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq." Nature (2016).
data3
scRNA-Seq data derived from DE cells differentiated from hES cells. Chu, Li-Fang, et al. "Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm." Genome biology 17.1 (2016): 173.
Rererence TF-TF network
Reference TF-TF networks are extracted from http://www.regulatorynetworks.org . The first column corresponds to target TF. The second column corresponds to regulator TF.
License
Copyright (c) 2016 Hirotaka Matsumoto Released under the MIT license