SCOUP

SCOUP: a probabilistic model based on the Ornstein-Uhlenbeck process for analyzing single-cell expression data during differentiation.

Reference

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1109-3

Requirements

The following two libraries are necessary for pseudo-time estimation based on the shortest path in the PCA space. **This pseudo-time is used only for initializing SCOUP; pseudo-time estimates from other methods, or experimental time, can be substituted for initialization.**

How to build

git clone https://github.com/hmatsu1226/SCOUP
cd SCOUP
make

Alternatively, download the repository with the "Download ZIP" button, unzip it, and run make in the extracted directory.

Running SP

Estimate pseudo-time based on the shortest path in the PCA space.

Usage
./sp <Input_file1> <Input_file2> <Output_file1> <Output_file2> <G> <C> <D>
Format of Input_file1

The Input_file1 is the G x C matrix of expression data (tab-separated). Each row corresponds to a gene, and each column corresponds to a cell.

Example of Input_file1
0.33	-4.95	-1.37	-4.07	...
5.01	4.45	3.82	3.02	...
.
.
.
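
The layout described above can be produced from an in-memory matrix with a few lines of Python. This is only a sketch: the function name `write_expression_matrix` and the list-of-lists input are illustrative assumptions, not part of SCOUP.

```python
def write_expression_matrix(path, matrix):
    """Write a G x C matrix (G gene rows, C cell columns) as tab-separated text.

    Hypothetical helper, not shipped with SCOUP.
    """
    n_cells = len(matrix[0])
    if any(len(row) != n_cells for row in matrix):
        raise ValueError("every gene row must have the same number of cells")
    with open(path, "w") as handle:
        for row in matrix:
            # One gene per line, one tab-separated value per cell.
            handle.write("\t".join(str(float(value)) for value in row) + "\n")
    # Return (G, C), the two counts that are passed on the command line.
    return len(matrix), n_cells
```

The returned pair can be reused as the `<G>` and `<C>` arguments so that the counts always match the file.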
Format of Input_file2

The Input_file2 contains the mean and variance of the initial normal distribution. Judging from the example below, each line holds an index followed by the mean and the variance.

Example of Input_file2
0	0.0	1.7
1	1.0	2.3
2	-2.0	5.9
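
One possible way to prepare such a file is sketched below. It assumes that each line describes one gene (index, mean, variance) and that the initial distribution is estimated from cells known to be at the starting state; both points are assumptions, and `write_initial_distribution` and `initial_cells` are hypothetical names.

```python
import statistics

def write_initial_distribution(path, matrix, initial_cells):
    """Write "index<TAB>mean<TAB>variance" lines, one per gene.

    matrix is G x C (rows are genes); initial_cells lists the column
    indices of cells assumed to be at the starting state (at least two).
    """
    with open(path, "w") as handle:
        for gene_index, row in enumerate(matrix):
            values = [row[c] for c in initial_cells]
            mean = statistics.mean(values)
            # Sample variance; whether SCOUP expects this estimator is an assumption.
            variance = statistics.variance(values)
            handle.write(f"{gene_index}\t{mean}\t{variance}\n")
```

If the mean and variance are already known from another source, they can be written in the same three-column layout directly.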
Format of Output_file1

The Output_file1 contains the pseudo-time estimates.

Example of Output_file1
0	0.826988
1	0.102140
2	0.758120
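
As an illustration of how this file might be consumed, the sketch below parses the two-column lines shown in the example and orders the cells by pseudo-time. The function names are illustrative and not part of SCOUP.

```python
def parse_pseudotime(lines):
    """Map cell index -> pseudo-time from "index<TAB>time" lines."""
    times = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        times[int(fields[0])] = float(fields[1])
    return times

def order_cells(times):
    """Return cell indices sorted from earliest to latest pseudo-time."""
    return sorted(times, key=times.get)
```

For example, `order_cells(parse_pseudotime(open("out/time_sp.txt")))` would list the cells from the earliest to the latest estimate, assuming the output was written to that path.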
Format of Output_file2

The Output_file2 contains the PCA coordinates.

This file contains (C+1) lines; the last line corresponds to the root cell, which is defined by the mean of the initial distribution.

Example of Output_file2
0	3.04	0.42	
1	-21.21	-1.52	
2	5.76	0.48
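
Because the root cell is stored as the last line, a reader of this file usually needs to separate it from the real cells. A minimal sketch, assuming each line is an index followed by the coordinates as in the example (`split_root` is a hypothetical helper):

```python
def split_root(lines):
    """Split PCA coordinate lines into (cells, root).

    cells maps cell index -> coordinate tuple; root is the coordinate
    tuple of the final line, which the text above describes as the root cell.
    """
    rows = []
    for line in lines:
        fields = line.split()  # split() also drops trailing tabs
        if fields:
            rows.append((int(fields[0]), tuple(float(v) for v in fields[1:])))
    if len(rows) < 2:
        raise ValueError("expected at least one cell line plus the root line")
    *cells, root = rows
    return dict(cells), root[1]
```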

Running SCOUP

Estimate the parameters of the mixture Ornstein-Uhlenbeck process.

Usage
./scoup <Options> <Input_file1> <Input_file2> <Input_file3> <Output_file1> <Output_file2> <Output_file3> <G> <C>
Options
Example of running SCOUP
./scoup -k 2 data/data.txt data/init.txt out/time_sp.txt out/gpara.txt out/cpara.txt out/ll.txt 500 100
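
When SCOUP is driven from a script, the argument order in the usage line is easy to get wrong. The sketch below only assembles the argument list in that order; `build_scoup_command` and its parameter names are hypothetical, and the `-k` value is passed through exactly as in the example above.

```python
import subprocess

def build_scoup_command(k, data, init, time, gpara, cpara, ll,
                        n_genes, n_cells, binary="./scoup"):
    """Return the argument list for scoup in the order of the usage line."""
    return [
        binary, "-k", str(k),
        data, init, time,      # Input_file1, Input_file2, Input_file3
        gpara, cpara, ll,      # Output_file1, Output_file2, Output_file3
        str(n_genes), str(n_cells),
    ]

def run_scoup(*args, **kwargs):
    """Run scoup and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_scoup_command(*args, **kwargs), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with unusual file names.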
Format of Input_file1

This is the expression data matrix and is the same file as the Input_file1 of SP.

Format of Input_file2

This is the initial distribution and is the same file as the Input_file2 of SP.

Format of Input_file3

This is the pseudo-time for initialization and is the same as the Output_file1 of SP.

Format of Output_file1

The Output_file1 contains the optimized parameters related to genes and lineages.

Example of Output_file1
 	 	0.509804 	0.490196
0.501610	2.528400	-6.338714 	-2.273163
0.309094	13.046904	3.545862 	0.337260
0.223226	4.212808	-4.443503 	9.629989
2.707472	14.221109	3.959898 	-2.353994
4.361342	34.646044	1.392565 	0.789397
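
The example above can be read positionally: a first line holding one value per lineage, then one line per gene with two leading values followed by one value per lineage. The sketch below splits the file along those lines without interpreting what each column means, since the column meanings are not stated here; the two-column split is inferred from the example and is an assumption.

```python
def parse_gene_parameters(lines):
    """Return (lineage_values, genes) from the gene/lineage parameter file.

    lineage_values: the K numbers on the first line.
    genes: list of (gene_level_values, per_lineage_values) per gene row,
           assuming two gene-level columns followed by K lineage columns.
    """
    rows = [line.split() for line in lines if line.strip()]
    lineage_values = [float(v) for v in rows[0]]
    k = len(lineage_values)
    genes = []
    for fields in rows[1:]:
        values = [float(v) for v in fields]
        if len(values) != 2 + k:
            raise ValueError(f"expected {2 + k} columns, found {len(values)}")
        genes.append((values[:2], values[2:]))
    return lineage_values, genes
```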
Format of Output_file2

The Output_file2 contains the optimized parameters related to cells.

Example of Output_file2
0.941979	0.990196	0.009804	
2.000000	0.990196	0.009804	
2.000000	0.990196	0.009804	
1.102146	0.990196	0.009804	
0.839387	0.990196	0.009804
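
In the example above, the columns after the first one sum to 1 on every line, which suggests one weight per lineage. Under that assumption (it is not stated in this document), the sketch below reports, for each cell, the first-column value and the index of the largest trailing column. `most_likely_lineage` is a hypothetical name.

```python
def most_likely_lineage(lines):
    """Return a list of (first_column_value, best_lineage_index) per cell.

    Assumes every column after the first is a per-lineage weight.
    """
    result = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        first = float(fields[0])
        weights = [float(v) for v in fields[1:]]
        # Index of the largest weight; ties resolve to the lowest index.
        best = max(range(len(weights)), key=weights.__getitem__)
        result.append((first, best))
    return result
```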
Format of Output_file3

The Output_file3 contains the log-likelihood.

Example of Output_file3

Resuming SCOUP from previously estimated parameters

Re-estimate the parameters starting from the output of an earlier SCOUP run instead of from the initial pseudo-time.

Usage
./scoup_resume <Options> <Input_file1> <Input_file2> <Input_file3> <Input_file4> <Output_file1> <Output_file2> <Output_file3> <G> <C>
Options

These are the same as the options of "scoup".

Example of running scoup_resume
./scoup_resume -k 2 -e 0.0001 data/data.txt data/init.txt out/gpara.txt out/cpara.txt out/gpara_2.txt out/cpara_2.txt out/ll_2.txt 500 100
Format of Input_file1

This is the same as the Input_file1 of "scoup".

Format of Input_file2

This is the same as the Input_file2 of "scoup".

Format of Input_file3

This file contains the parameters related to genes and lineages and is the same as the Output_file1 of "scoup".

Format of Input_file4

This file contains the parameters related to cells and is the same as the Output_file2 of "scoup".

Format of Output_file1, 2, 3

These files have the same format as the output files of "scoup".

Running Correlation analysis

Calculate the correlation between genes after standardization.

Usage
./cor <Options> <Input_file1> <Input_file2> <Input_file3> <Input_file4> <Output_file1> <Output_file2> <G> <C>
Options
Example of running Correlation analysis
./cor data/data.txt data/init.txt out/gpara.txt out/cpara.txt out/nexp.txt out/cor.txt 500 100
Format of Output_file1

The Output_file1 contains the standardized expression data.

Format of Output_file2

The Output_file2 contains the correlations between genes, computed from the standardized expression data.
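
To make the idea of "correlation after standardization" concrete, here is a generic sketch that z-scores each gene across cells and then computes Pearson correlations between genes. It is not the procedure used by `cor`, which takes the estimated SCOUP parameters as input; the plain z-score below is a stand-in chosen for illustration, and both function names are hypothetical.

```python
import math

def standardize(row):
    """Z-score one gene across cells (population standard deviation)."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
    if sd == 0:
        raise ValueError("cannot standardize a constant row")
    return [(v - mean) / sd for v in row]

def correlation_matrix(matrix):
    """Return the G x G Pearson correlation matrix of a G x C matrix."""
    z = [standardize(row) for row in matrix]
    n_cells = len(matrix[0])
    # For z-scored rows, the Pearson correlation is the mean of the products.
    return [[sum(a * b for a, b in zip(zi, zj)) / n_cells for zj in z]
            for zi in z]
```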

License

Copyright (c) 2015 Hirotaka Matsumoto. Released under the MIT license.