License: MIT Code style: black

scGTM: Single-cell generalized trend model

scGTM (originally named scKGAM) is the abbreviation for Single-cell Gene Expression Generalized Trend Model. This is a Python package for modeling the statistical relationship between pseudotime and gene expression data. The paper is published in Bioinformatics and is also available on bioRxiv.

It is intended for bioinformaticians, applied statisticians, and students who prefer metaheuristic algorithms for solving their own bioinformatics optimization problems. scGTM provides a range of marginal gene distributions with interpretable regression functions. Check out more features!

Installation

To install the bleeding-edge version of scGTM, clone this repo:

$ git clone git@github.com:ElvisCuiHan/scGTM.git

and then run

$ cd scGTM
$ python run_scGTM.py --model.iter 100 --model.marginal 'ZIP' --model.save_dir "your/path/to/save" --data.dir "your/path/file.csv" --gene.start 3 --gene.end 4

Usage

scGTM provides a high-level implementation of several marginal distributions: Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB). It uses the particle swarm optimization algorithm from the pyswarms package to optimize the objective function. It aims to be user-friendly and customizable.

The data should be a cell-by-gene matrix whose first column contains the pseudotime:
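To make the zero-inflated marginals concrete, the following is a minimal sketch of the ZIP log-likelihood in its textbook parameterization. The function names and the `(lam, p)` parameterization are illustrative assumptions, not scGTM's internal API; in scGTM the Poisson mean would presumably vary with pseudotime rather than being a single constant.

```python
import math


def zip_log_pmf(y, lam, p):
    """Log-probability of count y under a zero-inflated Poisson (ZIP).

    lam: Poisson mean (> 0).
    p:   probability of a structural zero (0 <= p < 1).
    These names are hypothetical and chosen for illustration only.
    """
    if y == 0:
        # A zero can come from the inflation part or from the Poisson part.
        return math.log(p + (1.0 - p) * math.exp(-lam))
    # Positive counts can only come from the Poisson component.
    log_poisson = y * math.log(lam) - lam - math.lgamma(y + 1)
    return math.log(1.0 - p) + log_poisson


def zip_negative_log_likelihood(counts, lam, p):
    """Negative log-likelihood of i.i.d. counts; an optimizer would minimize this."""
    return -sum(zip_log_pmf(y, lam, p) for y in counts)


if __name__ == "__main__":
    counts = [0, 0, 3, 1, 0, 2]
    print(zip_negative_log_likelihood(counts, lam=1.5, p=0.3))
```

With `p = 0` the expression reduces to the ordinary Poisson log-likelihood, which is a convenient sanity check.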

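| Index | Pseudotime | Gene1 | Gene2 | ... |
|-------|------------|-------|-------|-----|
| 1 | t1 | y11 | y12 | ... |
| 2 | t2 | y21 | y22 | ... |
| 3 | t3 | y31 | y32 | ... |
| 4 | t4 | y41 | y42 | ... |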

A typical data structure will be of the following form:

<img src="https://github.com/ElvisCuiHan/scKGAM/blob/main/Figures/data.png" width="700" />
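As a rough illustration of the layout described above, the sketch below writes a small cell-by-gene .csv file using only the standard library. The helper name `write_scgtm_input`, the column headers, and the presence of a leading index column are assumptions for illustration; compare the result with the demo input file before relying on it.

```python
import csv


def write_scgtm_input(path, pseudotime, counts, gene_names):
    """Write a cell-by-gene CSV: index, pseudotime, then one column per gene.

    pseudotime: one value per cell.
    counts:     one row per cell, one count per gene.
    The exact header names are an assumption, not taken from scGTM.
    """
    if len(pseudotime) != len(counts):
        raise ValueError("pseudotime and counts must have one entry per cell")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Index", "Pseudotime", *gene_names])
        for i, (t, row) in enumerate(zip(pseudotime, counts), start=1):
            if len(row) != len(gene_names):
                raise ValueError(f"cell {i} has {len(row)} counts, expected {len(gene_names)}")
            writer.writerow([i, t, *row])


if __name__ == "__main__":
    write_scgtm_input(
        "toy_input.csv",
        pseudotime=[0.0, 0.5, 1.0],
        counts=[[3, 0], [5, 1], [2, 4]],
        gene_names=["Gene1", "Gene2"],
    )
```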

All-in-one function

Suppose we want to regress Gene 1 on pseudotime using scGTM. We simply run the run_scGTM.py script in a shell:

python run_scGTM.py --model.iter {# of iterations} --model.marginal 'ZIP' --model.save_dir "your/path/to/save" --data.dir "your/path/file.csv" --gene.start {START INDEX} --gene.end {END INDEX} 

and we can replace run_scGTM.py with either run_scGTM_Hill_Only.py or run_scGTM_Valley_Only.py if we are only interested in one of the two trends.

Using the data in our demo folder, the command is:

python run_scGTM_Valley_Only.py --model.iter 120 --model.marginal 'ZIP' --model.save_dir "Demo/Results/" --data.dir "Demo/simu_nb_scGTM_input.csv" --gene.start 1 --gene.end 60

In the scGTM.py file (and the other two), we can modify the plotting arguments so that the model's output figure uses user-defined colors.

plot_args={
             'color': ['red', 'tomato', 'orange', 'violet'],
             'cmap': 'Blues',
         }

To estimate many genes with different marginals, first change the data directory in the function parallel and then run the following command in a terminal:

python run_scGTM_Hill_Only.py  --gene.start {START INDEX} --gene.end {END INDEX} --model.marginal "NB" --model.save_dir "YourTargetPath" --model.iter 150

Note that the input data must be in .csv format. The main function produces a .json file and a .png figure.
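When several marginals are to be tried, the commands can be assembled programmatically. The sketch below only builds argument lists from the flags shown above; the helper name `build_scgtm_command` and the `Results/...` output paths are hypothetical, and only the marginal names that appear in this README ("NB" and "ZIP") are used.

```python
import shlex


def build_scgtm_command(script, marginal, save_dir, gene_start, gene_end,
                        iterations, data_dir=None):
    """Return the argument list for one scGTM run (nothing is executed here)."""
    cmd = [
        "python", script,
        "--model.iter", str(iterations),
        "--model.marginal", marginal,
        "--model.save_dir", save_dir,
        "--gene.start", str(gene_start),
        "--gene.end", str(gene_end),
    ]
    if data_dir is not None:
        # Optional: omit when the data directory is set inside the script.
        cmd += ["--data.dir", data_dir]
    return cmd


# One command per marginal, each writing to its own (hypothetical) directory.
commands = [
    build_scgtm_command("run_scGTM_Hill_Only.py", marginal, f"Results/{marginal}",
                        gene_start=1, gene_end=60, iterations=150)
    for marginal in ("NB", "ZIP")
]

if __name__ == "__main__":
    for cmd in commands:
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the cloned scGTM directory to launch the run.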

Example

The following figure shows a typical output of the main function in scGTM.py.

<img src="https://github.com/ElvisCuiHan/scKGAM/blob/main/Figures/100ZIP.png" width="700" />

The confidence intervals of {t0, k1, k2, mu} are saved in a .json file in the same directory.
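To inspect the saved results, one option is to load every .json file from the output directory. This is a minimal sketch: the helper name `load_json_results` is hypothetical, and because the schema of the output file is not documented here, the code makes no assumption about its keys and simply returns the parsed content.

```python
import json
from pathlib import Path


def load_json_results(result_dir):
    """Return {file name: parsed content} for every .json file in result_dir."""
    results = {}
    for path in sorted(Path(result_dir).glob("*.json")):
        with path.open() as handle:
            results[path.name] = json.load(handle)
    return results


if __name__ == "__main__":
    # "Demo/Results/" is the save directory used in the demo command.
    for name, content in load_json_results("Demo/Results/").items():
        print(name, type(content).__name__)
```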