Pseudocell Tracer: Inferring cellular trajectories from scRNA-seq

Pseudocell Tracer infers trajectories in pseudospace rather than in pseudotime.

Prerequisites

A Conda environment specification is provided in pseudocell_tracer.yml. Create the environment with:

conda env create -f pseudocell_tracer.yml

Usage

pseudocell_tracer.py [-h] DATA SIDE_DATA OUTPUT_DIR [--plot_style PLOT_STYLE] [--num_cells NUM_CELLS] 
                          [--num_steps NUM_STEPS] --start START [START ...] --end END [END ...] 
                          [--genes GENES [GENES ...]]
Perform Pseudocell Tracer Algorithm

positional arguments:
  DATA                        Tab delimited file representing matrix of samples by genes
  SIDE_DATA                   Tab delimited file for side information to be used
  OUTPUT_DIR                  Output directory

optional arguments:
  -h, --help                  show this help message and exit
  --plot_style PLOT_STYLE     Use UMAP or tSNE for plotting (Default: UMAP)
  --num_cells NUM_CELLS       Number of pseudocells to generate at each step (Default: 100)
  --num_steps NUM_STEPS       Number of pseudocell states (Default: 100)
  --start START [START ...]   List of starting pseudocell states
  --end END [END ...]         List of ending pseudocell states
  --genes GENES [GENES ...]   Genes to plot in pseudocell trajectory
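
The help text above has the shape of a message generated by Python's argparse module. As a reading aid, here is a minimal sketch of a parser that would accept the same command line. It is a reconstruction from the usage message, not the tool's actual source: the function name build_parser is hypothetical, and the integer types for --num_cells and --num_steps are assumptions. The defaults are the ones listed in the help text.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the documented usage (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="pseudocell_tracer.py",
        description="Perform Pseudocell Tracer Algorithm",
    )
    # Three required positional arguments, in the documented order.
    parser.add_argument("data", metavar="DATA",
                        help="Tab delimited file representing matrix of samples by genes")
    parser.add_argument("side_data", metavar="SIDE_DATA",
                        help="Tab delimited file for side information to be used")
    parser.add_argument("output_dir", metavar="OUTPUT_DIR",
                        help="Output directory")
    # Options with the defaults stated in the help text.
    parser.add_argument("--plot_style", default="UMAP",
                        help="Use UMAP or tSNE for plotting (Default: UMAP)")
    parser.add_argument("--num_cells", type=int, default=100,  # int is an assumption
                        help="Number of pseudocells to generate at each step")
    parser.add_argument("--num_steps", type=int, default=100,  # int is an assumption
                        help="Number of pseudocell states")
    # --start and --end appear without brackets in the usage line, so they are required.
    parser.add_argument("--start", nargs="+", required=True,
                        help="List of starting pseudocell states")
    parser.add_argument("--end", nargs="+", required=True,
                        help="List of ending pseudocell states")
    parser.add_argument("--genes", nargs="+",
                        help="Genes to plot in pseudocell trajectory")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "data/mnn.nocos.full.genes.tsv", "data/ighc.genes.relative.tsv", "output_dir",
    "--start", "Ighm", "--end", "Ighg1", "Ighg2b", "Ighg3",
    "--genes", "Aicda", "Bach2",
])
print(args.start, args.end, args.genes, args.num_cells, args.num_steps)
```

Note that --start and --end are listed under "optional arguments" only because argparse groups every flag-style argument there; the usage line shows that both must be supplied.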

Example for provided dataset

python pseudocell_tracer.py data/mnn.nocos.full.genes.tsv data/ighc.genes.relative.tsv output_dir --start Ighm --end Ighg1 Ighg2b Ighg3 --genes Aicda Bach2

This command runs Pseudocell Tracer on the provided scRNA-seq data (provided in ZIP format) and stores the results in the directory output_dir. The run infers three trajectories: IghM to IghG1, IghM to IghG2b, and IghM to IghG3. For each trajectory, 100 pseudocells are generated at each of 100 steps (the defaults for --num_cells and --num_steps). The expression of the genes denoting the start and end states is plotted along with the additional genes specified, Aicda and Bach2. Neural network hyperparameters can be set in config.py.
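
The example command produces one trajectory per combination of a --start value and an --end value. The sketch below enumerates those combinations, assuming every start state is paired with every end state. The example has a single start state, so this pairing rule is an inference rather than documented behaviour, and the helper name trajectory_pairs is hypothetical.

```python
from itertools import product


def trajectory_pairs(starts, ends):
    """Pair every start state with every end state (assumed behaviour)."""
    return list(product(starts, ends))


# Values taken from the example command.
pairs = trajectory_pairs(["Ighm"], ["Ighg1", "Ighg2b", "Ighg3"])
for start, end in pairs:
    print(f"{start} -> {end}")

# With the default settings, each trajectory covers
# num_cells pseudocells at each of num_steps steps.
num_cells, num_steps = 100, 100
pseudocells_per_trajectory = num_cells * num_steps
print(pseudocells_per_trajectory)
```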

The output directory will contain the following files:

File                                   Description
run_network_config.txt                 Copy of config.py used
run_parameters.txt                     Copy of command line parameters
encoder.h5                             Encoder model
decoder.h5                             Decoder model
input_scatter.png                      Visual representation of input (Observed)
latent_scatter.png                     Visual representation of latent space (Observed)
reconstructed_scatter.png              Visual representation of reconstruction (Observed)
generated_latent_scatter.png           Visual representation of latent space (Generated)
generated_latent_reconstruction.png    Visual representation of reconstruction (Generated)

In addition, a subdirectory is created for each inferred trajectory, containing the following files:

File                         Description
generated_data.tsv           Tab separated file for generated gene expression data
generated_latent_data.tsv    Tab separated file for generated latent gene expression data
generated_side_data.tsv      Tab separated file for generated side data
genes.png                    Line plot showing the trajectories of selected genes
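
The generated TSV files can be inspected with the Python standard library. The sketch below assumes each file has a header row of column names and a leading label column in every row. This README does not confirm that layout, so check a real output file before relying on it. The helper name read_tsv_matrix and the contents of the demo file are made up for illustration.

```python
import csv
import os
import tempfile


def read_tsv_matrix(path):
    """Read a tab-separated matrix, assuming a header row and a leading label column."""
    labels, values = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        columns = header[1:]  # skip the (assumed) label column
        for row in reader:
            if not row:
                continue  # ignore blank lines
            labels.append(row[0])
            values.append([float(x) for x in row[1:]])
    return labels, columns, values


# Demo on a tiny made-up file; real output comes from a trajectory subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "generated_data.tsv")
    with open(demo_path, "w", newline="") as handle:
        handle.write("\tAicda\tBach2\n0\t0.1\t0.9\n1\t0.4\t0.6\n")
    labels, columns, values = read_tsv_matrix(demo_path)

# Extract one gene's values across all rows.
aicda = [row[columns.index("Aicda")] for row in values]
print(columns, aicda)
```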

Version

1.0.0 (2020/05/06)

Publication

TBA

Contact

License

The software is provided to academic users under the MIT License.