Splotch

Splotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) [1] data.

Features

- Supports complex hierarchical experimental designs and model-based analysis of replicates
- Full Bayesian inference with Hamiltonian Monte Carlo (HMC) using the adaptive HMC sampler (NUTS) as implemented in NumPyro [2]
  - CPU, GPU, and TPU support
- Analysis of expression differences between anatomical regions and conditions using posterior samples
- Anatomical annotation regions (AARs) are modeled using a hierarchical linear model
- Zero-inflated Poisson or Poisson likelihood
- Gaussian process prior for the spatial random effect

The Splotch code in this repository supports single-, two-, and three-level experimental designs.

Installation

PyPI

$ pip install splotch-st

GitHub

$ pip install git+https://git@github.com/tare/Splotch.git

CUDA

To install JAX with NVIDIA GPU support, please see the JAX installation instructions.

Usage

The main steps of Splotch analysis are the following:

1. Preparation of count files
2. Annotation of ST spots
3. Preparation of metadata table
4. Splotch analysis

Preparation of count files

The count files have the following tab-separated values (TSV) file format:

	32.06_2.04	31.16_2.04	14.07_2.1	…	28.16_33.01
A130010J15Rik	0	0	0	…	0
A230046K03Rik	0	0	0	…	0
A230050P20Rik	0	0	0	…	0
A2m	0	1	0	…	0
⋮	⋮	⋮	⋮	⋱	⋮
Zzz3	0	1	0	…	0

The rows and columns have gene identifiers and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively.
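
As a quick sanity check on this layout, the following minimal sketch parses a count file using only the Python standard library. The helper name `read_count_table` is hypothetical and not part of the Splotch API, and the inline sample is a three-spot excerpt used purely for illustration.

```python
import csv
import io


def read_count_table(handle):
    """Parse a count TSV into (spot coordinates, {gene: counts}).

    Hypothetical helper for illustration; not part of the Splotch API.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # The first header cell is empty; the rest are "X_Y" spot coordinates.
    spots = [tuple(float(v) for v in cell.split("_")) for cell in header[1:]]
    # Each remaining row is a gene identifier followed by one count per spot.
    counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return spots, counts


# Three-spot excerpt in the layout shown above (illustration only).
sample = (
    "\t32.06_2.04\t31.16_2.04\t14.07_2.1\n"
    "A2m\t0\t1\t0\n"
    "Zzz3\t0\t1\t0\n"
)
spots, counts = read_count_table(io.StringIO(sample))
print(spots[0])       # (32.06, 2.04)
print(counts["A2m"])  # [0, 1, 0]
```

For a real file, the same function can be given an open file handle, e.g. `open(path, newline="")`.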

Annotation of ST spots

To get the most out of Splotch's statistical model, one has to annotate the ST spots based on their tissue context. These annotations allow the model to share information across tissue sections, resulting in more robust conclusions.

To make the annotation step slightly less tedious, we have implemented a lightweight JavaScript tool called Span.

The annotation files have the following TSV file format:

	32.06_2.04	31.16_2.04	14.07_2.1	…	28.16_33.01
Vent_Med_White	0	0	0	…	0
Vent_Horn	1	1	0	…	0
Vent_Lat_White	0	0	0	…	0
Med_Grey	0	0	0	…	0
Dors_Horn	0	0	0	…	0
Dors_Edge	0	0	0	…	1
Med_Lat_White	0	0	0	…	0
Vent_Edge	0	0	1	…	0
Dors_Med_White	0	0	0	…	0
Cent_Can	0	0	0	…	0
Lat_Edge	0	0	0	…	0

The rows and columns correspond to the user-defined anatomical annotation regions (AARs) and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively. For instance, the spot 32.06_2.04 has the Vent_Horn annotation (i.e., it is located in the ventral horn). The annotation category of each ST spot is one-hot encoded, and we do not currently support more than one annotation category per ST spot.

ST spots without annotation categories are discarded in the analysis. This behaviour can be useful when you want to discard some ST spots from the analysis based on the tissue histology.
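
The one-hot constraint and the discarding rule can be illustrated with a short sketch. The function name `assign_annotations` is hypothetical and not part of the Splotch API; the code only mirrors the behaviour described above, and the fourth spot in the sample (10.00_5.00) is made up to show an unannotated spot.

```python
import csv
import io


def assign_annotations(handle):
    """Map each spot to its single AAR and list spots with no annotation.

    Hypothetical helper for illustration; not part of the Splotch API.
    """
    reader = csv.reader(handle, delimiter="\t")
    spots = next(reader)[1:]
    labels = {spot: [] for spot in spots}
    for row in reader:
        if not row:
            continue
        aar, flags = row[0], row[1:]
        for spot, flag in zip(spots, flags):
            if int(flag) == 1:
                labels[spot].append(aar)
    # More than one annotation category per spot is not supported.
    multiple = [spot for spot, aars in labels.items() if len(aars) > 1]
    if multiple:
        raise ValueError(f"spots with multiple annotations: {multiple}")
    annotated = {spot: aars[0] for spot, aars in labels.items() if aars}
    # Spots whose column is all zeros would be dropped from the analysis.
    discarded = [spot for spot, aars in labels.items() if not aars]
    return annotated, discarded


# Small sample; the last spot is invented and has no annotation.
sample = (
    "\t32.06_2.04\t31.16_2.04\t14.07_2.1\t10.00_5.00\n"
    "Vent_Horn\t1\t1\t0\t0\n"
    "Vent_Edge\t0\t0\t1\t0\n"
)
annotated, discarded = assign_annotations(io.StringIO(sample))
print(annotated["32.06_2.04"])  # Vent_Horn
print(discarded)                # ['10.00_5.00']
```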

Preparation of metadata table

The metadata table contains information about the samples (i.e. count files). Additionally, the metadata table is used for matching count and annotation files.

The metadata table has the following TSV file format:

name	level_1	level_2	level_3	count_file	annotation_file	image_file
L7CN36_C1	G93A p120	F	1394	count_tables/L7CN36_C1_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/L7CN36_C1.tsv	images/L7CN36_C1_HE.jpg
L7CN36_C2	G93A p120	F	1394	count_tables/L7CN36_C2_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/L7CN36_C2.tsv	images/L7CN36_C2_HE.jpg
L7CN30_C1	WT p120	M	2967	count_tables/L7CN30_C1_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/L7CN30_C1.tsv	images/L7CN30_C1_HE.jpg
L7CN30_C2	WT p120	M	2967	count_tables/L7CN30_C2_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/L7CN30_C2.tsv	images/L7CN30_C2_HE.jpg
L7CN69_D1	WT p120	M	1310	count_tables/L7CN69_D1_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/L7CN69_D1.tsv	images/L7CN69_D1_HE.jpg
L7CN69_D2	WT p120	M	1310	count_tables/L7CN69_D2_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/L7CN69_D2.tsv	images/L7CN69_D2_HE.jpg
CN96_E1	WT p120	F	1040	count_tables/CN96_E1_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/CN96_E1.tsv	images/CN96_E1_HE.jpg
CN96_E2	WT p120	F	1040	count_tables/CN96_E2_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/CN96_E2.tsv	images/CN96_E2_HE.jpg
CN93_E1	G93A p120	M	975	count_tables/CN93_E1_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/CN93_E1.tsv	images/CN93_E1_HE.jpg
CN93_E2	G93A p120	M	975	count_tables/CN93_E2_stdata_aligned_counts_IDs.txt.unified.tsv	annotations/CN93_E2.tsv	images/CN93_E2_HE.jpg

Each sample (i.e. slide) has its own row in the metadata table. The columns level_1, level_2, and level_3 define how the samples are analyzed using the linear hierarchical AAR model. The columns level_1, count_file, and annotation_file are mandatory. The column level_2 is mandatory when using the two-level model. Similarly, the columns level_2 and level_3 are mandatory when using the three-level model. At the moment we only support categorical variables.
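
The column requirements above can be checked before running an analysis. The following is a minimal sketch using only the standard library; the helper `missing_columns` is hypothetical and not part of the Splotch API, and the file paths in the sample are shortened placeholders.

```python
import csv
import io

# Columns that are mandatory for every model, as described above.
BASE_COLUMNS = ["level_1", "count_file", "annotation_file"]


def missing_columns(handle, n_levels):
    """Return the mandatory metadata columns absent from the table header.

    Hypothetical helper for illustration; not part of the Splotch API.
    """
    if n_levels not in (1, 2, 3):
        raise ValueError("n_levels must be 1, 2, or 3")
    # The two-level model adds level_2; the three-level model adds level_3 too.
    required = BASE_COLUMNS + [f"level_{i}" for i in range(2, n_levels + 1)]
    header = next(csv.reader(handle, delimiter="\t"))
    return [column for column in required if column not in header]


# Sample table with placeholder paths and no level_3 column.
sample = (
    "name\tlevel_1\tlevel_2\tcount_file\tannotation_file\n"
    "L7CN36_C1\tG93A p120\tF\tcount_tables/c1.tsv\tannotations/c1.tsv\n"
)
print(missing_columns(io.StringIO(sample), n_levels=2))  # []
print(missing_columns(io.StringIO(sample), n_levels=3))  # ['level_3']
```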

If a given slide contains tissue sections from multiple biological conditions in terms of the explanatory variables, then it is recommended to split the tissue sections into multiple count files so that the design matrix can be defined accordingly.

The user can include additional columns at their own discretion. For instance, we will use the column image_file in the tutorials.

Example data

In the tutorials directory, we have two example ST data sets.

Splotch analysis

Please see the ALS and Olfactory Bulb tutorials.

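In the simplest setting, the following lines are enough to run Splotch on a single gene:

from jax import random

# import path assumed here; see the tutorials for the exact module layout
from splotch import get_input_data, run_nuts

# read input data
splotch_input_data = get_input_data("metadata.tsv")

# run Splotch on the Gfap gene
# split the PRNG key so that the parent key is never reused
key = random.PRNGKey(0)
key, key_ = random.split(key)
splotch_result_nuts = run_nuts(key_, ["Gfap"], splotch_input_data)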

References

[1] Ståhl, Patrik L., et al. "Visualization and analysis of gene expression in tissue sections by spatial transcriptomics." Science 353.6294 (2016): 78-82.

[2] Phan, Du, et al. "Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro." arXiv preprint 1912.11554 (2019).

[3] Maniatis, Silas, et al. "Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis." Science 364.6435 (2019): 89-93.