Splotch

Splotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) [1] data.

Features

The Splotch code in this repository supports single-, two-, and three-level experimental designs.

Installation

PyPI

$ pip install splotch-st

GitHub

$ pip install git+https://git@github.com/tare/Splotch.git

CUDA

To install JAX with NVIDIA GPU support, follow the installation instructions in the JAX documentation.

Usage

The main steps of Splotch analysis are the following:

  1. Preparation of count files
  2. Annotation of ST spots
  3. Preparation of metadata table
  4. Splotch analysis

Preparation of count files

The count files have the following tab-separated values (TSV) file format:

               32.06_2.04  31.16_2.04  14.07_2.1  28.16_33.01
A130010J15Rik  0           0           0          0
A230046K03Rik  0           0           0          0
A230050P20Rik  0           0           0          0
A2m            0           1           0          0
Zzz3           0           1           0          0

Rows are indexed by gene identifier and columns by ST spot coordinates (the X and Y coordinates are separated by an underscore).
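As a minimal sketch of this layout, the following Python snippet parses a small count table using only the standard library. The helper names `read_count_table` and `spot_coordinates` and the inline sample are illustrative assumptions, not part of the Splotch API. The snippet also assumes that the header row starts with an empty cell above the gene column.

```python
import csv
import io

# Illustrative sample only; a real count file holds many more genes and spots.
SAMPLE = (
    "\t32.06_2.04\t31.16_2.04\t14.07_2.1\t28.16_33.01\n"
    "A130010J15Rik\t0\t0\t0\t0\n"
    "A2m\t0\t1\t0\t0\n"
    "Zzz3\t0\t1\t0\t0\n"
)


def read_count_table(handle):
    """Return (spots, counts), where counts maps gene -> list of integer counts."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Assumption: the header starts with an empty cell above the gene column.
    spots = header[1:] if header[0] == "" else header
    counts = {}
    for row in body:
        gene, values = row[0], [int(v) for v in row[1:]]
        if len(values) != len(spots):
            raise ValueError(f"{gene}: expected {len(spots)} counts, got {len(values)}")
        counts[gene] = values
    return spots, counts


def spot_coordinates(spot):
    """Split an 'X_Y' column name into a pair of floats."""
    x, y = spot.split("_")
    return float(x), float(y)


spots, counts = read_count_table(io.StringIO(SAMPLE))
print(spots[0], spot_coordinates(spots[0]))
print(counts["A2m"])
```

Checking the row lengths up front catches truncated or misaligned files before they reach the model.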

Annotation of ST spots

To get the most out of Splotch's statistical model, the ST spots have to be annotated based on their tissue context. These annotations allow the model to share information across tissue sections, resulting in more robust conclusions.

To make the annotation step slightly less tedious, we have implemented a lightweight JavaScript tool called Span.

The annotation files have the following TSV file format:

                32.06_2.04  31.16_2.04  14.07_2.1  28.16_33.01
Vent_Med_White  0           0           0          0
Vent_Horn       1           1           0          0
Vent_Lat_White  0           0           0          0
Med_Grey        0           0           0          0
Dors_Horn       0           0           0          0
Dors_Edge       0           0           0          1
Med_Lat_White   0           0           0          0
Vent_Edge       0           0           1          0
Dors_Med_White  0           0           0          0
Cent_Can        0           0           0          0
Lat_Edge        0           0           0          0

The rows and columns correspond to the user-defined anatomical annotation regions (AARs) and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively. For instance, the spot 32.06_2.04 has the Vent_Horn annotation (i.e. it is located in the ventral horn). The annotation category of each ST spot is one-hot encoded, and we do not currently support more than one annotation category per ST spot.

ST spots without annotation categories are discarded in the analysis. This behaviour can be useful when you want to discard some ST spots from the analysis based on the tissue histology.
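The one-hot constraint and the handling of unannotated spots can be sketched as follows. This is a hedged illustration using only the Python standard library: the function name `spot_annotations` and the sample table are assumptions made for this example, and Splotch's own loader may behave differently in its details.

```python
import csv
import io

# Illustrative sample only: the last spot carries no annotation.
SAMPLE = (
    "\t32.06_2.04\t31.16_2.04\t14.07_2.1\t28.16_33.01\n"
    "Vent_Horn\t1\t1\t0\t0\n"
    "Dors_Edge\t0\t0\t0\t0\n"
    "Vent_Edge\t0\t0\t1\t0\n"
)


def spot_annotations(handle):
    """Map each spot to its AAR, or to None when the spot is unannotated."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Assumption: the header starts with an empty cell above the AAR column.
    spots = header[1:] if header[0] == "" else header
    labels = {spot: [] for spot in spots}
    for row in body:
        aar = row[0]
        for spot, flag in zip(spots, row[1:]):
            if int(flag) == 1:
                labels[spot].append(aar)
    result = {}
    for spot, aars in labels.items():
        # One-hot encoding: at most one category per spot.
        if len(aars) > 1:
            raise ValueError(f"{spot} has more than one annotation: {aars}")
        result[spot] = aars[0] if aars else None
    return result


annotations = spot_annotations(io.StringIO(SAMPLE))
kept = {s: a for s, a in annotations.items() if a is not None}
dropped = [s for s, a in annotations.items() if a is None]
print(kept)
print(dropped)
```

Running such a check before the analysis shows which spots will be excluded and flags spots that were accidentally assigned to two regions.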

Preparation of metadata table

The metadata table contains information about the samples (i.e. count files). Additionally, the metadata table is used for matching count and annotation files.

The metadata table has the following TSV file format:

name       level_1    level_2  level_3  count_file                                                        annotation_file            image_file
L7CN36_C1  G93A p120  F        1394     count_tables/L7CN36_C1_stdata_aligned_counts_IDs.txt.unified.tsv  annotations/L7CN36_C1.tsv  images/L7CN36_C1_HE.jpg
L7CN36_C2  G93A p120  F        1394     count_tables/L7CN36_C2_stdata_aligned_counts_IDs.txt.unified.tsv  annotations/L7CN36_C2.tsv  images/L7CN36_C2_HE.jpg
L7CN30_C1  WT p120    M        2967     count_tables/L7CN30_C1_stdata_aligned_counts_IDs.txt.unified.tsv  annotations/L7CN30_C1.tsv  images/L7CN30_C1_HE.jpg
L7CN30_C2  WT p120    M        2967     count_tables/L7CN30_C2_stdata_aligned_counts_IDs.txt.unified.tsv  annotations/L7CN30_C2.tsv  images/L7CN30_C2_HE.jpg
L7CN69_D1  WT p120    M        1310     count_tables/L7CN69_D1_stdata_aligned_counts_IDs.txt.unified.tsv  annotations/L7CN69_D1.tsv  images/L7CN69_D1_HE.jpg
L7CN69_D2  WT p120    M        1310     count_tables/L7CN69_D2_stdata_aligned_counts_IDs.txt.unified.tsv  annotations/L7CN69_D2.tsv  images/L7CN69_D2_HE.jpg
CN96_E1    WT p120    F        1040     count_tables/CN96_E1_stdata_aligned_counts_IDs.txt.unified.tsv    annotations/CN96_E1.tsv    images/CN96_E1_HE.jpg
CN96_E2    WT p120    F        1040     count_tables/CN96_E2_stdata_aligned_counts_IDs.txt.unified.tsv    annotations/CN96_E2.tsv    images/CN96_E2_HE.jpg
CN93_E1    G93A p120  M        975      count_tables/CN93_E1_stdata_aligned_counts_IDs.txt.unified.tsv    annotations/CN93_E1.tsv    images/CN93_E1_HE.jpg
CN93_E2    G93A p120  M        975      count_tables/CN93_E2_stdata_aligned_counts_IDs.txt.unified.tsv    annotations/CN93_E2.tsv    images/CN93_E2_HE.jpg

Each sample (i.e. slide) has its own row in the metadata table. The columns level_1, level_2, and level_3 define how the samples are analyzed using the linear hierarchical AAR model. The columns level_1, count_file, and annotation_file are mandatory. The column level_2 is mandatory when using the two-level model. Similarly, the columns level_2 and level_3 are mandatory when using the three-level model. At the moment we only support categorical variables.
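The column requirements described above can be checked before running an analysis. The snippet below is a minimal sketch using the Python standard library. The function `check_metadata`, the `MANDATORY` lookup, and the shortened file paths are hypothetical names introduced for this example and are not part of Splotch.

```python
import csv
import io

# Mandatory columns per model, as described in the text above.
MANDATORY = {
    1: ["level_1", "count_file", "annotation_file"],
    2: ["level_1", "level_2", "count_file", "annotation_file"],
    3: ["level_1", "level_2", "level_3", "count_file", "annotation_file"],
}

# Illustrative two-level metadata table with hypothetical file paths.
SAMPLE = (
    "name\tlevel_1\tlevel_2\tcount_file\tannotation_file\n"
    "L7CN36_C1\tG93A p120\tF\tcount_tables/a.tsv\tannotations/a.tsv\n"
    "L7CN30_C1\tWT p120\tM\tcount_tables/b.tsv\tannotations/b.tsv\n"
)


def check_metadata(handle, n_levels):
    """Return the metadata rows after checking the mandatory columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    missing = [c for c in MANDATORY[n_levels] if c not in columns]
    if missing:
        raise ValueError(
            f"missing mandatory columns for the {n_levels}-level model: {missing}"
        )
    rows = list(reader)
    for row in rows:
        # Every mandatory column must also hold a value in every row.
        empty = [c for c in MANDATORY[n_levels] if not row[c]]
        if empty:
            raise ValueError(f"empty mandatory values {empty} in row {row}")
    return rows


rows = check_metadata(io.StringIO(SAMPLE), n_levels=2)
print(sorted({row["level_1"] for row in rows}))
```

Calling `check_metadata` with `n_levels=3` on the same table raises a `ValueError`, because the sample has no `level_3` column.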

If a given slide contains tissue sections from multiple biological conditions in terms of the explanatory variables, then it is recommended to split the tissue sections into multiple count files so that the design matrix can be defined accordingly.
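One way to perform such a split is sketched below with the Python standard library. The mapping `SECTION_OF_SPOT`, the section names, and the function `split_count_table` are hypothetical. In practice the assignment of spots to tissue sections would come from the histology, and the header is assumed to start with an empty cell above the gene column.

```python
import csv
import io

# Illustrative count table.
SAMPLE = (
    "\t32.06_2.04\t31.16_2.04\t14.07_2.1\t28.16_33.01\n"
    "A2m\t0\t1\t0\t0\n"
    "Zzz3\t0\t1\t0\t0\n"
)

# Hypothetical assignment of spots to tissue sections.
SECTION_OF_SPOT = {
    "32.06_2.04": "section_A",
    "31.16_2.04": "section_A",
    "14.07_2.1": "section_B",
    "28.16_33.01": "section_B",
}


def split_count_table(handle, section_of_spot):
    """Return {section: TSV text} with one count table per tissue section."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Collect the column indices that belong to each section.
    sections = {}
    for index, spot in enumerate(header[1:], start=1):
        sections.setdefault(section_of_spot[spot], []).append(index)
    tables = {}
    for section, indices in sections.items():
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow([""] + [header[i] for i in indices])
        for row in body:
            writer.writerow([row[0]] + [row[i] for i in indices])
        tables[section] = out.getvalue()
    return tables


tables = split_count_table(io.StringIO(SAMPLE), SECTION_OF_SPOT)
print(tables["section_A"])
```

Each resulting table would then be saved as its own count file and given its own row in the metadata table, with the level columns set for that tissue section.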

The user can include additional columns at their own discretion. For instance, we will use the column image_file in the tutorials.

Example data

The tutorials directory contains two example ST data sets:

  1. ALS [3]
  2. Olfactory Bulb [1]

Splotch analysis

Please see the ALS and Olfactory Bulb tutorials.

In the simplest setting, the following lines are enough to run Splotch on a single gene:

from jax import random

# get_input_data and run_nuts are provided by the splotch package;
# see the tutorials for the exact import statements

# read input data
splotch_input_data = get_input_data("metadata.tsv")

# run Splotch on the Gfap gene
key = random.PRNGKey(0)
key, key_ = random.split(key)
splotch_result_nuts = run_nuts(key_, ["Gfap"], splotch_input_data)

References

[1] Ståhl, Patrik L., et al. "Visualization and analysis of gene expression in tissue sections by spatial transcriptomics." Science 353.6294 (2016): 78-82.

[2] Phan, Du, et al. "Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro." arXiv preprint 1912.11554 (2019).

[3] Maniatis, Silas, et al. "Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis." Science 364.6435 (2019): 89-93.