HarmonizationSCANVI

Code for reproducing the results of the paper "Harmonization and Annotation of Single-cell Transcriptomics data with Deep Generative Models"
Demonstration of how to use scVI and scANVI for the harmonization and annotation problem

Contact

chenlingantelope [at] berkeley [dot] edu

Datasets

Analysis	Associated Script	Datasets	Technology	Number of Cells	Ref.
Figure 2: Benchmark	PBMC8KCITE.py	PBMC-8K; PBMC-CITE	10x	8,381; 7,667	10x Datasets; Stoeckius, Marlon, et al. 2017
Supplementary Figure 2: UMAP Visualization	PBMC8KCITE.py	PBMC-8K; PBMC-CITE	10x	8,381; 7,667	10x Datasets; Stoeckius, Marlon, et al. 2017
Figure 2: Benchmark	MarrowTM.py Tech1.pretty.ipynb	MarrowTM-10x; MarrowTM-ss2	10x; SmartSeq2	4,112; 5,351	Quake, Stephen R., et al. 2018
Supplementary Figure 1: Robustness Analysis for Hyperparameter Choice	Robustness_study.ipynb	MarrowTM-10x; MarrowTM-ss2	10x; SmartSeq2	4,112; 5,351	Quake, Stephen R., et al. 2018
Supplementary Figure 3: UMAP Visualization	MarrowTM.py	MarrowTM-10x; MarrowTM-ss2	10x; SmartSeq2	4,112; 5,351	Quake, Stephen R., et al. 2018
Figure 2: Benchmark	Pancreas.py	Pancreas-InDrop; Pancreas-CEL-Seq2	inDrop; CEL-Seq2	8,569; 2,449	Baron, Maayan, et al. 2016; Muraro, Mauro J., et al. 2016
Supplementary Figure 4: UMAP Visualization	Pancreas.py	Pancreas-InDrop; Pancreas-CEL-Seq2	inDrop; CEL-Seq2	8,569; 2,449	Baron, Maayan, et al. 2016; Muraro, Mauro J., et al. 2016
Figure 2: Benchmark	DentateGyrus.py	DentateGyrus-10x; DentateGyrus-C1	10x; Fluidigm C1	5,454; 2,303	Hochgerner, Hannah, et al. 2018
Supplementary Figure 5: UMAP Visualization	DentateGyrus.py	DentateGyrus-10x; DentateGyrus-C1	10x; Fluidigm C1	5,454; 2,303	Hochgerner, Hannah, et al. 2018
Figure 3: Robustness Analysis by Subsampling Cells, Supplementary Figure 10	NoOverlapSCANVI.py PopRemoveSCANVI.py SCANVI_posterior-NoOverlap.ipynb SCANVI_posterior_poprm.ipynb	PBMC-8K; PBMC-CITE	10x	8,381; 7,667	10x Datasets; Stoeckius, Marlon, et al. 2017
Figure 4: Continuous Trajectory, Supplementary Figure 6: UMAP	continuous.ipynb	HEMATO-Tusi; HEMATO-Paul	inDrop; MARS-seq	4,016; 2,730	Tusi, Betsabeh Khoramian, et al. 2018; Paul, Franziska, et al. 2015
Figure 5: External Validation by Experimentally Derived Labels, Supplementary Figure 11	harmonization-CitePure-SCANVI.ipynb	PBMC-68K; PBMC-Sorted; PBMC-CITE	10x	68,579; 94,655; 7,667	Zheng, Grace XY, et al. 2017; Stoeckius, Marlon, et al. 2017
Figure 6: Semi-Supervised Annotation of T Cell Subtypes, Supplementary Figure 12	SCANVI-mild-annot-Clustering.ipynb	PBMC-Sorted T cell Subtypes	10x	42,919	Zheng, Grace XY, et al. 2017; Stoeckius, Marlon, et al. 2017
Hierarchical Semi-Supervised Annotation	Hierarchical.ipynb	CORTEX	10x	160,796	Zeisel, Amit, et al. "Molecular architecture of the mouse nervous system." bioRxiv (2018): 294918.
Supplementary Figure 7: Scalability Analysis	scanorama.ipynb	SCANORAMA	Mixed	105,476	Hie, Brian L., Bryan Bryson, and Bonnie Berger. "Panoramic stitching of heterogeneous single-cell transcriptomic data." bioRxiv (2018): 371179.
Supplementary Figure 13: Differential Expression	DE-final.ipynb	PBMC-8K; PBMC-68K	10x	8,381; 68,579	10x Datasets; Zheng, Grace XY, et al. 2017

Supplementary Figures 2, 3, 4, 5, 8, and 9 are generated by scripts in Additional_Scripts/ (including scanvi_acc.R, KNNcurves.py, and BE_curves.py), using the output of the analysis Python scripts.
Boxplots for Figure 3 are generated using poprm_boxplot.R in Additional_Scripts/.
Additional_Scripts/ also contains code for running Seurat directly from the command line: runSeurat.R and SeuratPCA.R.
All .gmt files in Additional_Scripts/ are gene signatures.
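For readers who want to inspect the gene signatures from Python, the following is a minimal sketch of a reader for the standard tab-separated GMT layout (set name, description, then one gene per field). The function name `parse_gmt` and the example file name are hypothetical and not part of this repository.

```python
def parse_gmt(lines):
    """Map each gene-set name to its list of gene symbols.

    Assumes the usual GMT layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for raw in lines:
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any genes
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Usage (hypothetical file name):
# with open("Additional_Scripts/some_signature.gmt") as handle:
#     signatures = parse_gmt(handle)
```

The function accepts any iterable of lines, so it works equally on an open file handle or on a list of strings.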

Installation

Clone the GitHub repository, install the dependencies, and call functions from the scVI modules.
Install time: < 10 min

Requirements

PyTorch v0.4.1
Python 3
scikit-learn v0.19.1
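Because the pinned versions are old, it can help to confirm the environment before running an analysis. The sketch below compares the installed versions against the list above; `torch` and `sklearn` are the import names of PyTorch and scikit-learn, while the helper names and the exact-match policy are assumptions made for illustration.

```python
import importlib
import sys

# Pinned versions from the Requirements list, keyed by import name
REQUIRED = {"torch": "0.4.1", "sklearn": "0.19.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return a list of mismatch messages; an empty list means everything matches."""
    problems = []
    if sys.version_info[0] != 3:
        problems.append("Python 3 is required")
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name} is not installed (need {wanted})")
        elif found.split("+")[0] != wanted:  # ignore local build tags such as "+cpu"
            problems.append(f"{name} {found} found, need {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all pinned requirements satisfied"]:
        print(line)
```

Exact matching is deliberately strict; whether nearby versions also reproduce the results is not stated here, so relax the comparison only at your own risk.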

Instructions

To reproduce results from the paper, look up the relevant datasets and the associated Python notebooks (located in notebooks/) or Python scripts (located in the root directory).
Download the relevant datasets, except for those already wrapped in the scVI package: PBMC-8K, PBMC-CITE, PBMC-68K, PBMC-Sorted, MarrowTM-10x, and MarrowTM-ss2 can be loaded directly with the dataloader functions.
Annotation files that we generated when the original study did not provide annotations (cite.seurat.labels) can be found in the scvi-data repository.
Run the analysis; the results should match those of the paper.
This repository contains functions written specifically to produce some of the analyses in this paper. For a more up-to-date package, refer to the main scVI repository.