scRNA-seq-pipelines

This repository is a compendium to “A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines” by Vieth et al. 2019, published in Nature Communications. A preprint version is also available on bioRxiv.

All code in the repository is distributed under the GPL-3 license.

For examples of how to use the power analysis tool powsimR, please see powsimR.

For any questions or issues with the code in this repository, please use the “Issues” tab.

Getting started

Below you will find a brief outline of the analysis. Please refer to the corresponding folders for detailed information.

Dependencies

You will need the following software:

samtools version 1.9
BWA version 0.7.12
kallisto version 0.43.1
zUMIs version 2.4.5
powsimR (see the prepackaged version in the simulation folder)

Data acquisition

scRNA-seq data sets

The main inputs for the simulations were the scRNA-seq data sets from Ziegenhain et al., 2017. The fastq files can be downloaded from GSE75790. To make the data sets comparable, the cDNA reads were trimmed to 45 bp and randomly downsampled to 1 million reads per cell. In addition, we downloaded the fastq files of the 1k 1:1 mixture of HEK293T and NIH3T3 cells from 10X Genomics support and extracted the reads of the QC-passed NIH3T3 cells. For the comparison of pipelines, we processed the expression profiles of ~1,000 human PBMCs from 10X Genomics.
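The trimming and downsampling step can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the study: the function names (read_fastq, trim, downsample, write_fastq), the fixed random seed and the choice of reservoir sampling for random downsampling are all assumptions made here. It trims each read and its quality string to 45 bp, then keeps a uniform random subset of n reads in a single pass over the file.

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def trim(record: Record, length: int = 45) -> Record:
    """Keep only the first `length` bases and their quality scores."""
    header, seq, plus, qual = record
    return header, seq[:length], plus, qual[:length]


def downsample(records: Iterable[Record], n: int, seed: int = 1) -> List[Record]:
    """Reservoir sampling (Algorithm R): uniform random subset of n records in one pass."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in 4-line FASTQ format."""
    for rec in records:
        handle.write("\n".join(rec) + "\n")


if __name__ == "__main__":
    # Toy input: 10 reads of 80 bp; for the study's setting use n=1_000_000 per cell.
    demo = "".join(f"@read{i}\n{'ACGT' * 20}\n+\n{'I' * 80}\n" for i in range(10))
    subset = downsample((trim(r) for r in read_fastq(io.StringIO(demo))), n=3)
    out = io.StringIO()
    write_fastq(subset, out)
    print(out.getvalue())
```

Reservoir sampling holds at most n records in memory, so a cell with fewer than n reads is returned unchanged. A command-line tool such as seqtk would be an equally valid choice for this step.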

Annotation

The annotation folder contains detailed information on how to obtain the necessary files from Gencode, Vega and RefSeq and how to process them for use in alignment. Additionally, we provide ERCC spike-in sequences.
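A common way to make ERCC spike-ins usable by an aligner is to append their sequences to the reference and add one annotation entry per spike-in that spans the whole sequence. The sketch below generates such GTF lines from a FASTA string. It is an assumption about a typical workflow, not a description of the scripts in the annotation folder, and the helper names and the "ERCC" source label are hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: sequence}; the name is the first word after '>'."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def ercc_gtf_lines(seqs: Dict[str, str], source: str = "ERCC") -> List[str]:
    """One exon per spike-in covering positions 1..len(seq) on the + strand (GTF is 1-based)."""
    lines = []
    for name, seq in seqs.items():
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        fields = [name, source, "exon", "1", str(len(seq)), ".", "+", ".", attrs]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    fasta = ">ERCC-00002\nACGTACGT\n>ERCC-00003\nGGGCCC\n"
    print("\n".join(ercc_gtf_lines(parse_fasta(fasta))))
```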

Alignment

The alignment folder contains the commands for each aligner to build the index, map the reads, count expression per gene and then summarise the expression into one gene count matrix per combination of protocol, aligner and annotation. In particular, it contains R scripts for filtering out multi-mapped reads from bwa and kallisto alignments. Additionally, we provide the fastq file of one cell from an RNA-seq experiment as example input for bwa and kallisto.
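The filtering and summarising steps can be illustrated with a small Python sketch. It assumes a simplified, hypothetical input of (read_id, gene_id) pairs per cell, with one pair per alignment, and treats any read that appears more than once as multi-mapped. This criterion is an assumption for illustration; the repository's R scripts work on real bwa and kallisto output and may define multi-mapping differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Alignment = Tuple[str, str]  # (read_id, gene_id): hypothetical simplified record


def count_unique(alignments: Iterable[Alignment]) -> Counter:
    """Count reads per gene, discarding reads that have more than one alignment."""
    alignments = list(alignments)
    hits = Counter(read for read, _ in alignments)
    return Counter(gene for read, gene in alignments if hits[read] == 1)


def count_matrix(per_cell: Dict[str, Counter]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge per-cell counts into a genes x cells matrix; absent genes count as 0."""
    genes = sorted(set().union(*per_cell.values()))
    cells = sorted(per_cell)
    matrix = [[per_cell[c][g] for c in cells] for g in genes]
    return genes, cells, matrix


if __name__ == "__main__":
    cells = {
        "cell1": count_unique([("r1", "A"), ("r2", "A"), ("r3", "B"), ("r3", "C")]),
        "cell2": count_unique([("r1", "B")]),
    }
    genes, names, mat = count_matrix(cells)
    for g, row in zip(genes, mat):
        print(g, row)
```

Read r3 maps to two genes and is dropped, so gene C never enters the matrix. Building one such matrix per protocol, aligner and annotation combination is then a matter of running the same merge on each set of per-cell counts.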

Simulations

We use powsimR for our simulations. The simulation folder contains a walkthrough R script that estimates the distributional parameters needed to simulate gene expression, sets up differential expression and finally runs the simulations. Additionally, we provide example input from SCRB-seq (UMI) and Smart-seq2 data for running the simulations.
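To give an idea of what "estimating distributional parameters" and "setting up differential expression" involve, the sketch below fits a negative binomial to one gene's counts by the method of moments (var = mu + phi * mu^2) and then simulates new counts, optionally shifted by a log2 fold change. This is a deliberately simplified stand-in written in Python: it is not powsimR's API, powsimR's own estimation is more elaborate, and all function names here are hypothetical. Counts are drawn as a gamma-Poisson mixture, which yields a negative binomial with the chosen mean and dispersion.

```python
import math
import random
import statistics
from typing import List, Tuple


def estimate_nb(counts: List[int]) -> Tuple[float, float]:
    """Method-of-moments NB fit with var = mu + phi * mu^2; returns (mu, phi)."""
    mu = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance, needs at least 2 values
    if mu == 0:
        return 0.0, 0.0
    phi = max((var - mu) / mu ** 2, 0.0)  # clip under-dispersed genes to Poisson
    return mu, phi


def _poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for modest lam (exp(-lam) underflows near 745)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_nb(mu: float, phi: float, n: int, rng: random.Random,
                log2fc: float = 0.0) -> List[int]:
    """Draw n NB counts with mean mu * 2**log2fc and dispersion phi (gamma-Poisson mixture)."""
    mean = mu * 2 ** log2fc
    out = []
    for _ in range(n):
        if phi == 0 or mean == 0:
            lam = mean  # no extra-Poisson variation
        else:
            shape = 1.0 / phi
            lam = rng.gammavariate(shape, mean / shape)  # gamma with mean `mean`, var phi*mean^2
        out.append(_poisson(lam, rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate_nb(5.0, 0.5, 2000, rng)
    mu, phi = estimate_nb(observed)
    group_a = simulate_nb(mu, phi, 100, rng)              # baseline
    group_b = simulate_nb(mu, phi, 100, rng, log2fc=1.0)  # 2-fold up-regulated
    print(round(mu, 2), round(phi, 2), statistics.fmean(group_a), statistics.fmean(group_b))
```

Method-of-moments estimates are noisy for lowly expressed genes and ignore dropouts, which is one reason dedicated tools such as powsimR use more careful fitting.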