Awesome

A scalable SCENIC workflow for single-cell gene regulatory network analysis

This repository describes how to run a pySCENIC gene regulatory network inference analysis alongside a basic "best practices" expression analysis for single-cell data. This includes:

Standalone Jupyter notebooks for an interactive analysis
A Nextflow DSL1 workflow, which provides a semi-automated and streamlined method for running these steps
Details on pySCENIC installation, usage, and downstream analysis

See also the associated publication in Nature Protocols: https://doi.org/10.1038/s41596-020-0336-2.

For an advanced implementation of the steps in this protocol, see VSN Pipelines, a Nextflow DSL2 implementation of pySCENIC with comprehensive and customizable pipelines for expression analysis. This includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).

Overview

Quick start
Requirements
Installation
Case studies
- PBMC 10k dataset (10x Genomics)
  - Full SCENIC analysis, plus filtering, clustering, visualization, and SCope-ready loom file creation:
    - Jupyter notebook | HTML render
  - Extended analysis post-SCENIC:
    - Jupyter notebook | HTML render
  - To run the same dataset through the VSN Pipelines DSL2 workflow, see this tutorial.
- Cancer data sets
  - Jupyter notebook | HTML render
- Mouse brain data set
  - Jupyter notebook | HTML render
References and more information

Quick start

Running the pySCENIC pipeline in a Jupyter notebook

We recommend using this notebook as a template for running an interactive analysis in Jupyter. See the installation instructions for information on setting up a kernel with pySCENIC and other required packages.

Running the Nextflow pipeline on the example dataset

Requirements (Nextflow/containers)

The following tools are required to run the steps in this Nextflow pipeline:

Nextflow
A container system, either of:
- Docker
- Singularity

The following container images will be pulled by nextflow as needed:

Using the test profile

A quick test can be accomplished using the test profile, which automatically pulls the testing dataset (described in full below):

nextflow run aertslab/SCENICprotocol \
    -profile docker,test

This small test dataset takes approximately 70s to run using 6 threads on a standard desktop computer.

Download testing dataset

Alternately, the same data can be run with a more verbose approach (this is more illustrative for how to substitute other data into the pipeline). Download a minimum set of SCENIC database files for a human dataset (approximately 78 MB).

mkdir example && cd example/
# Transcription factors:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/test_TFs_tiny.txt
# Motif to TF annotation database:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/motifs.tbl
# Ranking databases:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/genome-ranking.feather
# Finally, get a tiny sample expression matrix (loom format):
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/expr_mat_tiny.loom

Running the example pipeline

Either Docker or Singularity images can be used by specifying the appropriate profile (-profile docker or -profile singularity). Please note that for the tiny test dataset to run successfully, the default thresholds need to be lowered.

Using loom input

nextflow run aertslab/SCENICprotocol \
    -profile docker \
    --loom_input expr_mat_tiny.loom \
    --loom_output pyscenic_integrated-output.loom \
    --TFs test_TFs_tiny.txt \
    --motifs motifs.tbl \
    --db *feather \
    --thr_min_genes 1

By default, this pipeline uses the container specified by the --pyscenic_container parameter. This is currently set to aertslab/pyscenic:0.9.19, which uses a container with both pySCENIC and Scanpy 1.4.4.post1 installed. A custom container can be used (e.g. one built on a local machine) by passing the name of this container to the --pyscenic_container parameter.

Expected output

The output of this pipeline is a loom-formatted file (by default: output/pyscenic_integrated-output.loom) containing:

The original expression matrix
The pySCENIC-specific results:
- Regulons (TFs and their target genes)
- AUCell matrix (cell enrichment scores for each regulon)
- Dimensionality reduction embeddings based on the AUCell matrix (t-SNE, UMAP)
Results from the parallel best-practices analysis using highly variable genes:
- Dimensionality reduction embeddings (t-SNE, UMAP)
- Louvain clustering annotations

General requirements for this workflow

Python version 3.6 or greater
Tested on various Unix/Linux distributions (Ubuntu 18.04, CentOS 7.6.1810, MacOS 10.14.5)

References and more information

SCENIC

SCENIC (R) on GitHub
SCENIC website
SCENIC publication
pySCENIC on GitHub
pySCENIC documentation
VSN Pipelines, a repository of pipelines for single-cell data in Nextflow DSL2, including an implementation of pySCENIC.