CIPR-Package

<br> <img align="right" src="https://github.com/atakanekiz/CIPR-Package/raw/master/doc/CIPR_hex_mid.png" width=300>

Cluster Identity Predictor

<br>

During the analysis of single-cell RNA sequencing (scRNAseq) data, annotating the biological identity of cell clusters is an important step before downstream analyses, and it remains technically challenging. Current solutions for annotating single-cell clusters generally lack a graphical user interface, can be computationally intensive, or have a limited scope. Manually annotating clusters by examining the expression of marker genes, on the other hand, can be subjective and labor-intensive.

To improve the quality and efficiency of annotating cell clusters in scRNAseq data, we present Cluster Identity PRedictor (CIPR), a web-based R/Shiny app and R package. CIPR provides a graphical user interface to quickly score gene expression profiles of unknown cell clusters against mouse or human references, or against a custom dataset provided by the user. CIPR can be easily integrated into existing pipelines to facilitate scRNAseq data analysis.

CIPR performs analyses at the level of individual clusters and generates informative graphical outputs to help users assess the quality of algorithmic predictions (see the example outputs below).

This repository contains the source code for the R package implementation of the CIPR pipeline. For CIPR-Shiny, please check out the CIPR-Shiny repository.


Installation and Usage


if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("atakanekiz/CIPR-Package", build_vignettes = TRUE)

# For faster installation without the vignette
# devtools::install_github("atakanekiz/CIPR-Package", build_vignettes = FALSE)

Example use case in conjunction with the Seurat pipeline



library(Seurat)
library(CIPR)

# Differentially expressed genes per cluster (input for the logFC methods)
allmarkers <- FindAllMarkers(seurat_object)

# Average expression per cluster (input for the all-genes methods).
# Note: AverageExpression() returns a list with one matrix per assay, so it may
# need to be converted to a data frame with a gene column before use
# (see the package vignette).
avgexp <- AverageExpression(seurat_object)


# Plot summarizing top scoring references per cluster (logFC comparison)
CIPR(input_dat = allmarkers,
     comp_method = "logfc_dot_product",
     reference = "immgen",
     plot_ind = FALSE,
     plot_top = TRUE)

# Plot summarizing top scoring references per cluster (all-genes correlation)
CIPR(input_dat = avgexp,
     comp_method = "all_genes_spearman",
     reference = "immgen",
     plot_ind = FALSE,
     plot_top = TRUE)


# Plots for individual clusters
CIPR(input_dat = allmarkers,
     comp_method = "logfc_dot_product",
     reference = "immgen",
     plot_ind = TRUE,
     plot_top = FALSE)

# Limiting the analysis to certain reference subsets
CIPR(input_dat = allmarkers,
     comp_method = "logfc_dot_product",
     reference = "immgen",
     plot_ind = FALSE,
     plot_top = TRUE,
     select_ref_subsets = c("T cell", "B cell", "NK cell"))




Reference datasets available in CIPR


Analytical approach

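CIPR calculates pairwise identity scores between individual unknown clusters and the reference samples, and generates a vector of identity scores for each cluster in the experiment. While doing this, CIPR uses two main approaches, both shown in the usage examples above: comparison of differential expression (logFC) profiles, and correlation across all genes.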
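The idea behind the two comparison modes used in the examples above (logFC dot product and all-genes correlation) can be illustrated with a small base R sketch. This is a toy illustration, not CIPR's internal code: the gene names, numbers, and variable names (cluster_logfc, ref_logfc, cluster_expr, ref_expr) are all made up, and any preprocessing CIPR applies to real data is not represented.

```r
# Toy data for one unknown cluster and three reference cell types.
# All values below are invented for illustration only.
genes <- c("Cd3e", "Cd19", "Nkg7", "Lyz2", "Cd8a")

# logFC comparison: differential expression in the cluster vs. each reference
cluster_logfc <- c(2.1, -1.5, 0.3, -2.0, 1.8)
ref_logfc <- cbind(
  T_cell  = c(2.5, -1.0, 0.2, -1.8, 2.0),
  B_cell  = c(-1.2, 3.0, -0.5, -1.0, -1.1),
  NK_cell = c(0.4, -0.9, 2.8, -1.5, 0.1)
)
rownames(ref_logfc) <- genes

# Dot product of the cluster's logFC vector with each reference column
dot_scores <- drop(crossprod(ref_logfc, cluster_logfc))

# All-genes correlation: average expression in the cluster vs. each reference
cluster_expr <- c(5.2, 0.1, 1.0, 0.2, 4.8)
ref_expr <- cbind(
  T_cell  = c(6.0, 0.3, 0.9, 0.1, 5.5),
  B_cell  = c(0.2, 7.1, 0.4, 0.6, 0.1),
  NK_cell = c(1.5, 0.2, 6.3, 0.3, 0.8)
)
rownames(ref_expr) <- genes

# Spearman rank correlation between the cluster and each reference column
cor_scores <- drop(cor(ref_expr, cluster_expr, method = "spearman"))

# Best-scoring reference under each approach
names(which.max(dot_scores))
names(which.max(cor_scores))
```

In both cases the result is one identity score per reference sample, which is the kind of per-cluster score vector described above.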


Flexible options

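To be adaptable to various experimental contexts, CIPR enables users to, for example, limit the analysis to selected reference subsets (as in the select_ref_subsets example above) or score clusters against a custom reference dataset.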


Sample outputs

Results per cluster

In the plot below, the x-axis shows the individual samples within the reference data frame (ImmGen in this example). Reference cell types are marked by different colors. Each data point indicates the identity score calculated for Cluster 1 in the input data against one reference sample. Shaded regions demarcate 1 and 2 standard deviations around the average identity score across the reference dataset. In this analysis, the logFC dot product method was used.

<kbd> <img src=https://github.com/atakanekiz/CIPR-Package/raw/master/doc/sample_ind_output.png> </kbd>
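The shaded bands in a plot like this can be reproduced from a vector of identity scores with a few lines of base R. The sketch below uses invented scores and hypothetical names (identity_scores, bands, above_2sd), and assumes the sample standard deviation is used; it is meant only to show how references far above the average stand out.

```r
# Invented identity scores for one cluster against ten reference samples
identity_scores <- c(-1.2, 0.3, 0.8, -0.5, 0.1, -0.9, 0.6, -0.2, 0.4, 6.0)
names(identity_scores) <- paste0("ref_", seq_along(identity_scores))

# Average and (sample) standard deviation across the reference dataset
avg_score <- mean(identity_scores)
sd_score  <- sd(identity_scores)

# Boundaries of the 1 SD and 2 SD bands around the average
bands <- data.frame(
  band  = c("1 SD", "2 SD"),
  lower = avg_score - c(1, 2) * sd_score,
  upper = avg_score + c(1, 2) * sd_score
)

# References scoring more than 2 SD above the average
z_scores  <- (identity_scores - avg_score) / sd_score
above_2sd <- names(z_scores)[z_scores > 2]

print(bands)
print(above_2sd)
```

A reference that lies well above the 2 SD band is a more distinct candidate than one sitting inside the bulk of the scores.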

Summary of top hits per cluster

It is often easier to examine the top predictions in one graph. This plot shows the top 5 scoring reference samples for each cluster (shown in different colors).

<kbd> <img src=https://github.com/atakanekiz/CIPR-Package/raw/master/doc/sample_top_output.png> </kbd>
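Selecting the top-scoring references per cluster is a simple group-wise ranking. The following base R sketch shows one way to do it on a long-format table of scores; the data frame layout, the column names (cluster, reference, identity_score), and the helper top_n_per_cluster() are hypothetical and are not taken from the CIPR package.

```r
# Invented identity scores for two clusters against seven reference samples
scores <- data.frame(
  cluster = rep(c("1", "2"), each = 7),
  reference = rep(paste0("ref_", letters[1:7]), times = 2),
  identity_score = c(3.1, 0.2, 5.4, 1.1, 4.0, -0.5, 2.2,
                     0.9, 6.3, 1.5, 2.8, -1.0, 3.3, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n highest-scoring references in each cluster
top_n_per_cluster <- function(df, n = 5) {
  pieces <- lapply(split(df, df$cluster), function(d) {
    head(d[order(d$identity_score, decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top5 <- top_n_per_cluster(scores, n = 5)
print(top5)
```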