Home

Awesome

gateR: Flow/Mass Cytometry Gating via Spatial Kernel Density Estimation <img src="man/figures/gateR.png" width="120" align="right" />

<!-- badges: start -->

R-CMD-check CRAN status CRAN version CRAN RStudio mirror downloads total CRAN RStudio mirror downloads monthly License GitHub last commit DOI

<!-- badges: end -->

Date repository last updated: January 23, 2024

<h2 id="overview">

Overview

</h2>

The gateR package is a suite of R functions to identify significant spatial clustering of flow and mass cytometry data used in immunological investigations. For a two-group comparison, we detect clusters using the kernel-based spatial relative risk function estimated using the sparr package. The tests are conducted in a two-dimensional space comprised of two fluorescent markers.

Examples of a single condition with two groups:

  1. Disease case vs. Healthy control
  2. Time 2 vs. Time 1 (baseline)

For a two-group comparison of two conditions, we estimate two relative risk surfaces for one condition and then a ratio of the relative risks. For example:

  1. Estimate a relative risk surface for:
    1. Condition 2B vs. Condition 2A
    2. Condition 1B vs. Condition 1A
  2. Estimate the relative risk surface for the ratio:

$$\frac{ \big(\frac{Condition2B}{Condition2A}\big)}{\big(\frac{Condition1B}{Condition1A}\big)}$$

Within areas where the relative risk exceeds an asymptotic normal assumption, the gateR package has the functionality to examine the features of these cells. Basic visualization is also supported.

<h2 id="install">

Installation

</h2>

To install the release version from CRAN:

install.packages("gateR")

To install the development version from GitHub:

devtools::install_github("lance-waller-lab/gateR")
<h2 id="available-functions">

Available functions

</h2> <table> <colgroup> <col width="30%"/> <col width="70%"/> </colgroup> <thead> <tr class="header"> <th>Function</th> <th>Description</th> </tr> </thead> <tbody> <td><code>gating</code></td> <td>Main function. Conduct a gating strategy for flow and mass cytometry data.</td> </tr> <td><code>rrs</code></td> <td>Called within <code>gating</code>, one condition comparison.</td> </tr> <td><code>lotrrs</code></td> <td>Called within <code>gating</code>, two condition comparison. </td> </tr> <td><code>pval_correct</code></td> <td>Called within <code>rrs</code> and <code>lotrrs</code>, calculates various multiple testing corrections for the alpha level. Five methods account for (spatially) dependent, multiple testing.</td> </tr> <td><code>lrr_plot</code></td> <td>Called within <code>rrs</code> and <code>lotrrs</code>, provides functionality for basic visualization of a log relative risk surface.</td> </tr> <td><code>pval_plot</code></td> <td>Called within <code>rrs</code> and <code>lotrrs</code>, provides functionality for basic visualization of a significant p-value surface.</td> </tr> </tbody> <table>

The repository also includes the code and resources to create the project hexagon sticker.

<h2 id="available-data">

Available sample data sets

</h2> <table> <colgroup> <col width="30%"/> <col width="70%"/> </colgroup> <thead> <tr class="header"> <th>Data</th> <th>Description</th> </tr> </thead> <tbody> <td><code>randCyto</code></td> <td>A sample dataset containing information about flow cytometry data with two binary conditions and four markers. The data are a random subset of the 'extdata' data in the <a href="https://bioconductor.org/packages/release/data/experiment/html/flowWorkspaceData.html">flowWorkspaceData</a> package found on <a href="https://bioconductor.org">Bioconductor</a> and formatted for `gateR` input.</td> </tr> </tbody> <table> <h2 id="authors">

Authors

</h2>

See also the list of contributors who participated in this project. Main contributors include:

Usage

set.seed(1234) # for reproducibility

# ------------------ #
# Necessary packages #
# ------------------ #

library(gateR)
library(dplyr)
library(flowWorkspaceData)
library(ncdfFlow)
library(stats)

# ---------------- #
# Data preparation #
# ---------------- #

# Use 'extdata' from the {flowWorkspaceData} package
flowDataPath <- system.file("extdata", package = "flowWorkspaceData")
fcsFiles <- list.files(pattern = "CytoTrol", flowDataPath, full = TRUE)
ncfs  <- ncdfFlow::read.ncdfFlowSet(fcsFiles)
fr1 <- ncfs[[1]]
fr2 <- ncfs[[2]]

## Comparison of two samples (single condition) "g1"
## Two gates (four markers) "CD4", "CD38", "CD8", and "CD3"
## Arcsinh Transformation for all markers
## Remove cells with NA and Inf values

# First sample
obs_dat1 <- data.frame("id" = seq(1, nrow(fr1@exprs), 1),
                       "g1" = rep(1, nrow(fr1@exprs)),
                       "arcsinh_CD4" = asinh(fr1@exprs[ , 5]),
                       "arcsinh_CD38" = asinh(fr1@exprs[ , 6]),
                       "arcsinh_CD8" = asinh(fr1@exprs[ , 7]),
                       "arcsinh_CD3" = asinh(fr1@exprs[ , 8]))
# Second sample
obs_dat2 <- data.frame("id" = seq(1, nrow(fr2@exprs), 1),
                       "g1" = rep(2, nrow(fr2@exprs)),
                       "arcsinh_CD4" = asinh(fr2@exprs[ , 5]),
                       "arcsinh_CD38" = asinh(fr2@exprs[ , 6]),
                       "arcsinh_CD8" = asinh(fr2@exprs[ , 7]),
                       "arcsinh_CD3" = asinh(fr2@exprs[ , 8]))
                       
# Full set
obs_dat <- rbind(obs_dat1, obs_dat2)
obs_dat <- obs_dat[complete.cases(obs_dat), ] # remove NAs
obs_dat <- obs_dat[is.finite(rowSums(obs_dat)), ] # remove Infs
obs_dat$g1 <- as.factor(obs_dat$g1) # set "g1" as binary factor

## Create a second condition (randomly split the data)
## In practice, use data with a measured second condition
g2 <- stats::rbinom(nrow(obs_dat), 1, 0.5)
obs_dat$g2 <- as.factor(g2)
obs_dat <- obs_dat[ , c(1:2,7,3:6)]

# Export 'randCyto' data for CRAN examples
randCyto <- dplyr::sample_frac(obs_dat, size = 0.1) # random subsample

# ---------------------------- #
# Run gateR with one condition #
# ---------------------------- #

# Single condition
## A p-value uncorrected for multiple testing
test_gating <- gateR::gating(dat = obs_dat,
                             vars = c("arcsinh_CD4", "arcsinh_CD38",
                                      "arcsinh_CD8", "arcsinh_CD3"),
                             n_condition = 1,
                             plot_gate = TRUE,
                             upper_lrr = 1,
                             lower_lrr = -1)

# -------------------- #
# Post-gate assessment #
# -------------------- #

# Density of arcsinh-transformed CD4 post-gating
graphics::plot(stats::density(test_gating$obs[test_gating$obs$g1 == 1, 4]),
               main = "arcsinh CD4",
               lty = 2)
graphics::lines(stats::density(test_gating$obs[test_gating$obs$g1 == 2, 4]),
                lty = 3)
graphics::legend("topright",
                 legend = c("Sample 1", "Sample 2"),
                 lty = c(2, 3),
                 bty = "n")

# ----------------------------- #
# Run gateR with two conditions #
# ----------------------------- #

## A p-value uncorrected for multiple testing
test_gating2 <- gateR::gating(dat = obs_dat,
                              vars = c("arcsinh_CD4", "arcsinh_CD38",
                                       "arcsinh_CD8", "arcsinh_CD3"),
                              n_condition = 2)

# --------------------------------------------- #
# Perform a single gate without data extraction #
# --------------------------------------------- #

# Single condition
## A p-value uncorrected for multiple testing
## For "arcsinh_CD4" and "arcsinh_CD38"
test_rrs <- gateR::rrs(dat = obs_dat[ , -7:-6])

# Two conditions
## A p-value uncorrected for multiple testing
## For "arcsinh_CD8" and "arcsinh_CD3"
test_lotrrs <- gateR::lotrrs(dat = obs_dat[ , -5:-4])

# ------------------------------------------ #
# Run gateR with multiple testing correction #
# ------------------------------------------ #

## False Discovery Rate
test_gating_fdr <- gateR::gating(dat = obs_dat,
                              vars = c("arcsinh_CD4", "arcsinh_CD38",
                                       "arcsinh_CD8", "arcsinh_CD3"),
                              n_condition = 1,
                              p_correct = "FDR")

Funding

This package was developed while the author was originally a doctoral student at in the Environmental Health Sciences doctoral program at Emory University and later as a postdoctoral fellow supported by the Cancer Prevention Fellowship Program at the National Cancer Institute. Any modifications since December 05, 2022 were made while the author was an employee of Social & Scientific Systems, Inc., a division of DLH Corporation.

Acknowledgments

When citing this package for publication, please follow:

citation("gateR")

Questions? Feedback?

For questions about the package, please contact the maintainer Dr. Ian D. Buller or submit a new issue.