This repository contains code to reproduce results from "Compression of quantification uncertainty for scRNA-seq counts". For simple sample code to calculate the uncertainty-aware p-values, see the file "UncertaintyAwarePvalueSampleCode.R". For code to reproduce the other analyses from the paper, see the rest of the repository and the instructions below.
Last updated July 2, 2020
Instructions to reproduce the main analyses from the paper are given below. Note that much of the code here utilizes functions defined in the "SingleCellProjectFunctions.R" file.
First, code to reproduce the main coverage and trajectory analysis results:
- The script `QuantifyPBMC4K.sh` in the `QuantifyPBMC4KData` subdirectory quantifies the PBMC 4K data, and `SavePBMC4KDataToRData.R` saves the results in an R-friendly format. These results are used to assign realistic gene names to the simulated data based on the rank of gene expression.
- Then, generate the simulation objects using the splatter package (for the two-group difference simulations) or the dyntoy package (for the trajectory simulations). The former is done in `GenerateSplatterObjectForSwish.R` in the `SwishAnalyses` subdirectory, and the latter in `GenerateDyntoyObjects.R` in the `MainSimulationCode` subdirectory.
- Then, run minnow to simulate the reads corresponding to the simulated counts, and alevin to quantify the counts and generate the bootstrap replicates and compressed uncertainty estimates. `RunMinnowAndAlevin.bash` runs minnow and alevin with 100 bootstrap replicates, and `RunAlevinWith20InfReps.bash` can be run afterwards to repeat alevin with 20 bootstrap replicates.
- Then, `SaveDatasetsToRAndCalculateCoverages.R` saves the alevin quantification data to R, calculates all the coverage results, and generates the simulated pseudo-inferential replicates.
- Now, run `SummarizeCoverageResults.R` to generate all coverage-related plots.
- Next, `RunTradeSeq.R` runs the tradeSeq code.
- Then, `SummarizeTradeSeq.R` imports the full results from tradeSeq and computes the uncertainty-aware p-values. These are saved in a format suitable for iCOBRA plotting.
- Finally, `PlotTradeSeqPowerResults.R` generates the iCOBRA plots corresponding to the tradeSeq results.
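
The order of the steps above can be sketched as a small driver script. This is only an illustration, not part of the repository: it assumes the R scripts can be run non-interactively with `Rscript`, that the shell scripts need no arguments, and that files whose location the instructions do not state sit in the current directory (only the three paths with a subdirectory come from the instructions). The helper names `launcher_for` and `run_steps` and the `DRY_RUN` variable are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the run order for the main coverage and trajectory analyses.
# Assumptions (not confirmed by the repository): scripts run without
# arguments, and files listed without a directory are in the current one.
set -eu

# Choose a launcher from the file extension.
launcher_for() {
  case "$1" in
    *.R) echo "Rscript" ;;
    *.sh|*.bash) echo "bash" ;;
    *) echo "no launcher known for $1" >&2; return 1 ;;
  esac
}

# Print each command (DRY_RUN=1, the default) or execute the steps in order.
# With `set -e`, a failing step stops the run.
run_steps() {
  for step in "$@"; do
    launcher=$(launcher_for "$step") || return 1
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf '%s %s\n' "$launcher" "$step"
    else
      "$launcher" "$step"
    fi
  done
}

run_steps \
  "QuantifyPBMC4KData/QuantifyPBMC4K.sh" \
  "SavePBMC4KDataToRData.R" \
  "SwishAnalyses/GenerateSplatterObjectForSwish.R" \
  "MainSimulationCode/GenerateDyntoyObjects.R" \
  "RunMinnowAndAlevin.bash" \
  "RunAlevinWith20InfReps.bash" \
  "SaveDatasetsToRAndCalculateCoverages.R" \
  "SummarizeCoverageResults.R" \
  "RunTradeSeq.R" \
  "SummarizeTradeSeq.R" \
  "PlotTradeSeqPowerResults.R"
```

Setting `DRY_RUN=0` would execute the steps instead of printing them. Several steps (minnow, alevin, tradeSeq) are likely to be long-running, so running them one at a time and checking the outputs in between may be more practical.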
Now, code for the mouse embryo trajectory analysis:
- Run `ImportMouseEmbryoResultsTo.R` to import the quantified data into R.
- Run `MouseEmbryoTrajectoryAnalysis.R` to run the trajectory analysis, both with the EM and NoEM results forced to share the same cell clusters and lineages/pseudotimes and without that constraint.
- Run `InfRepTradeSeqMouseEmbryoData.R` to conduct the trajectory analysis for each pseudo-inferential replicate.
- Run `AnalyzeMouseEmbryoTradeSeqResults.R` to combine all trajectory results and calculate the uncertainty-aware p-values.
- Run `PlotsForMouseEmbryoTrajectoryAnalysis.R` to generate all plots from the mouse embryo trajectory analyses.
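
Because the instructions above do not say which subdirectory each mouse embryo script lives in, a quick pre-flight check can confirm that all five are present in a local checkout before starting. The following is a minimal sketch; the function name `missing_scripts` and the `REPO_ROOT` variable are hypothetical, and the file names are copied verbatim from the list above.

```shell
#!/bin/sh
# Sketch: report which of the listed scripts cannot be found under a root
# directory. REPO_ROOT is an assumed variable; it defaults to the current
# directory.
set -eu

# Print the name of every script that is not found anywhere under $1.
missing_scripts() {
  root=$1
  shift
  for name in "$@"; do
    found=$(find "$root" -type f -name "$name" -print 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing_scripts "${REPO_ROOT:-.}" \
  "ImportMouseEmbryoResultsTo.R" \
  "MouseEmbryoTrajectoryAnalysis.R" \
  "InfRepTradeSeqMouseEmbryoData.R" \
  "AnalyzeMouseEmbryoTradeSeqResults.R" \
  "PlotsForMouseEmbryoTrajectoryAnalysis.R"
```

An empty output means every script was found; any name printed is missing or spelled differently in the checkout.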
Now, code corresponding to the swish and splitSwish analyses:
- For the swish results, first run `SaveFullDataForSwish.R` to save the full data (including bootstrap replicates) for the swish analysis. Then run `RunSwish.R` to produce the full swish results.
- For the splitSwish results, first run `SplitSwish.R` to save the data (without the full bootstrap replicates) and write the Snakemake file needed to run the splitSwish method. The Snakemake file can then be run from a command-line shell.
- Lastly, the iCOBRA plot comparing the performance of swish to splitSwish can be generated using `PlotSplitSwishResults.R`.
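
Running the generated Snakemake file from the shell might look like the sketch below. The wrapper `run_workflow` is hypothetical, and the path to the Snakemake file is an assumption you must supply, since the instructions above do not name the file that `SplitSwish.R` writes. The sketch uses only the standard `snakemake` options `--snakefile`, `--cores`, and `--dry-run`, and defaults to a dry run so that the planned jobs can be inspected first.

```shell
#!/bin/sh
# Sketch: plan or run a Snakemake workflow from the shell.
# The Snakefile path is supplied by the caller; its real name and location
# are not stated in the instructions.
set -eu

# Usage: run_workflow SNAKEFILE [CORES] [dry|run]
# Returns 2 if the Snakefile is missing, 3 if snakemake is not installed.
run_workflow() {
  snakefile=$1
  cores=${2:-1}
  mode=${3:-dry}
  if [ ! -f "$snakefile" ]; then
    echo "Snakefile not found: $snakefile" >&2
    return 2
  fi
  if ! command -v snakemake >/dev/null 2>&1; then
    echo "snakemake is not on PATH" >&2
    return 3
  fi
  if [ "$mode" = "dry" ]; then
    # Show the jobs that would run without executing them.
    snakemake --snakefile "$snakefile" --cores "$cores" --dry-run
  else
    snakemake --snakefile "$snakefile" --cores "$cores"
  fi
}

# Example (path is a placeholder):
#   run_workflow path/to/Snakefile 4 dry
#   run_workflow path/to/Snakefile 4 run
```

Starting with the dry run is a cheap way to confirm that the data saved by `SplitSwish.R` are where the workflow expects them before committing cores to the full analysis.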