RNAseq_analysis_scripts

Scripts for RNA-seq analysis. They can be used directly on the outputs of [IARCbioinfo's RNA-seq workflow](https://github.com/IARCbioinfo/RNAseq-nf).

Unsupervised analysis: RNAseq_unsupervised.R

This script performs unsupervised analyses (Principal Component Analysis and clustering) from htseq-count outputs.
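To illustrate what the PCA step computes, here is a minimal Python sketch. The script itself is written in R and does not use this code. The sketch assumes an already-transformed samples × genes matrix and returns each sample's score on the first principal component. It finds that component by power iteration on the gene covariance matrix. The function name `pc1_scores` is hypothetical.

```python
import math
import random


def pc1_scores(matrix, n_iter=500, seed=0):
    """Return each sample's score on the first principal component.

    matrix: list of rows, one row per sample, one column per gene
    (assumed to hold already-transformed expression values).
    """
    n = len(matrix)
    p = len(matrix[0])
    # Center each gene (column) on its mean across samples.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix between genes (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly multiply by cov and renormalise; the vector
    # converges to the leading eigenvector, i.e. the PC1 loadings.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Project each centered sample onto the loadings.
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]
```

The sign of a principal component is arbitrary, so only the relative positions of the samples are meaningful.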

Prerequisites

This R script requires the following packages:

Usage

Rscript RNAseq_unsupervised.R [options]
| PARAMETER | DEFAULT | DESCRIPTION |
|-----------|---------|-------------|
| -f | . | folder with count files |
| -o | out | output directory name |
| -p | count.txt | pattern for count file names |
| -n | 500 | number of genes to use for clustering |
| -t | auto | count transformation method; 'rld', 'vst', or 'auto' |
| -c | hc | clustering algorithm to be passed to ConsensusClusterPlus |
| -l | complete | method for hierarchical clustering to be passed to ConsensusClusterPlus |
| -h | | Show help message and exit |
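ConsensusClusterPlus implements consensus clustering. It repeatedly subsamples the items and clusters each subsample with the chosen algorithm (set here with -c, with linkage -l for hierarchical clustering). It then records how often each pair of items ends up in the same cluster.

A minimal Python sketch of that last step follows. It assumes the per-resampling cluster labels are already available, and it omits the clustering itself. `consensus_matrix` is a hypothetical name.

```python
def consensus_matrix(runs, n_items):
    """Build a consensus matrix from repeated clusterings of subsamples.

    runs: list of dicts mapping item index -> cluster label; each dict
    covers only the items drawn in that resampling.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        items = list(labels)
        for i in items:
            for j in items:
                sampled[i][j] += 1           # both i and j were drawn
                if labels[i] == labels[j]:
                    together[i][j] += 1      # and fell in the same cluster
    # Consensus = times co-clustered / times co-sampled (0 if never co-sampled).
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_items)] for i in range(n_items)]
```

Values close to 0 or 1 indicate stable pairs. ConsensusClusterPlus computes such a matrix for each candidate number of clusters.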

For example, one can type

Rscript RNAseq_unsupervised.R -f input -p count -t rld -o output/ -n 500
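The -f and -p options suggest that count files are collected from a folder and merged into a gene × sample matrix. The Python sketch below shows that step. It rests on two assumptions: a file is selected when its name contains the pattern, and the sample name is the file name with the pattern removed. Missing genes are filled with 0. The helper names are hypothetical.

htseq-count writes one tab-separated gene ID and count per line. It also writes summary counters such as `__no_feature` and `__ambiguous`, which are skipped here.

```python
import glob
import os


def parse_htseq(text):
    """Parse one htseq-count output into {gene_id: count}."""
    counts = {}
    for line in text.splitlines():
        # Skip blank lines and summary counters (__no_feature, __ambiguous, ...).
        if not line.strip() or line.startswith("__"):
            continue
        fields = line.split("\t")
        counts[fields[0]] = int(fields[1])
    return counts


def count_matrix(per_sample):
    """Merge {sample: {gene: count}} into (genes, samples, rows), genes as rows."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    # Genes absent from a sample's file are filled with 0 (assumption).
    rows = [[per_sample[s].get(gene, 0) for s in samples] for gene in genes]
    return genes, samples, rows


def read_folder(folder=".", pattern="count.txt"):
    """Hypothetical loader: every file in `folder` whose name contains `pattern`."""
    per_sample = {}
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        name = os.path.basename(path)
        if pattern in name and os.path.isfile(path):
            with open(path) as handle:
                per_sample[name.replace(pattern, "")] = parse_htseq(handle.read())
    return count_matrix(per_sample)
```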

Details

The script involves 3 steps
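A typical preparation of the counts before clustering can be sketched in Python as follows. This is not a record of the script's actual steps. The log2(count + 1) transform is only a crude stand-in for the DESeq2 rld/vst transformations selected with -t. Keeping the -n genes with the largest variance across samples is an assumption about how -n is used. Both function names are hypothetical.

```python
import math
from statistics import pvariance


def log_transform(rows):
    """Crude stand-in for rld/vst: log2(count + 1), without normalisation."""
    return [[math.log2(c + 1) for c in row] for row in rows]


def top_variable(genes, rows, n=500):
    """Keep the n genes (rows) with the largest variance across samples."""
    order = sorted(range(len(genes)), key=lambda i: pvariance(rows[i]),
                   reverse=True)
    keep = order[:n]
    return [genes[i] for i in keep], [rows[i] for i in keep]
```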

Output

PCA

Clustering

Compare an unsupervised analysis with a list of variables: RNAseq_unsupervised_compare.R

This script compares the results of an unsupervised analysis (Principal Component Analysis and clustering), obtained for example with script RNAseq_unsupervised.R, with an arbitrary number of variables (categorical or continuous).
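The statistics the script uses to relate clusters to each variable are not documented here. For a categorical variable, one common choice is a Pearson chi-square statistic on the cluster × category contingency table. A minimal Python sketch of that statistic follows (hypothetical function name; no p-value is computed).

```python
from collections import Counter


def chi_square(clusters, variable):
    """Pearson chi-square statistic for cluster label vs. categorical variable.

    clusters, variable: lists of the same length, one entry per sample.
    """
    n = len(clusters)
    table = Counter(zip(clusters, variable))   # observed count per cell
    row_tot = Counter(clusters)                # samples per cluster
    col_tot = Counter(variable)                # samples per category
    stat = 0.0
    for k in row_tot:
        for v in col_tot:
            expected = row_tot[k] * col_tot[v] / n
            stat += (table[(k, v)] - expected) ** 2 / expected
    return stat
```

A statistic of 0 means that the categories are spread across clusters exactly as expected under independence. Larger values indicate a stronger association. With -m 2 and -M 5, such a comparison would presumably be repeated for each clustering from K = 2 to K = 5.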

Prerequisites

This R script requires the following packages:

Usage

Rscript RNAseq_unsupervised_compare.R [options]
| PARAMETER | DEFAULT | DESCRIPTION |
|-----------|---------|-------------|
| -R | . | .RData file containing clustering results in variable clusters and PCA results in variable pca |
| -i | . | name of input file with variables in columns and variable names on the first line |
| -m | 2 | minimum number of clusters |
| -M | 5 | maximum number of clusters |
| -o | out | output file prefix |
| -h | | Show help message and exit |
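The -i file holds variables in columns, with their names on the first line. The Python sketch below reads such a table and decides whether each variable is continuous or categorical. It relies on three assumptions: cells are tab-separated, "NA" or empty cells mean missing, and a column is continuous when every non-missing value is numeric. The function names are hypothetical.

```python
import csv
import io

MISSING = {"", "NA"}


def read_variables(text, sep="\t"):
    """Parse a table with variable names on the first line, one sample per row."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        if not row:          # ignore blank lines
            continue
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


def variable_type(values):
    """'continuous' if every non-missing value parses as a number, else 'categorical'."""
    observed = [v for v in values if v not in MISSING]
    try:
        [float(v) for v in observed]
    except ValueError:
        return "categorical"
    return "continuous" if observed else "categorical"
```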

For example, one can type

Rscript RNAseq_unsupervised_compare.R -R RNAseq_unsupervise.RData -i variables.txt -o output/

Details

For each clustering stored in variable clusters, the script involves 3 steps

Output

For each column (i.e., variable) of the input table, the script outputs a .pdf file with K rows, where K is the number of clusterings in variable clusters, and 3 columns:

Supervised analysis: RNAseq_supervised.R

This script performs supervised analyses (Differential Expression Analysis) at the gene level from htseq-count outputs.

Prerequisites

This R script requires the following package:

Depending on the options used, the following packages are also required:

Usage

Rscript RNAseq_supervised.R [options]
| PARAMETER | DEFAULT | DESCRIPTION |
|-----------|---------|-------------|
| -f | . | folder with count files |
| -g | . | file with sample groups |
| -o | out | output directory name |
| -p | count.txt | pattern for count file names |
| -c | 1 | number of cores for statistical computation |
| -q | 0.1 | false discovery rate (FDR) threshold |
| -m | FALSE | use Independent Hypothesis Weighting (IHW) for the multiple-testing procedure |
| -h | | Show help message and exit |
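By default, DESeq2's `results()` adjusts p-values with the Benjamini-Hochberg procedure, after its own independent filtering. An FDR threshold such as the -q default of 0.1 is presumably applied to these adjusted values. A minimal Python sketch of the plain Benjamini-Hochberg adjustment follows, for illustration only. Because of independent filtering, DESeq2's `padj` values need not match it exactly. With -m, IHW replaces this step with a weighted procedure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking p * m / rank and enforcing
    # that adjusted values never increase as p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```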

For example, one can type

Rscript RNAseq_supervised.R -f input -g groups.txt -o output/

Details

The script performs DE analysis of gene count data under a negative binomial generalized linear model with package DESeq2. When more than two groups are present (e.g., A, B, and C), it computes results for the contrasts corresponding to all pairs of groups (A vs B, A vs C, and B vs C).
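The set of contrasts is every unordered pair of group levels. A minimal Python sketch follows (hypothetical function name). In DESeq2, each pair would then typically be requested with a call such as `results(dds, contrast = c("group", "A", "B"))`. The column name `group` is an assumption about the sample-group file.

```python
from itertools import combinations


def pairwise_contrasts(groups):
    """All pairs of distinct group levels, in order of first appearance."""
    levels = list(dict.fromkeys(groups))   # unique levels, order preserved
    return list(combinations(levels, 2))
```

With k groups this gives k(k - 1)/2 contrasts, e.g. 3 for groups A, B, and C.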

Output

Supervised analysis: RNAseq_supervised_transcript.R

This script performs supervised analyses (Differential Expression Analysis) at the transcript level from StringTie outputs.

Prerequisites

This R script requires the following package:

Usage

Rscript RNAseq_supervised_transcript.R [options]
| PARAMETER | DEFAULT | DESCRIPTION |
|-----------|---------|-------------|
| -f | . | folder with folders of sample input files |
| -g | . | file with sample groups |
| -o | out | output directory name |
| -p | . | pattern for input folder names |
| -t | 1 | threshold on the variance of expression (FPKM) used for filtering |
| -r | | Row names for group file |
| -c | 2 | Column index of the covariate to use for regression; other columns are treated as adjustment variables |
| -h | | Show help message and exit |
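The ballgown vignette filters out low-variance transcripts before testing, using `subset(bg, "rowVars(texpr(bg)) > 1", genomesubset=TRUE)`. Whether the -t option is applied in exactly this way is an assumption. The Python sketch below shows the same idea: keep only transcripts whose FPKM sample variance across samples exceeds the threshold. The function name and input layout are hypothetical.

```python
from statistics import variance


def filter_low_variance(fpkm, threshold=1.0):
    """Keep transcripts whose FPKM sample variance exceeds `threshold`.

    fpkm: dict mapping transcript ID -> list of FPKM values (one per sample).
    """
    return {tx: values for tx, values in fpkm.items()
            if variance(values) > threshold}
```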

For example, one can type

Rscript RNAseq_supervised_transcript.R -f input -g groups.txt -o output/

Details

The script performs DE analysis of transcript FPKM data under a hierarchical linear model with package ballgown, optionally correcting for additional variables.
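As we understand ballgown's `stattest` documentation, each feature is tested by comparing nested linear models: one that includes the covariate of interest and one that omits it, with any adjustment variables present in both. A minimal Python sketch of the resulting F statistic for a single transcript follows. It is simplified in two ways: it uses no adjustment variables, and its reduced model is intercept-only. The input would typically be log2(FPKM + 1). The function name is hypothetical, and no p-value is computed, since that would require the F distribution with (df_diff, df_full) degrees of freedom.

```python
def nested_f(values, groups):
    """F statistic comparing an intercept-only model to a group-means model.

    values: expression of one transcript per sample (e.g. log2(FPKM + 1)).
    groups: group label per sample (the covariate of interest).
    """
    n = len(values)
    levels = sorted(set(groups))
    grand = sum(values) / n
    # Residual sum of squares of the reduced (intercept-only) model.
    rss0 = sum((y - grand) ** 2 for y in values)
    # Residual sum of squares of the full model (one mean per group).
    means = {g: sum(y for y, h in zip(values, groups) if h == g) / groups.count(g)
             for g in levels}
    rss1 = sum((y - means[g]) ** 2 for y, g in zip(values, groups))
    df_full = n - len(levels)       # residual degrees of freedom, full model
    df_diff = len(levels) - 1       # extra parameters in the full model
    return ((rss0 - rss1) / df_diff) / (rss1 / df_full)
```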

Output