LncPipeReporter
An R package for automatically aggregating and summarizing lncRNA analysis results.
Overview
Most bioinformatics tools, including aligners such as STAR, TopHat and HISAT2, generate log files by default. LncPipe, a recent Nextflow-based pipeline for lncRNA sequencing data analysis, also produces a file containing basic lncRNA features.
This project is a part of LncPipe (but can also be used on its own) and is responsible for automatically generating HTML reports with interactive plots from the pipeline output. It contains several plotting functions as well as analysis scripts that perform comparison analysis and differential expression analysis when experimental design information is available. We expect this tool to help users understand the underlying mechanisms of known and novel lncRNAs in their experiments.
Gallery
GIF animations were recorded using phw/peek.
The interactive plots generated by LncPipeReporter are implemented with plotly and support arbitrary scaling, filtering, and hover tags that show the actual values.
There are also interactive tables showing the first 80 rows of the data.frame/data.table, which can be exported in several formats and support searching, filtering and ordering.
The user-adjusted plots can always be saved as static figures, which can serve as temporary placeholders in your manuscript during peer review. When it comes to publication, you can use the publication-quality versions instead.
Features
- Common result files in lncRNA sequencing data analysis pipelines are well supported. The package is designed to handle several types of result files.
- Files can be located anywhere. Users can simply put all upstream analysis result files in one folder (even together with other files). They are found by searching the folder and its subdirectories recursively.
- File types are guessed automatically. Users never need to designate file types explicitly or pass a file containing a list of names as a parameter when using LncPipeReporter.
- Flexible use. Users can supply any type or number of files at a time, for instance more than one STAR log file, both STAR and HISAT2 log files, or no alignment log files at all.
- More themes available. Users can apply a series of attractive themes provided by ggsci. See Parameters for details.
- Multiple differential expression analysis methods supported. Currently, users can choose edgeR, DESeq2 or NOISeq as the differential expression analysis tool.
- High-resolution static figures with detailed results in CSV are provided. Users get publication-ready figures in TIFF format (300 ppi resolution with LZW compression) and PDF format (which can be edited in AI, etc.). LncPipeReporter also always provides analysis result tables (comma-separated, can be opened and edited with MS Excel, etc.). For details, see Results.
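The recursive lookup described in the list above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper name `find_result_files`, the file name filter and the temporary folder layout are all hypothetical.

```r
# Hypothetical helper (not part of LncPipeReporter): list every file below
# `dir`, including files in nested subdirectories.
find_result_files <- function(dir, pattern = NULL) {
  list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
}

# Build a throwaway folder that mixes log files with an unrelated file.
demo <- file.path(tempdir(), "demo_results")
dir.create(file.path(demo, "star", "run1"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo, "star", "run1", "LWS2.Log.final.out"))
file.create(file.path(demo, "N1037.log"))
file.create(file.path(demo, "notes.txt"))

# Keep only files whose names end in ".log" or ".out" (an assumed filter).
logs <- find_result_files(demo, pattern = "\\.(log|out)$")
print(basename(logs))
```

Because `list.files()` walks subdirectories when `recursive = TRUE`, the nested STAR log is picked up without the user listing it explicitly, while `notes.txt` is ignored by the filter.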
Installation
LncPipeReporter currently supports only Unix-like operating systems, because it contains several Perl 5 one-liners for parsing multiple log files. I'll use pure R code instead in the future to make it a cross-platform package.
The main report is built from R Markdown (v2) documents, so you must install pandoc first:
For Arch Linux:
$ sudo pacman -S pandoc
For other operating systems or Linux distributions, see pandoc's official documentation.
Building from source fails in Microsoft R Open versions earlier than v3.4.2 because of a bug in those releases.
Some packages need Fortran to compile, so you should install a Fortran compiler first (for example, on Debian or Ubuntu):
$ sudo apt-get install gfortran
Run in R session:
install.packages("devtools")
devtools::install_github("bioinformatist/LncPipeReporter")
If there is any problem during installation, please refer to the FAQ.
How to use
Caution: Although users never need to specify file types, the sample name must be the first part of the file name, where both . and _ are treated as file name delimiters. For example, the sample names of LWS2.Log.final.out and N1037.log are taken to be LWS2 and N1037.
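The naming rule above can be illustrated with a short base R sketch. The helper `sample_name` is hypothetical and only mirrors the rule as described here; the package's own parsing code may differ.

```r
# Hypothetical helper: take everything before the first "." or "_" in the
# file name (directories stripped) as the sample name.
sample_name <- function(path) {
  sub("[._].*$", "", basename(path))
}

sample_name(c("LWS2.Log.final.out", "results/N1037.log"))
# "LWS2" "N1037"
```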
If you use DESeq2 or NOISeq as the differential expression analysis tool, the order of sample names in the experimental design information file must match the order of the expression matrix columns.
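One way to check this ordering before running the reporter is sketched below. The data are made up, and the column names `sample` and `condition` are assumptions for illustration only; the actual layout of the design file is not specified here.

```r
# Made-up expression matrix: rows are lncRNAs, columns are samples.
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("lnc1", "lnc2"), c("N1037", "LWS2", "LWS3")))

# Made-up design table, listed in a different order than the matrix columns.
design <- data.frame(sample = c("LWS2", "LWS3", "N1037"),
                     condition = c("tumor", "tumor", "normal"),
                     stringsAsFactors = FALSE)

# Reorder the design rows to follow the matrix columns, then verify the match.
design <- design[match(colnames(expr), design$sample), ]
stopifnot(identical(design$sample, colnames(expr)))
```

Reordering the design table rather than the matrix keeps the expression data untouched, so any downstream column indices remain valid.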
It is highly recommended to use the Chrome web browser to view reports produced by LncPipeReporter.
Try the simplest run with default parameters
library(LncPipeReporter)
run_reporter()
Specify the parameter values with the user interface
library(LncPipeReporter)
# DO NOT use T as short name of TRUE
run_reporter(ask = TRUE)
Call with user-defined parameter values
library(LncPipeReporter)
run_reporter(input = system.file(file.path("extdata", "demo_results"), package = "LncPipeReporter"),
             output = 'reporter.html',
             theme = 'npg',
             cdf.percent = 10,
             max.lncrna.len = 10000,
             min.expressed.sample = 50,
             ask = FALSE)
Call in shell scripts or command line (Nextflow, etc.)
Pass the parameters and their values as arguments of run_reporter():
$ Rscript -e "library(LncPipeReporter); run_reporter(input = '.', ...)"
Here ... stands for other arguments. Use single quotes inside the R expression, because the whole expression is wrapped in double quotes.
The parameters, with their names and default values, are listed below:
Parameters
<table style="width:57%;">
<colgroup> <col width="16%" /> <col width="20%" /> <col width="19%" /> </colgroup>
<thead>
<tr class="header"> <th>Name</th> <th>Default value</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr class="odd"> <td>input</td> <td><code>extdata/demo_results</code></td> <td>Absolute path of the input directory (results of upstream analysis)</td> </tr>
<tr class="even"> <td>output</td> <td><code>~/reporter.html</code></td> <td>Index file name (in HTML format)</td> </tr>
<tr class="odd"> <td>output_dir</td> <td><code>~/LncPipeReports</code></td> <td>Output directory (holds all results and dependencies)</td> </tr>
<tr class="even"> <td>de.method</td> <td>'edger'</td> <td>Differential expression analysis method; one of 'edger' (default), 'noiseq' or 'deseq2'</td> </tr>
<tr class="odd"> <td>theme</td> <td><code>npg</code></td> <td>Journal palette applied to all plots, supplied by <a href="https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html#discrete-color-palettes">ggsci</a></td> </tr>
<tr class="even"> <td>cdf.percent</td> <td><code>10%</code></td> <td>Percentage of values to display when calculating coding potential</td> </tr>
<tr class="odd"> <td>max.lncrna.len</td> <td><code>10000</code></td> <td>Maximum length of lncRNAs to display when calculating the length distribution</td> </tr>
<tr class="even"> <td>min.expressed.sample</td> <td><code>50%</code></td> <td>Minimal percentage of expressed samples</td> </tr>
<tr class="odd"> <td>ask</td> <td>FALSE</td> <td>Whether to set parameters through a graphical user interface in the browser</td> </tr>
</tbody>
</table>
For details and examples, type help(run_reporter) or ?run_reporter in an R session to open the documentation.
Results
By default, LncPipeReporter generates a directory named LncPipeReports in your $HOME (you can choose another location) that holds all results as well as their dependencies, so you should always move or copy the whole folder. The contents of the output directory look like this:
LncPipeReports/
├── figures
│   ├── CDF.pdf
│   ├── CDF.tiff
│   ├── compare_density.pdf
│   ├── compare_density.tiff
│   ├── compare_violin.pdf
│   ├── compare_violin.tiff
│   ├── HISAT2.pdf
│   ├── HISAT2.tiff
│   ├── lncRNA_length_distribution.pdf
│   ├── lncRNA_length_distribution.tiff
│   ├── lncRNA_length_distribution_with_type.pdf
│   ├── lncRNA_length_distribution_with_type.tiff
│   ├── pca.pdf
│   ├── pca.tiff
│   ├── STAR.pdf
│   ├── STAR.tiff
│   ├── TopHat2.pdf
│   ├── TopHat2.tiff
│   ├── vocano.pdf
│   └── vocano.tiff
├── libs
│   ├── bootstrap-3.3.5
│   ├── crosstalk-1.0.0
│   ├── datatables-binding-0.2
│   ├── dt-core-1.10.12
│   ├── dt-ext-buttons-1.10.12
│   ├── dt-plugin-searchhighlight-1.10.12
│   ├── htmlwidgets-0.9
│   ├── ionicons-2.0.1
│   ├── jquery-1.12.4
│   ├── jszip-1.10.12
│   ├── pdfmake-1.10.12
│   ├── plotly-binding-4.7.1.9000
│   ├── plotlyjs-1.31.2.9000
│   ├── stickytableheaders-0.1.19
│   └── typedarray-0.1
├── reporter.html
└── tables
    ├── DE.csv
    ├── HISAT2.csv
    ├── STAR.csv
    └── TopHat2.csv

18 directories, 25 files
This tree shows the output of a run with differential expression analysis via edgeR. The results from the other tools may differ slightly.
FAQ
If devtools::install_github() raises the error Installation failed: Problem with the SSL CA cert (path? access rights?), try:
install.packages(c("curl", "httr"))
During installation, a configuration error may occur because of missing libraries:
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libcurl was not found. Try installing:
* deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
* rpm: libcurl-devel (Fedora, CentOS, RHEL)
* csw: libcurl_dev (Solaris)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
Just follow the instructions to satisfy the dependencies. For instance, you can run sudo apt-get install libcurl4-openssl-dev on Ubuntu to fix the problem above.
LncPipeReporter uses the Bioconductor package edgeR to perform differential expression analysis, so if you get the message 'BiocInstaller' must be installed to install Bioconductor packages., please choose 1 (Yes). If you then see Installation failed: cannot open the connection to 'https://bioconductor.org/biocLite.R', run source('http://bioconductor.org/biocLite.R') and finally try the installation commands above again.
If resolving dependencies from GitHub fails with Connection timed out after 100001 milliseconds, please wait a few minutes and then try again.
License
This package is free and open source software, licensed under GPL v3.0.