Awesome
Gviz_multiAlignments
R script for multiple BAM alignments viewing using Gviz (bioconductor package)
Parameters
-
Mandatory
Name | Default value | Description |
---|---|---|
bam_folder | - | Folder containing all BAMs needed |
pos_file | - | File containing the position and samples to consider for the plot |
ref | - | A reference fasta file |
genome_release | - | Genome release needed for the annotations |
-
Optional
Name | Default value | Description |
---|---|---|
sample_names | FILE | Set this argument to "SAMPLE" if the input file contains the samples names extracted from the BAM files and not the BAM files names |
Example of an input file (pos_file argument)
chr17 | 7572814 | SampleName1 or BAM_file1 | |
chr17 | 7572814 | SampleName1 or BAM_file1 | SampleName2 or BAM_file2 |
chr17 | 7579643 | SampleName1 or BAM_file1 | SampleName2 or BAM_file2 |
Using this input file, the R script will generate 3 pdf files, each pdf file containing the alignment of the BAM file(s) at each position.
Usage
Rscript script_gviz.r --pos_file=file_name.txt --bam_folder=/path_to_BAMs/ --ref=fasta_file.fa --genome_release=Hsapiens.UCSC.hg19 --sample_names=SAMPLE
Detailed description
Bioconductor packages to install for plotting the alignments :
- The Gviz package
- An Annotation package for TxDb objects.
- A Genome wide annotation, it contains mappings between Entrez Gene identifiers and GenBank accession numbers. Examples: the Genome wide annotation package for Human:
org.Hs.eg.db
and for the mouse:org.Mm.eg.db
.
Example: If the hg19 release of the human genome is used, the following packages should be installed: Gviz
, TxDb.Hsapiens.UCSC.hg19.knownGene
(hg18 and hg38 UCSC version can also be used) and org.Hs.eg.db
These packages exist for other organisms than Human but have not been tested. One can for example generate the alignments plot for mouse data by installing TxDb.Mmusculus.UCSC.mm10.knownGene
(mm9 UCSC version can also be used) and org.Mm.eg.db
. For the other organisms the packages need to have the same nomenclature as the ones listed above.
The --genome_release
option needs to be provided and corresponds to the TxDb annotation package name without its prefix and suffix. For the hg19 release of the human genome, one needs to set --genome_release
to Hsapiens.UCSC.hg19
.
Note that the packages chosen for the annotations are compatible with the UCSC notations since most of the Gviz fonctionalities can handle these notations. The reference genome used for the BAM alignments can be based on GENCODE, UCSC or ENSEMBL genome varieties.
The alignment plot from top to bottom :
- Chromosome representation : a red vertical line shows the position of the variant
- Genomic axis associated with the alignment.
- BAM alignment(s) (50 bases on both sides of the variant) : the variant position is highlighted in red.
- The reference genome : the variant position is highlighted in red.
- Genome annotation : the yellow blocks represent exons, the variant position is highlighted in red. The annotation is represented only if the variant is not in an intergenic region.
- Zoom-out on the genome annotation : representation of the whole gene whose name is on the right side of the gene annotation. The annotation is represented only if the variant is not in an intergenic region. A red vertical line shows the position of the variant.
- Genomic axis corresponding to the previous genome annotation. A red vertical line shows the position of the variant.