Introduction
This is a tiny Python script that generates MAF files from the output of standard annotation programs. Currently, ANNOVAR (table_annovar.pl) output and bcftools csq output can be converted to MAF.
$ python annovar2maf.py -h
usage: annovar2maf [-h] [-t TSB] [-b BUILD] [-p {refGene,ensGene}] [-c] input
Convert annovar and bcftools-csq annotations to MAF
positional arguments:
input Annovar anotations file [Ex: myanno.hg19_multianno.txt] or a csq formatted file.
optional arguments:
-h, --help show this help message and exit
-t TSB, --tsb TSB Sample name. Default parses from the file name
-b BUILD, --build BUILD
Reference genome build [Default: hg38]
-p {refGene,ensGene}, --protocol {refGene,ensGene}
Protocol used to generate annovar annotations [Default: refGene]
-c, --csq Input file is a bcftools csq formatted output
annovar2maf
python annovar2maf.py -t foo -b GRCh37 tests/test_mutect.refseq.hg19_multianno.txt
# For annovar annotations generated with ensGene as a protocol
python annovar2maf.py -p ensGene -t foo -b GRCh37 tests/test_mutect.ens.hg19_multianno.txt
csq2maf
Similar to VEP, the bcftools csq command annotates variants with their consequences. The program is lightweight and extremely fast. Its output can be converted to TSV with the split-vep plugin and bcftools query, and then converted to MAF.
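To make the annotation format concrete, here is a minimal Python sketch of how a single bcftools csq annotation value can be split into named fields. It assumes the pipe-separated field order described in the bcftools csq documentation (Consequence, gene, transcript, biotype, strand, amino_acid_change, dna_change), with multiple consequences separated by commas. The function names and the example values are made up for illustration and are not part of annovar2maf.py.

```python
# Field order assumed from the bcftools csq documentation; the last three
# fields are only present for coding consequences.
CSQ_FIELDS = (
    "Consequence",
    "gene",
    "transcript",
    "biotype",
    "strand",
    "amino_acid_change",
    "dna_change",
)


def parse_csq_entry(entry):
    """Split one pipe-separated consequence into a dict keyed by field name.

    Fields missing from the entry are returned as empty strings.
    Note: "@position" pointer entries for compound variants are not
    treated specially in this sketch.
    """
    record = dict.fromkeys(CSQ_FIELDS, "")
    record.update(zip(CSQ_FIELDS, entry.split("|")))
    return record


def parse_csq_value(value):
    """Parse a full annotation value that may hold several comma-separated entries."""
    return [parse_csq_entry(entry) for entry in value.split(",") if entry]


if __name__ == "__main__":
    # Hypothetical annotation value, for illustration only.
    example = "missense|GENE1|ENST0001|protein_coding|+|10A>10V|100A>G"
    for record in parse_csq_value(example):
        print(record["gene"], record["Consequence"], record["amino_acid_change"])
```

In the workflow below, the split-vep plugin does this splitting for you; the sketch is only meant to show where the gene, transcript, Consequence, amino_acid_change and dna_change columns come from.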
ref="Homo_sapiens.GRCh37.dna.primary_assembly.fa"
# Get the GFF files for your ref build
## GRCh38 with and without the chr prefix
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.chr.gff3.gz
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.gff3.gz
## GRCh37 with and without the chr prefix
#wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.chr.gff3.gz
wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.gff3.gz
## Step-1: The commands below left-normalize the VCF, split multi-allelic variants, and annotate the VCF with variant consequences, keeping the most severe consequence for each variant.
bcftools norm -f ${ref} -m -both -Oz tests/test_mutect.vcf.gz | bcftools csq -c CSQ -f ${ref} -g Homo_sapiens.GRCh37.82.gff3.gz -p a | \
bcftools +split-vep /dev/stdin -Oz -o tests/test_mutect.csq.vcf.gz -c - -s worst
## Step-2: The command below converts the csq-annotated VCF to TSV
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%gene\t%transcript\t%Consequence\t%amino_acid_change\t%dna_change\n' tests/test_mutect.csq.vcf.gz > tests/test_mutect.csq.tsv
## Step-3: Now convert the TSV to MAF
python annovar2maf.py -c -t foo -b GRCh37 tests/test_mutect.csq.tsv
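Before running Step-3, it can be useful to sanity-check the TSV produced in Step-2. The following is a rough Python sketch that assumes the file has no header line and that its columns follow the order of the bcftools query format string used above. The helper names are hypothetical and this is not how annovar2maf.py itself reads the file.

```python
import csv
from collections import Counter

# Column order taken from the bcftools query format string in Step-2.
COLUMNS = [
    "CHROM",
    "POS",
    "REF",
    "ALT",
    "gene",
    "transcript",
    "Consequence",
    "amino_acid_change",
    "dna_change",
]


def read_csq_tsv(lines):
    """Read header-less, tab-separated lines into a list of dicts keyed by COLUMNS."""
    reader = csv.DictReader(
        lines, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def count_consequences(rows):
    """Count how many rows carry each value in the Consequence column."""
    return Counter(row["Consequence"] for row in rows)


if __name__ == "__main__":
    # Path produced by Step-2 above.
    with open("tests/test_mutect.csq.tsv", newline="") as handle:
        rows = read_csq_tsv(handle)
    for consequence, n in count_consequences(rows).most_common():
        print(consequence, n, sep="\t")
```

Variants without an annotation typically appear with `.` in the annotation columns, so a large count for `.` may indicate a mismatch between the GFF file and the reference build.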