Home

Awesome

VIGOR - Viral Genome ORF Reader

VIGOR4 (Viral Genome ORF Reader) is a Java application to predict protein sequences encoded in viral genomes.<br> VIGOR4 determines the protein coding sequences by sequence similarity searching against curated viral protein databases.<br>

This project is funded by the National Institute Of Allergy And Infectious Diseases of the National Institutes of Health under Award Number U19AI110819 and Contract Number HHSN272201400028C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, and Vecna Technologies.

Annotatable Viruses

Vigor4 uses the VIGOR_DB project which currently has databases for the following viruses:

Installing VIGOR4

Quick Start

mvn -DskipTests clean package
unzip target/vigor-4.0.VERSION.zip -d INSTALL_DIRECTORY
# set reference database path, exonerate path and temporary directory
vi INSTALL_DIRECTORY/vigor-4.0.VERSION/config/vigor.ini
INSTALL_DIRECTORY/vigor-4.0.VERSION/bin/vigor4 -i VIRUS.fasta -o OUTPUT_DIRECTORY -d VIRUS_DB

Build Dependencies

Vigor4 uses 'Maven' to build & package the program <br>

Runtime Dependencies

VIGOR4 runs on Linux environment <br> JAVA8 or above<br> Exonerate2.2.0<br> Vigor Viral database (Available at VIGOR_DB)

Refer to INSTALL.md for detailed instructions

Running VIGOR4

usage: vigor4 -i inputfasta -o outputprefix -d refdb

Named Arguments:

  -h, --help             show this help message and exit
  -i <input fasta>, --input-fasta <input fasta>
                         path to fasta  file  of  genomic  sequences  to be
                         annotated
  -o <output prefix>, --output-prefix <output prefix>
                         prefix for outputfile  files,  e.g.  if  the output
                         prefix is  /mydir/anno  VIGOR  will  create output
                         files /mydir/anno.tbl, /mydir/anno.stats, etc.
  -c MIN_COVERAGE, --min-coverage MIN_COVERAGE
                         minimum  coverage  of  reference  product  (0-100)
                         required to report a gene,  by default coverage is
                         ignored
  -P <parameter=value~~...~~parameter=value>, --parameter <parameter=value~~...~~parameter=value>
                         ~~ separated list of  VIGOR parameters to override
                         default values.  Use  --list-config-parameters  to
                         see settable parameters.
  -v, --verbose          verbose logging (default=terse)
  --list-config-parameters
                         list available configuration parameters and exit
  --config-file CONFIG_FILE
                         config file to use
  --reference-database-path REFERENCE_DATABASE_PATH
                         reference database path
  --virus-config VIRUSSPECIFICCONFIG
                         Path to virus specific configuration
  --virus-config-path VIRUSSPECIFICCONFIGPATH
                         Path  to  directory   containing   virus  specific
                         config files.
  --overwrite-output     overwrite existing output files if they exist
  --temporary-directory TEMPORARYDIRECTORY
                         Root directory to use for temporary directories

reference database:
  -d <ref db>, --reference-database <ref db>
                         specify the reference database to  be used, (-D is
                         a synonym for this option)
  
locus tag usage:
  -l, --no-locus-tags    do  NOT  use   locus_tags   in   TBL  file  output
                         (incompatible with -L)
  -L [<locus_tag_prefix>], --locus-tags [<locus_tag_prefix>]
                         USE locus_tags in  TBL  file  output (incompatible
                         with -l). If  no  prefix  is  provided, the prefix
                         "vigor_" will be used.

Outputs:

 outputprefix.rpt   -  summary of program results
 outputprefix.cds   -  fasta file of predicted CDSs
 outputprefix.pep   -  fasta file of predicted proteins
 outputprefix.tbl   -  predicted features in GenBank tbl format
 outputprefix.aln   -  alignment  of  predicted  protein  to  reference, and
                       reference protein to genome
 outputprefix.gff3  -  predicted features in GFF3 format

Currently unimplemented VIGOR3 Command Line Options:

  -0, --circular         complete circular  genome  (allows  gene  to  span
                         origin). This feature is currently unimplemented
  -f {0,1,2}, --frameshift-sensitivity {0,1,2}
                         frameshift  sensitivity,   0=ignore   frameshifts,
                         1=normal (default), 2=sensitive. 
  -m, --ignore-reference-requirements
                         ignore      reference      match      requirements
                         (coverage/identity/similarity),  sometimes  useful
                         when running VIGOR  to  evaluate  raw  contigs and
                         rough draft sequences
  -x <ref_id,...,ref_id>, --ignore-refID <ref_id,...,ref_id>
                         comma separated list of  reference sequence IDs to
                         ignore  (useful   when   debugging   a   reference
                         database). Not currently implemented

Reference Databases:

NameDescription
fluaInfluenza A
flubInfluenza B
flucInfluenza C
lassaLassa Mammarenavirus
rsvRespiratory syntactical virus (RSV)
rtvaRotavirus A
rtvbRotavirus B
rtvcRotavirus C
rtvfRotavirus F
rtvgRotavirus G
sapoSapovirus
veevAlphaviruses (VEEV/EEEV)
wnvIWest Nile Virus - Lineage I
wnvIIWest Nile Virus - Lineage II
zikvZika virus