abra-nf

Nextflow pipeline for ABRA2 (Assembly Based ReAligner)

CircleCI | Docker Hub | Singularity Hub

Workflow representation

Description

Apply ABRA2 to realign next generation sequencing data using localized assembly in a set of BAM files.

This script takes as input a set of BAM files (named *.bam) grouped in folders. There are two modes: a tumor/normal mode, in which matched tumor and normal BAM files from two folders are processed in pairs, and a single-folder mode, in which every BAM file in one folder is realigned independently.

In both cases, BAI indexes have to be present in the same location as their BAM mates and named *.bam.bai.
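For example, in tumor/normal mode with the default suffixes, the input folders could look like this (folder and sample names are illustrative):

    tumor_BAM/
        sample1_T.bam
        sample1_T.bam.bai
        sample2_T.bam
        sample2_T.bam.bai
    normal_BAM/
        sample1_N.bam
        sample1_N.bam.bai
        sample2_N.bam
        sample2_N.bam.bai

Tumor and normal files are matched by the part of the file name preceding the suffix (here sample1, sample2).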

Note that ABRA v1 is no longer supported (see the last version supporting it here: https://github.com/IARCbioinfo/abra-nf/releases/tag/v1.0)

Dependencies

  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.

  2. External software: ABRA2 (abra2.jar).

A conda recipe, and docker and singularity containers, are available with all the tools needed to run the pipeline (see "Usage" and the IARC-nf repository for more information).

Input

In tumor/normal mode:

| Name | Description |
|------|-------------|
| --tumor_bam_folder | Folder containing tumor BAM files |
| --normal_bam_folder | Folder containing matched normal BAM files |
| --suffix_tumor | Suffix identifying tumor BAM files (default: _T) |
| --suffix_normal | Suffix identifying normal BAM files (default: _N) |

In single-folder mode:

| Name | Description |
|------|-------------|
| --bam_folder | Folder containing BAM files |

Parameters

Mandatory parameters:

| Name | Example value | Description |
|------|---------------|-------------|
| --ref | /path/to/ref.fasta | Indexed reference genome in fasta format |
| --abra_path | /path/to/abra2.jar | Explicit path to abra2.jar (not needed when using the docker or singularity container) |

Optional parameters:

| Name | Default value | Description |
|------|---------------|-------------|
| --bed | /path/to/intervals.bed | Bed file containing intervals (without header) |
| --gtf | /path/to/annotations.gtf | GTF file containing junction annotations |
| --mem | 16 | Maximum RAM used (in GB) |
| --cpu | 4 | Number of threads used |
| --output_folder | abra_BAM/ | Output folder containing the realigned BAM files |

Flags are special parameters without value.

| Name | Description |
|------|-------------|
| --help | Display help |
| --single | Switch to single-end sequencing mode |
| --rna | Add RNA-specific recommended ABRA2 parameters |
| --junctions | Use STAR-identified junctions |
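For instance, a hypothetical run on single-end RNA-seq BAM files could combine several of these flags (paths and file names are illustrative):

nextflow run iarcbioinfo/abra-nf -profile docker --bam_folder BAM/ --ref ref.fasta --gtf annotations.gtf --rna --single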

Usage

Simple use case example:

nextflow run iarcbioinfo/abra-nf --bam_folder BAM/ --bed target.bed --ref ref.fasta --abra_path /path/to/abra.jar

With singularity:

nextflow run iarcbioinfo/abra-nf -profile singularity --bam_folder BAM/ --bed target.bed --ref ref.fasta --abra_path /path/to/abra.jar

Alternatively, one can run the pipeline using a docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).

Output

| Type | Description |
|------|-------------|
| ABRA BAM | Realigned BAM files with their indexes |

Contributions

| Name | Email | Description |
|------|-------|-------------|
| Matthieu Foll* | follm@iarc.fr | Developer to contact for support |
| Nicolas Alcala | alcalan@fellows.iarc.fr | Developer |

FAQ

A few samples always crash with error exit status 130, causing all processes to be stopped by nextflow. What can I do about it?

ABRA memory usage has a large variance, so a few BAM files may unpredictably require much more memory than the others and trigger a memory error (exit code 130). Because ABRA-nf consists of a single process executed in parallel across all BAM files, the results for each sample (or tumor/normal pair) are independent, and it is therefore recommended to set the following nextflow option (e.g., in the nextflow.config file):

process.errorStrategy = 'ignore'

so that files causing an error do not stop the other processes, which would otherwise complete just fine. ABRA-nf can then be launched again with more memory (option --mem) for the files that failed.

Another possibility is to automatically relaunch individual crashed processes with more memory, with something like this in the config file:

process {
    $abra {
        memory = { task.exitStatus == 130 ? 8.GB * task.attempt : 8.GB }
        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'ignore' }
        maxRetries = 4
    }
}

Here we ask Nextflow to try first with 8 GB of memory; if the task crashes due to memory (exit code 130 in this example, but note that this error code is specific to the scheduler used), it will retry with 16 GB, then 24 GB, and so on, up to a maximum of 4 retries. If ABRA crashes for another reason, the error is ignored.
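Note that the $abra process selector comes from older Nextflow versions; on current Nextflow releases the same configuration would be written with the withName selector (an equivalent sketch, assuming the process is still named abra):

    process {
        withName: abra {
            memory = { task.exitStatus == 130 ? 8.GB * task.attempt : 8.GB }
            errorStrategy = { task.exitStatus == 130 ? 'retry' : 'ignore' }
            maxRetries = 4
        }
    }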