BABS-aDNASeq
A Nextflow pipeline script for processing aDNASeq samples.
The pipeline was written by The Bioinformatics & Biostatistics Group in collaboration with The Ancient Genomics Lab @ The Francis Crick Institute.
nextflow: http://www.nextflow.io
nextflow-quickstart: http://www.nextflow.io/docs/latest/getstarted.html#get-started
The Analysis Pipeline Flow
Quick Start
To run a BABS-aDNASeq analysis you will need to complete the following steps. These are explained in more detail further down.
- Obtain BABS-aDNASeq files from GitHub.
- Install/load nextflow-0.32.0 or higher.
- Configure reference genome file paths (genome.yml).
- Configure environment profile if running software via a module system.
- Create a sample design file.
- Run nextflow pipeline.
Get BABS-aDNASeq Files
To obtain BABS-aDNASeq files run the following git command.
git clone https://github.com/crickbabs/BABS-aDNASeq
The repository contains the following files.
- BABS-aDNASeq.nf: The Nextflow script.
- BABS-aDNASeq: Wrapper script to run an analysis.
- nextflow.config: Main BABS-aDNASeq config file.
- conf/babs_profile.config: Profile configuration for running the script @ The Crick.
- conf/genomes.config: Genomes configuration file for defining reference data.
- conf/multiqc_config.yml: MultiQC configuration used to generate the integrated QC report.
Load Nextflow Module
If you are working within a module environment such as that at The Crick, load the nextflow module.
module purge
module load nextflow/0.30.2
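The Quick Start asks for nextflow 0.32.0 or higher, so it can help to compare the version a module provides against that minimum before launching a run. The following is a minimal sketch, assuming a `sort` that supports `-V` (as in GNU coreutils). `version_at_least` is a hypothetical helper and not part of the pipeline, and the version strings in the loop are only example inputs.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes a sort that supports -V (version sort), as in GNU coreutils.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

required="0.32.0"
# Example version strings only; substitute the version your module reports.
for candidate in 0.30.2 0.32.0 19.04.0; do
  if version_at_least "$candidate" "$required"; then
    echo "$candidate: meets the $required minimum"
  else
    echo "$candidate: older than $required"
  fi
done
```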
Sample Design File
FASTQ files are specified in a CSV design file with the following columns.
column 1 : Individual ID
column 2 : Sequencing library ID
column 3 : full path to fastq file R1
column 4 : full path to fastq file R2
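As an illustration of the four-column layout above, the sketch below writes a small design file and checks its shape. All IDs and paths are made up, the absence of a header row is an assumption, and `check_design` is a hypothetical helper rather than something shipped with the pipeline.

```shell
# Write a small example design file. All IDs and paths are hypothetical,
# and the absence of a header row is an assumption.
design=$(mktemp)
cat > "$design" <<'EOF'
ind1,lib1,/data/run1/lib1_R1.fastq.gz,/data/run1/lib1_R2.fastq.gz
ind1,lib2,/data/run1/lib2_R1.fastq.gz,/data/run1/lib2_R2.fastq.gz
ind2,lib3,/data/run2/lib3_R1.fastq.gz,/data/run2/lib3_R2.fastq.gz
EOF

# Hypothetical check: every row needs four non-empty comma-separated fields,
# and the two fastq paths must be full paths (start with "/").
check_design() {
  awk -F',' '
    BEGIN { bad = 0 }
    NF != 4 { printf "line %d: expected 4 columns, found %d\n", NR, NF; bad = 1; next }
    {
      for (i = 1; i <= 4; i++)
        if ($i == "") { printf "line %d: column %d is empty\n", NR, i; bad = 1 }
      for (i = 3; i <= 4; i++)
        if ($i !~ /^\//) { printf "line %d: column %d is not a full path\n", NR, i; bad = 1 }
    }
    END { exit bad }
  ' "$1"
}

check_design "$design" && echo "design file looks well formed"
```

Here `ind1` has two libraries, so its rows share column 1 but differ in column 2.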
Running a BABS-aDNASeq Analysis
BABS-aDNASeq --outdir ./ --design design.csv --profile babs --genome hg19 --resume
Output Directories & Files
Flow Details
Merge fastqs with the same library ID
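How the pipeline performs this merge internally is not described here. As a rough sketch of the idea, rows of the design file that share a library ID can have their fastq files concatenated, R1 with R1 and R2 with R2. The helper name and the output file names are hypothetical, and the inputs are assumed to be gzip-compressed (concatenated gzip files remain valid gzip).

```shell
# Hypothetical helper: concatenate the fastq files of each library.
# $1 is the design file, $2 an output directory assumed to be new or empty
# (files are appended to, so stale output would be duplicated).
merge_by_library() {
  mkdir -p "$2"
  while IFS=',' read -r individual library r1 r2 || [ -n "$individual" ]; do
    # Rows are appended in design-file order for both mates, so read i of
    # the merged R1 file still pairs with read i of the merged R2 file.
    cat "$r1" >> "$2/${library}_R1.fastq.gz"
    cat "$r2" >> "$2/${library}_R2.fastq.gz"
  done < "$1"
}
```

A call such as `merge_by_library design.csv merged_fastq` would leave one R1 and one R2 file per library ID in `merged_fastq`.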
Adapter trimming with SeqPrep
Adapters are trimmed and a consensus sequence is built from the overlap of each read pair. Only the overlap consensus is saved here. Read pairs that do not overlap are discarded.
https://github.com/jstjohn/SeqPrep
BWA
Consensus overlaps are aligned to the specified reference using BWA. BAM files with read groups are created.
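The exact read group fields the pipeline writes, and which BWA subcommand it runs, are not stated here. As an illustration only, a read group line could be built from a design-file row as below. ID, SM, LB and PL are standard SAM read group tags; using the library ID for ID and LB, the individual ID for SM, and `PL:ILLUMINA` are all assumptions, and `read_group` is a hypothetical helper.

```shell
# Hypothetical helper: build a SAM @RG header line from one design-file row.
# $1 is the individual ID, $2 the library ID.
# The tabs are emitted as a literal backslash followed by "t", which is the
# escaped form BWA accepts for a read group string on the command line.
read_group() {
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA\n' "$2" "$1" "$2"
}

read_group ind1 lib1
```

Keeping the individual ID in SM and the library ID in LB is what would let downstream tools distinguish libraries after BAM files from the same individual are merged.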
Duplicate Removal
Duplicate alignments are removed using Picard.
Variant Calling
VCFs are created using samtools mpileup. QC metrics are produced using bcftools stats.
Consensus Fasta
Ambiguity encoded consensus fasta files are produced using vcftools consensus.
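To make "ambiguity encoded" concrete, the sketch below maps a pair of alleles to a single base, assuming the standard IUPAC nucleotide codes are what is meant. `iupac_code` is a hypothetical helper for illustration and is not how the pipeline itself produces the fasta files.

```shell
# Hypothetical helper: map a diploid genotype (two alleles) to one IUPAC code.
iupac_code() {
  # Upper-case and sort the alleles so that A/G and G/A give the same key.
  key=$(printf '%s\n%s\n' "$1" "$2" | tr 'acgt' 'ACGT' | sort | tr -d '\n')
  case "$key" in
    AA) echo A ;; CC) echo C ;; GG) echo G ;; TT) echo T ;;
    AG) echo R ;; CT) echo Y ;; CG) echo S ;;
    AT) echo W ;; GT) echo K ;; AC) echo M ;;
    *)  echo N ;;   # anything else is reported as unknown
  esac
}

iupac_code G A
iupac_code C C
```

A heterozygous A/G site is therefore written as R, while a homozygous site keeps its plain base.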
Random Fasta
Random allele fasta files are produced using htsbox pileup -R.
Merge BAM files
BAM files from the same individual are merged using samtools merge. Variant calling and QC are carried out at both the library and individual level.
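As a sketch of how libraries group under individuals, the helper below reads the design file and prints one `samtools merge` command per individual without running it. The BAM file names are made up for illustration, and `print_merge_commands` is a hypothetical helper, not part of the pipeline.

```shell
# Hypothetical helper: print one "samtools merge" command per individual.
# $1 is the design file. The BAM file names (<library>.dedup.bam and
# <individual>.merged.bam) are made up for illustration.
print_merge_commands() {
  # Keep each (individual, library) pair once, then collect libraries per individual.
  cut -d',' -f1,2 "$1" | sort -u | awk -F',' '
    { libs[$1] = libs[$1] " " $2 ".dedup.bam" }
    END { for (ind in libs) print "samtools merge " ind ".merged.bam" libs[ind] }
  ' | sort
}

# Example design file with hypothetical IDs and paths.
example=$(mktemp)
cat > "$example" <<'EOF'
ind1,lib1,/data/lib1_R1.fastq.gz,/data/lib1_R2.fastq.gz
ind1,lib2,/data/lib2_R1.fastq.gz,/data/lib2_R2.fastq.gz
ind2,lib3,/data/lib3_R1.fastq.gz,/data/lib3_R2.fastq.gz
EOF
print_merge_commands "$example"
```

Printing the commands first makes it easy to confirm the grouping before any BAM file is written.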
QC
Alignment quality is assessed using pmdtools and the Picard tools CollectWgsMetrics, CollectWgsMetricsWithNonZeroCoverage, CollectOxoGMetrics & CollectAlignmentSummaryMetrics. A QC report is generated using MultiQC.
https://github.com/pontussk/PMDtools
https://github.com/broadinstitute/picard
https://github.com/ewels/MultiQC
Credits
The BABS-aDNASeq nextflow pipeline was written and developed by Philip East & Pontus Skoglund.
The Bioinformatics & Biostatistics Group (BABS) @ The Francis Crick Institute. Ancient Genomics @ The Francis Crick Institute.
Licence
This project is licensed under the MIT License - see the LICENSE.md file for details.