Home

Awesome

fastqp

Build Status PyPI

Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Features

Requirements

Tested on Python 2.7, and 3.4

Tested on Mac OS 10.10 and Linux 2.6.18

Installation

pip install [--user] fastqp

Note: BAM file support requires samtools

Usage

usage: fastqp [-h] [-q] [-s BINSIZE] [-a NAME] [-n NREADS] [-p BASE_PROBS] [-k {2,3,4,5,6,7}] [-o OUTPUT]
              [-ll LEFTLIMIT] [-rl RIGHTLIMIT] [-mq MEDIAN_QUAL] [--aligned-only | --unaligned-only] [-d]
              input

simple NGS read quality assessment using Python

positional arguments:
  input                 input file (one of .sam, .bam, .fq, or .fastq(.gz) or stdin (-))

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           do not print any messages (default: False)
  -s BINSIZE, --binsize BINSIZE
                        number of reads to bin for sampling (default: auto)
  -a NAME, --name NAME  sample name identifier for text and graphics output (default: input file name)
  -n NREADS, --nreads NREADS
                        number of reads sample from input (default: 2000000)
  -p BASE_PROBS, --base-probs BASE_PROBS
                        probabilites for observing A,T,C,G,N in reads (default: 0.25,0.25,0.25,0.25,0.1)
  -k {2,3,4,5,6,7}, --kmer {2,3,4,5,6,7}
                        length of kmer for over-repesented kmer counts (default: 5)
  -o OUTPUT, --output OUTPUT
                        base name for output files (default: fastqp_figures)
  -ll LEFTLIMIT, --leftlimit LEFTLIMIT
                        leftmost cycle limit (default: 1)
  -rl RIGHTLIMIT, --rightlimit RIGHTLIMIT
                        rightmost cycle limit (-1 for none) (default: -1)
  -mq MEDIAN_QUAL, --median-qual MEDIAN_QUAL
                        median quality threshold for failing QC (default: 30)
  --aligned-only        only aligned reads (default: False)
  --unaligned-only      only unaligned reads (default: False)
  -d, --count-duplicates
                        calculate sequence duplication rate (default: False)

Changes

See releases page for details.

Examples

quality heatmap

gc plot

gc distribution

nucleotide plot

nucleotide mismatch plot

kmer distribution

depth plot

quality percentiles

quality distribution

adapter kmer distribution

Acknowledgements

This project is freely licensed by the author, Matthew Shirley, and was completed under the mentorship financial support of Drs. Sarah Wheelan and Vasan Yegnasubramanian at the Sidney Kimmel Comprehensive Cancer Center in the Department of Oncology.