This tool is deprecated!

Albacore (Oxford Nanopore's basecaller) can basecall directly to FASTQ, which makes FAST5 to FASTQ conversion much less relevant. Also, I have since written Filtlong, which does more sophisticated long read filtering than these scripts.

The current recommendation is therefore to basecall directly to FASTQ (or use some other tool to extract FASTQ reads from FAST5 files), trim with Porechop and then filter with Filtlong. If you're still interested in using these scripts, the original README follows below:

FAST5 to FASTQ

This is a simple script to extract FASTQ files from FAST5 files.

There are a number of other tools which can do this, including Poretools, PoRe, nanopolish extract and more. I made this one for a couple of specific features:

UPDATE (22 May 2017): Since Albacore v1.1, direct-to-FASTQ basecalling is possible (yay!). I therefore made a version of this script which takes a FASTQ file as input instead of a FAST5 directory, so you can still apply the length/quality filters if you did straight-to-FASTQ basecalling. More info in the FASTQ filtering section below.

Requirements

Installation

No installation is required - it's all just in one Python script:

git clone https://github.com/rrwick/Fast5-to-Fastq
Fast5-to-Fastq/fast5_to_fastq.py --help

Usage

Extracting all reads from FAST5 to FASTQ:

Gzip while you extract:

Filter based on length:

Filter based on mean Phred quality score:
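To illustrate what this filter measures, here is a minimal Python sketch. It is not the code from fast5_to_fastq.py: the function names are hypothetical, and it assumes Phred+33 encoded quality strings and a plain arithmetic mean of the per-base scores, which may differ from how the script computes its mean.

```python
def mean_phred(quality_string, offset=33):
    """Arithmetic mean of the per-base Phred scores in a FASTQ quality line.

    Assumes Phred+33 encoding (hypothetical helper, not part of the scripts).
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_mean_quality(quality_string, min_mean_qual):
    """True if the read's mean Phred score reaches the threshold."""
    return mean_phred(quality_string) >= min_mean_qual
```

For example, the quality string "IIII" encodes four bases of Phred 40, so mean_phred("IIII") is 40.0 and the read would pass any threshold up to 40.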

Filter based on min Phred score over a sliding window:
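The idea behind a sliding-window filter is that a read with a good overall mean can still contain a stretch of very poor sequence. The sketch below is only an illustration under stated assumptions, not the script's own implementation: it assumes Phred+33 encoding, takes the mean score of each window, and reports the lowest window mean. The function name and the handling of reads shorter than the window are hypothetical choices.

```python
def min_window_phred(quality_string, window_size, offset=33):
    """Lowest mean Phred score over any window of window_size bases.

    Assumes Phred+33 encoding. Reads shorter than the window are scored
    as one window covering the whole read (an assumption of this sketch).
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    scores = [ord(ch) - offset for ch in quality_string]
    if not scores:
        return 0.0
    window_size = min(window_size, len(scores))

    # Keep a running sum so each step costs O(1) instead of re-summing.
    window_sum = sum(scores[:window_size])
    lowest = window_sum
    for i in range(window_size, len(scores)):
        window_sum += scores[i] - scores[i - window_size]
        lowest = min(lowest, window_sum)
    return lowest / window_size
```

A read would then be kept only if min_window_phred(quality, window_size) is at or above the chosen threshold.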

Aim for a target number of bases:
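One plausible way to reach a target number of bases is to rank reads by quality and keep the best ones until the target is met. The following sketch shows that approach only as an assumption; the source does not say how the script chooses which reads to keep, and the function name and tuple layout are hypothetical.

```python
def select_to_target_bases(reads, target_bases):
    """Keep the highest-quality reads until target_bases is reached.

    reads: iterable of (name, sequence, mean_quality) tuples.
    The ranking criterion (mean quality, best first) is an assumption.
    """
    ranked = sorted(reads, key=lambda read: read[2], reverse=True)
    kept = []
    total_bases = 0
    for read in ranked:
        if total_bases >= target_bases:
            break
        kept.append(read)
        total_bases += len(read[1])
    return kept
```

Because whole reads are kept, the final total can overshoot the target by up to one read length.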

How I (Ryan) like to use it:

FASTQ filtering

The fastq_to_fastq.py script has the same usage as fast5_to_fastq.py: just replace path/to/fast5_directory with path/to/reads.fastq. For example:

Both *.fastq and *.fastq.gz should work as input formats.
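As a rough sketch of how a script can accept both formats, the example below checks the first two bytes of the file for the gzip magic number instead of trusting the file extension. This is an illustration, not the code in fastq_to_fastq.py; the function names are hypothetical, and it assumes simple four-line FASTQ records.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(path):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline().strip()
            if not header:  # end of file (or a blank line) stops the loop
                break
            sequence = handle.readline().strip()
            handle.readline()  # the '+' separator line
            quality = handle.readline().strip()
            yield header[1:], sequence, quality
```

The tuples from read_fastq can then be passed through whatever length or quality checks are needed.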

FAST5 integrity check

I ran into some annoying crashes caused by corrupt FAST5 files, so I made the fast5_integrity_check.py tool to find these.

It only takes one argument, the directory to check (searched recursively):

fast5_integrity_check.py path/to/fast5_directory

It prints the name and path for bad FAST5 files to stdout and some progress info to stderr.
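For readers curious how such a check might look, here is a minimal sketch using only the standard library. It is not what fast5_integrity_check.py does: it only tests whether each .fast5 file starts with the HDF5 file signature, which catches truncated-to-empty or non-HDF5 files but not corruption deeper in the file. The function names are hypothetical.

```python
import os
import sys

# FAST5 files are HDF5 files, which begin with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Cheap first-pass check: does the file start with the HDF5 signature?"""
    try:
        with open(path, "rb") as handle:
            return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
    except OSError:
        return False


def find_suspect_fast5(directory):
    """Recursively collect .fast5 files that fail the signature check."""
    suspects = []
    checked = 0
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith(".fast5"):
                continue
            checked += 1
            path = os.path.join(root, name)
            if not looks_like_hdf5(path):
                suspects.append(path)
    # Progress info goes to stderr so stdout can carry just the file paths.
    print(f"checked {checked} files", file=sys.stderr)
    return suspects
```

Printing each path returned by find_suspect_fast5 to stdout mirrors the stdout/stderr split described above. A more thorough check would open each file with an HDF5 library and try to read its contents.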

License

GNU General Public License, version 3