Fasten

Fasten is a manipulation suite for interleaved fastq files. Its executables read from stdin and write to stdout in the interleaved fastq format, which makes it easy to chain streaming operations with Unix pipes.

Synopsis

read metrics

$ cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | fasten_metrics | column -t
totalLength  numReads  avgReadLength  avgQual
800          8         100            19.53875
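
For readers unfamiliar with these columns, the length-based ones can be approximated with standard tools. The following is a minimal sketch and not how fasten_metrics is implemented: the helper name fastq_basic_metrics is hypothetical, it assumes four-line-per-record fastq on stdin, and it leaves out avgQual because this README does not describe how that value is computed.

```shell
# Hypothetical helper (not part of Fasten): report total bases, read count,
# and mean read length for four-line-per-record fastq read from stdin.
fastq_basic_metrics() {
    awk '
        # In four-line fastq, the sequence is the second line of each record.
        NR % 4 == 2 { total += length($0); n++ }
        END {
            avg = (n > 0) ? total / n : 0
            print "totalLength", "numReads", "avgReadLength"
            print total + 0, n + 0, avg
        }'
}

# Two toy reads of length 4 and 6.
metrics=$(printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | fastq_basic_metrics)
printf '%s\n' "$metrics"
```

For the two toy reads this reports a total length of 10, 2 reads, and an average read length of 5.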

read cleaning

$ cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | \
    fasten_clean --paired-end --min-length 2 | \
    gzip -c > cleaned.shuffled.fastq.gz

$ zcat cleaned.shuffled.fastq.gz | fasten_metrics | column -t
totalLength  numReads  avgReadLength  avgQual
800          8         100            19.53875
# With --min-length 2, cleaning did not filter out any reads

Installation

Installation from source

Fasten is programmed in the Rust programming language. More information about Rust, including installation and the executable cargo, can be found at rust-lang.org.

After downloading the source code, build it with cargo:

cd fasten
cargo build --release
export PATH=$PATH:$(pwd)/target/release

All executables will be in the directory fasten/target/release.
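
A quick way to confirm that the PATH change took effect is to look for a few of the executables. This is a minimal sketch: the function name check_on_path is hypothetical, and the executable names are taken from the script descriptions in this README.

```shell
# Hypothetical helper: report whether each named command is found on PATH.
# Returns non-zero if any command is missing.
check_on_path() {
    missing=0
    for exe in "$@"; do
        if command -v "$exe" >/dev/null 2>&1; then
            echo "found:   $exe"
        else
            echo "missing: $exe"
            missing=1
        fi
    done
    return "$missing"
}

# Check a few Fasten executables after building and updating PATH.
check_on_path fasten_metrics fasten_shuffle fasten_clean \
    || echo "Some executables are not on PATH yet."
```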

note: there are some Makefile methods to help including

Installation without git

You can also install Fasten straight from https://crates.io using the following command:

cargo install fasten

Detailed information on how this works can be found in the cargo handbook at https://doc.rust-lang.org/cargo/commands/cargo-install.html.

General usage

All scripts accept the parameters, read uncompressed fastq format from stdin, and print uncompressed fastq format to stdout. All paired end fastq files must be in interleaved format, and they are written in interleaved format, except when deshuffling with fasten_shuffle.
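
To make the interleaved layout concrete: each R1 record is immediately followed by its R2 mate. fasten_shuffle produces this layout, as shown in the synopsis; the sketch below only illustrates it with standard Unix tools. It assumes four-line-per-record fastq with no tab characters, and the helper name interleave_fastq and the toy reads are made up for the example.

```shell
# Hypothetical helper (not part of Fasten): interleave two fastq files.
# Assumes four lines per record and no tab characters in the input.
interleave_fastq() {
    tmp=$(mktemp -d)
    # Join each four-line record onto a single tab-separated line.
    paste - - - - < "$1" > "$tmp/r1.tsv"
    paste - - - - < "$2" > "$tmp/r2.tsv"
    # Put each R1 record next to its R2 mate, then split back into lines.
    paste "$tmp/r1.tsv" "$tmp/r2.tsv" | tr '\t' '\n'
    rm -r "$tmp"
}

# Toy input: two read pairs, forward reads in R1 and reverse reads in R2.
workdir=$(mktemp -d)
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n' > "$workdir/R1.fastq"
printf '@read1/2\nTGCA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n' > "$workdir/R2.fastq"

interleaved=$(interleave_fastq "$workdir/R1.fastq" "$workdir/R2.fastq")
printf '%s\n' "$interleaved"
rm -r "$workdir"
```

The records come out in the order @read1/1, @read1/2, @read2/1, @read2/2, which is the order the paired-end options expect.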

Documentation

Please see the inline documentation at https://lskatz.github.io/fasten/fasten

This documentation was built with cargo doc --no-deps

Other documentation

Contributing

Instructions for how to contribute can be found in CONTRIBUTING.md.

Fasten script descriptions

All executables read and write in the fastq format except fasten_convert.

executable             Description
fasten_clean           Trims and cleans a fastq file.
fasten_convert         Converts between different sequence formats like fastq, sam, fasta.
fasten_straighten      Converts any fastq file to a standard four-line-per-entry format.
fasten_metrics         Prints basic read metrics.
fasten_pe              Determines paired-endedness based on read IDs.
fasten_randomize       Randomizes reads from input.
fasten_combine         Combines identical reads and updates quality scores.
fasten_kmer            Kmer counting.
fasten_normalize       Normalizes read depth by using kmer counting.
fasten_sample          Downsamples reads.
fasten_shuffle         Shuffles or deshuffles paired end reads.
fasten_validate        Validates your reads (deprecated in favor of fasten_inspect and fasten_repair).
fasten_inspect         Adds information to read IDs such as seqlength.
fasten_repair          Repairs corrupted reads.
fasten_quality_filter  Transforms nucleotides to "N" if the quality is low.
fasten_trim            Blunt-end trims reads.
fasten_replace         Find and replace using regex.
fasten_mutate          Introduces random mutations.
fasten_regex           Filters for reads using regex.
fasten_progress        Adds progress to any place in the pipeline.
fasten_sort            Sorts fastq entries.

Etymology

Many of these scripts were inspired by the fastx toolkit. I wanted to name the project fasty, but that was already the name of a bioinformatics program, so I cycled through other letters of the alphabet and came across "N." The name can therefore be pronounced "Fast-N", or like "fasten" (with a silent T), as in securing your analysis by fastening it.

Citation

To cite, please refer to Katz et al., (2024). Fasten: a toolkit for streaming operations on fastq files. Journal of Open Source Software, 9(94), 6030, https://doi.org/10.21105/joss.06030