Home

Awesome

Haybaler

Sophia Poertner, Colin Davenport, Lisa Hollstein 2020-2022

Details

Installation of haybaler only via conda

First install miniconda if you have not already done this. Use mamba instead of conda if you like faster installs (follow the mamba install instructions here https://github.com/mamba-org/mamba ) Required libs are listed in the file env.haybaler.yml Note - further R packages and an R installation are needed for the graphical heatmaps and heat trees, see the relevant sections below.

# first clone haybaler
git clone https://github.com/MHH-RCUG/haybaler
cd haybaler
# Set up a dedicated environment for haybaler
conda env create -f env.haybaler.yml
# OR mamba
mamba env create -f env.haybaler.yml
conda activate haybaler

Usage


# Method 1: Copy the scripts haybaler.py, run_haybaler.sh and all the files you want to combine  into one directory. 
# Then run the run_haybaler script 
bash run_haybaler.sh

# Method 2: Configure your haybaler location and run the wochenende_postprocess.sh script from the Wochenende project
# This will run haybaler as one step

Set lower abundance thresholds, change readcount_limit or RPMM_limit from defaults >10 and >300

# example with readcount_limit 20 and rpmm_limit 200
python3 haybaler.py -i "$input_files" -p . -op $outputDir  -o haybaler.csv --readcount_limit 20 --rpmm_limit 200

Output

The output is in the created output folder (default: haybaler_output)

Output is a set of CSVs. These combine the results from the original files into a single matrix, so you can better compare your samples. Furthermore, heatmaps and heattrees are created, provided you have an R installation set up correctly.

Refining the output and visualization

You can read the output into R for example and do further analyses yourself, or use our heatmap and heattree scripts. Heatmaps and Heattrees are also generated with the nf_wochenende.nf pipeline, provided you have the correct R libaries installed.

Installing R packages for heatmaps and heat-trees (eg in Rstudio)

# Install heatmap packages in R
packages = c("heatmaply", "RColorBrewer")

# install uninstalled packages
not_installed <- packages[!(packages %in% installed.packages()[ , "Package"])]  # Extract not installed packages
if(length(not_installed)) install.packages(not_installed, repos="http://cran.rstudio.com/")  # Install not installed packages from cran 

#load packages
invisible(lapply(packages, library, character.only = TRUE))


# Install heat tree packages in R

packages = c("metacoder", "taxa", "dplyr", "tibble", "ggplot2","stringr","RColorBrewer")

# install uninstalled packages
not_installed <- packages[!(packages %in% installed.packages()[ , "Package"])]  # Extract not installed packages
if(length(not_installed)) install.packages(not_installed, repos="http://cran.rstudio.com/")  # Install not installed packages from cran 

#load packages
invisible(lapply(packages, library, character.only = TRUE))

Heatmaps

# go to the haybaler output file
# Now run the Rscript to create a heatmap, this requires an R installation.
# Because of a bug with heatmaply and conda, no conda enviroment is allowed to be activated
conda deactivate
bash runbatch_heatmaps.sh 

Heattrees

# copy the taxonomy script in the haybaler_output_directory
cd haybaler_output
cp ../haybaler_taxonomy.py ../run_haybaler_tax.sh .

# run the scripts. Needs Haybaler enviroment and R installation 
conda activate haybaler
bash run_haybaler_tax.sh

Output:

How to run Heat trees

# copy the heattree scripts in the haybaler_output_directory
cd haybaler_output
cp ../create_heattrees.R ../run_heattrees.sh .

# run the scripts. Needs R installation 
bash run_heattrees.sh

How to change the level (genus or species) for the heattrees

# go to line 31
## column with the lineage information. Uncomment one
# wanted_column <- "genus_lineage"
wanted_column <- "species_lineage"

change node/edge/label sizes

# go to line 44
# Aesthetic arguments for Heat tree creation(size or text), change as you like 
# for more infomation: https://rdrr.io/cran/metacoder/man/heat_tree.html
node_size_range <- c(0.01, 0.05)
edge_size_range <- c(0.005, 0.025)
node_color_axis_label <- "absolute abundance in\n sample (color)"
node_size_axis_label <- "\nabsolute abundance among\n all samples (size)"
title_size <- 0.05 

Contributions

@poer-sophia - main author, taxonomy, heatmaps, testing

@colindaven - concept, code review, testing

@irosenboom heat-trees, testing

@LisaHollstein Code improvements, maintainence, testing, code review, docs