
This is @galaxyproject's modified fork of the original SegAlign.

Overview
Installation
- Dependencies
Usage
Citing KegAlign

<a name="overview"></a> Overview

KegAlign is a precise genome aligner that makes efficient use of GPUs.

<a name="changes"></a> Changes from the original implementation

Added an advanced runner script that allows MIG and/or MPS to be used for better GPU utilization
Updated to compile with TBB (Threading Building Blocks) version 2020.2
Fixed the --scoring option. It can now read and use the substitution matrix from a LASTZ Scoring File
Added --num_threads option to limit the number of threads used
Added --segment_size option to limit the maximum number of HSPs per segment file for CPU load balancing
Cleaned up build files and addressed compiler warnings

<a name="installation"></a> Installation

For standalone installation use Conda: conda install conda-forge::kegalign

For standalone installation with additional tools use Bioconda: conda install bioconda::kegalign-full

For installation in Galaxy we currently use the wrappers richard-burhans:kegalign and richard-burhans:batched_lastz from the Main Tool Shed. Try the tools at usegalaxy.org: kegalign, batched_lastz

Script to create conda environment

git clone https://github.com/galaxyproject/KegAlign.git
cd KegAlign
./scripts/make-conda-env.bash
source ./conda-env.bash

Script to install development environment

git clone https://github.com/galaxyproject/KegAlign.git
cd KegAlign
./scripts/make-conda-env.bash -dev
source ./conda-env-dev.bash

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

<a name="dependencies"></a> Dependencies

The following dependencies are required by KegAlign:

CMake >= 3.8
oneAPI Threading Building Blocks (oneTBB) 2020.2
Boost C++ Libraries >= 1.70
LASTZ 1.04.22
faToTwoBit (from UCSC Genome Browser source)

<a name="usage"></a> Usage

<a name="alignment"></a> Alignment

Running a Sample Alignment

# install kegalign
git clone https://github.com/galaxyproject/KegAlign.git
cd KegAlign
./scripts/make-conda-env.bash
source ./conda-env.bash

# convert target (ref) and query to 2bit
mkdir work
faToTwoBit <(gzip -cdfq ./test-data/apple.fasta.gz) work/ref.2bit
faToTwoBit <(gzip -cdfq ./test-data/orange.fasta.gz) work/query.2bit

# generate LASTZ keg
python ./scripts/runner.py --diagonal-partition --format maf- --num-cpu 16 --num-gpu 1 --output-file data_package.tgz --output-type tarball --tool_directory ./scripts test-data/apple.fasta.gz test-data/orange.fasta.gz
python ./scripts/package_output.py --format_selector maf --tool_directory ./scripts

# run LASTZ keg
python ./scripts/run_lastz_tarball.py --input=data_package.tgz --output=apple_orange.maf --parallel=16

# check output
diff apple_orange.maf <(gzip -cdfq ./test-data/apple_orange.maf.gz)

# command-line kegalign
kegalign test-data/apple.fasta.gz test-data/orange.fasta.gz work/ --num_gpu 1 --num_threads 16 > lastz-commands.txt
bash lastz-commands.txt
(echo "##maf version=1"; cat *.maf-) > apple_orange.maf
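
The final step above (prepend a single MAF header, then concatenate the headerless `*.maf-` fragments) can also be done from Python. The following is a minimal sketch, not part of KegAlign: the function name `merge_maf_fragments` and its parameters are hypothetical, and it assumes the fragments sit in one directory and that sorted filename order is acceptable.

```python
from pathlib import Path


def merge_maf_fragments(fragment_dir, out_path, pattern="*.maf-"):
    """Concatenate headerless MAF fragments into a single MAF file.

    Writes the "##maf version=1" header once, then appends every file
    in `fragment_dir` matching `pattern`, in sorted filename order.
    Returns the number of fragments that were merged.
    """
    fragments = sorted(Path(fragment_dir).glob(pattern))
    with Path(out_path).open("w") as out:
        out.write("##maf version=1\n")
        for fragment in fragments:
            # Fragments are copied verbatim, like `cat` in the shell version.
            out.write(fragment.read_text())
    return len(fragments)
```

Calling `merge_maf_fragments(".", "apple_orange.maf")` from the directory holding the fragments mirrors the shell one-liner.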

Running with MIG/MPS

GPU utilization can be increased by using MIG and/or MPS, which can make alignments up to 20% faster.

Preparing inputs

The provided split_input.py script assigns whole chromosomes from the input genome to separate FASTA files (at most --max_chunks), each holding roughly --goal_bp base pairs. These files are then run in parallel on the same GPU(s). Because individual chromosomes are never split, --goal_bp should not be significantly smaller than the largest chromosome in the input file; otherwise the chunks will not be similarly sized. A good --goal_bp value for the human genome is 200 million base pairs.

mkdir query_split target_split
./scripts/mps-mig/split_input.py --input ./test-data/apple.fasta.gz --out query_split --to_2bit --goal_bp 20000000 --max_chunks 30
./scripts/mps-mig/split_input.py --input ./test-data/orange.fasta.gz --out target_split --to_2bit --goal_bp 20000000 --max_chunks 30
mkdir tmp
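
To illustrate how --goal_bp and --max_chunks interact, here is a minimal sketch of one possible grouping strategy. It is an assumption for illustration only and is not taken from split_input.py: the function `plan_chunks`, its greedy rule, and the toy chromosome lengths are all hypothetical.

```python
def plan_chunks(chrom_lengths, goal_bp, max_chunks):
    """Group whole chromosomes into chunks of roughly `goal_bp` bases.

    chrom_lengths: list of (name, length) pairs in file order.
    A chromosome is never split, so one longer than `goal_bp` ends up
    in an oversized chunk. Once `max_chunks` chunks exist, remaining
    chromosomes go to whichever chunk is currently smallest.
    """
    chunks = []  # each chunk is a list of chromosome names
    sizes = []   # total bases per chunk, parallel to `chunks`
    for name, length in chrom_lengths:
        if chunks and sizes[-1] + length <= goal_bp:
            # Still room in the chunk being filled.
            chunks[-1].append(name)
            sizes[-1] += length
        elif len(chunks) < max_chunks:
            # Start a new chunk.
            chunks.append([name])
            sizes.append(length)
        else:
            # Chunk limit reached: top up the smallest chunk.
            smallest = sizes.index(min(sizes))
            chunks[smallest].append(name)
            sizes[smallest] += length
    return chunks, sizes
```

With toy lengths `[("chr1", 120), ("chr2", 60), ("chr3", 50), ("chr4", 40), ("chr5", 30)]`, `goal_bp=100` and `max_chunks=3`, chr1 alone already exceeds the goal, which is why a --goal_bp well below the largest chromosome gives uneven chunks.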

List the available GPUs and their UUIDs, then choose the ones to run on:

nvidia-smi -L

Run on two GPUs with 4 MPS processes per GPU (replace [GPU-UUID#] with the UUIDs reported by the command above).

Each KegAlign instance, with default settings, uses around 12 to 16 GiB of GPU memory. The chosen GPUs or MIG instances should each have enough GPU memory to run the number of KegAlign instances defined by the --MPS parameter.

python ./scripts/mps-mig/run_mig.py [GPU-UUID1],[GPU-UUID2] --MPS 4 --target ./target_split --query ./query_split  --tmp_dir ./tmp/ --mps_pipe_dir ./tmp/ --output ./apples_oranges.maf --num_threads 64
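
The memory guidance above comes down to simple arithmetic. The sketch below is a hypothetical helper, not part of KegAlign, and it assumes the upper figure of 16 GiB per instance quoted above; the GPU memory sizes passed in are example inputs.

```python
def max_instances_per_gpu(gpu_mem_gib, per_instance_gib=16):
    """Upper bound on concurrent KegAlign instances on one GPU or MIG instance.

    per_instance_gib defaults to 16, the conservative end of the
    12 to 16 GiB range quoted for default settings.
    """
    if per_instance_gib <= 0:
        raise ValueError("per_instance_gib must be positive")
    return int(gpu_mem_gib // per_instance_gib)


def mps_setting_fits(gpu_mem_gib, mps, per_instance_gib=16):
    """True if `mps` instances fit in the given GPU memory."""
    return mps <= max_instances_per_gpu(gpu_mem_gib, per_instance_gib)
```

Under these assumptions, `--MPS 4` needs about 64 GiB per GPU, so `mps_setting_fits(80, 4)` holds while `mps_setting_fits(40, 4)` does not.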

<a name="scoring"></a>Scoring Options

By default, the HOXD70 substitution scores are used (from Chiaromonte et al. 2002).

bad_score          = X:-1000  # used for sub['X'][*] and sub[*]['X']
fill_score         = -100     # used when sub[*][*] is not defined
gap_open_penalty   =  400
gap_extend_penalty =   30

     A     C     G     T
A   91  -114   -31  -123
C -114   100  -125   -31
G  -31  -125   100  -114
T -123   -31  -114    91

A custom matrix can be supplied through the --scoring parameter. A substitution matrix can be inferred from your own data using another LASTZ-based tool (LASTZ_D: Infer substitution scores).
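
As a worked example of how the default values above combine, the following sketch scores an ungapped pair of aligned sequences with the HOXD70 matrix. It is illustrative only and is not KegAlign's implementation: the function names are hypothetical, gap penalties are left out, and treating lowercase bases like uppercase ones is an assumption.

```python
# HOXD70 substitution scores, copied from the table above.
HOXD70 = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}
BAD_SCORE = -1000   # used when either base is 'X'
FILL_SCORE = -100   # used for pairs the matrix does not define


def substitution_score(a, b):
    """Score one aligned pair of bases."""
    a, b = a.upper(), b.upper()  # assumption: case does not affect the score
    if a == "X" or b == "X":
        return BAD_SCORE
    return HOXD70.get(a, {}).get(b, FILL_SCORE)


def ungapped_score(target, query):
    """Sum the substitution scores over two equal-length sequences."""
    if len(target) != len(query):
        raise ValueError("sequences must have the same length")
    return sum(substitution_score(t, q) for t, q in zip(target, query))
```

For example, `ungapped_score("ACGT", "ACGT")` is 91 + 100 + 100 + 91 = 382, and changing the second query base to G gives 91 - 125 + 100 + 91 = 157.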

<a name="output"></a>Output Options

The default output is a MAF alignment file. Other formats can be selected with the --format parameter. See the LASTZ manual for a description of the available formats.
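
For a quick look at the default MAF output, a small reader along the following lines may help. It is a sketch that assumes the standard MAF layout (an `a` line opens each block, and `s` lines carry source, start, size, strand, source size and text). The function name and the sample sequence names are hypothetical.

```python
def read_maf_blocks(lines):
    """Yield (score, sequences) for each alignment block in a MAF stream.

    `sequences` is a list of (src, start, size, strand) tuples taken
    from the block's 's' lines; `score` is None if the 'a' line has none.
    """
    score, seqs, in_block = None, [], False
    for line in lines:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        if fields[0] == "a":
            if in_block:
                yield score, seqs
            score, seqs, in_block = None, [], True
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "score":
                    score = float(value)
        elif fields[0] == "s" and in_block:
            seqs.append((fields[1], int(fields[2]), int(fields[3]), fields[4]))
    if in_block:
        yield score, seqs
```

Passing an open file object, for example `with open("apple_orange.maf") as fh: blocks = list(read_maf_blocks(fh))`, gives one entry per alignment block.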

<a name="cite_kegalign"></a> Citing KegAlign

B Gulhan, R Burhans, R Harris, M Kandemir, M Haeussler, A Nekrutenko. KegAlign: Optimizing pairwise alignments with diagonal partitioning. bioRxiv, 2024. doi: 10.1101/2024.09.02.610839