<img src="kegalign_logo.png" width="300">This is @galaxyproject's modified fork of the original SegAlign.
<a name="overview"></a> Overview
Precise genome aligner efficiently leveraging GPUs.
<a name="changes"></a> Changes from the original implementation
- Added an advanced runner script that allows the use of MIG and/or MPS for better GPU utilization
- Updated to compile with TBB (Threading Building Blocks) version 2020.2
- Fixed the --scoring option. It can now read and use the substitution matrix from a LASTZ Scoring File
- Added --num_threads option to limit the number of threads used
- Added --segment_size option to limit the maximum number of HSPs per segment file for CPU load balancing
- Cleaned up build files and addressed compiler warnings
<a name="installation"></a> Installation
For standalone installation use Conda: conda install conda-forge::kegalign
For standalone installation with additional tools use Bioconda: conda install bioconda::kegalign-full
For installation in Galaxy we currently use the wrappers richard-burhans:kegalign and richard-burhans:batched_lastz from the Main Tool Shed.
Try the tools at usegalaxy.org: kegalign, batched_lastz
- Script to create conda environment
git clone https://github.com/galaxyproject/KegAlign.git
cd KegAlign
./scripts/make-conda-env.bash
source ./conda-env.bash
- Script to install development environment
git clone https://github.com/galaxyproject/KegAlign.git
cd KegAlign
./scripts/make-conda-env.bash -dev
source ./conda-env-dev.bash
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
<a name="dependencies"></a> Dependencies
The following dependencies are required by KegAlign:
- CMake >= 3.8
- oneAPI Threading Building Blocks (oneTBB) 2020.2
- Boost C++ Libraries >= 1.70
- LASTZ 1.04.22
- faToTwoBit (from UCSC Genome Browser source)
<a name="usage"></a> Usage
<a name="alignment"></a> Alignment
Running a Sample Alignment
# install kegalign
git clone https://github.com/galaxyproject/KegAlign.git
cd KegAlign
./scripts/make-conda-env.bash
source ./conda-env.bash
# convert target (ref) and query to 2bit
mkdir work
faToTwoBit <(gzip -cdfq ./test-data/apple.fasta.gz) work/ref.2bit
faToTwoBit <(gzip -cdfq ./test-data/orange.fasta.gz) work/query.2bit
# generate LASTZ keg
python ./scripts/runner.py --diagonal-partition --format maf- --num-cpu 16 --num-gpu 1 --output-file data_package.tgz --output-type tarball --tool_directory ./scripts test-data/apple.fasta.gz test-data/orange.fasta.gz
python ./scripts/package_output.py --format_selector maf --tool_directory ./scripts
# run LASTZ keg
python ./scripts/run_lastz_tarball.py --input=data_package.tgz --output=apple_orange.maf --parallel=16
# check output
diff apple_orange.maf <(gzip -cdfq ./test-data/apple_orange.maf.gz)
# command-line kegalign
kegalign test-data/apple.fasta.gz test-data/orange.fasta.gz work/ --num_gpu 1 --num_threads 16 > lastz-commands.txt
bash lastz-commands.txt
(echo "##maf version=1"; cat *.maf-) > apple_orange.maf
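The last command above prepends a MAF header to the header-less `*.maf-` pieces. The same step can be sketched in Python, for example when the pieces live in another directory. The function name `merge_maf_pieces` and its parameters are hypothetical and not part of KegAlign; the sorted file order is an assumption chosen to mimic shell glob ordering.

```python
import glob
import os


def merge_maf_pieces(piece_dir, output_path, pattern="*.maf-"):
    """Write one MAF file: the header line, then every piece in sorted order.

    Hypothetical helper, not shipped with KegAlign.
    Returns the number of pieces that were merged.
    """
    pieces = sorted(glob.glob(os.path.join(piece_dir, pattern)))
    with open(output_path, "w") as out:
        # Same header line that the shell command above emits with echo.
        out.write("##maf version=1\n")
        for piece in pieces:
            with open(piece) as handle:
                out.write(handle.read())
    return len(pieces)
```

Calling `merge_maf_pieces(".", "apple_orange.maf")` from the directory that holds the pieces should produce the same kind of file as the shell one-liner.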
Running with MIG/MPS
GPU utilization can be increased by using MIG and/or MPS, which can make alignments up to 20% faster.
- Preparing inputs
The provided split_input.py script distributes the chromosomes of the input genome across separate FASTA files (at most --max_chunks of them), each holding roughly --goal_bp base pairs. These files are then aligned in parallel on the same GPU(s). Because individual chromosomes are never split, --goal_bp should not be significantly smaller than the largest chromosome in the input file; otherwise the chunks will differ widely in size. A good --goal_bp value for the human genome is 200 million base pairs.
mkdir query_split target_split
./scripts/mps-mig/split_input.py --input ./test-data/apple.fasta.gz --out query_split --to_2bit --goal_bp 20000000 --max_chunks 30
./scripts/mps-mig/split_input.py --input ./test-data/orange.fasta.gz --out target_split --to_2bit --goal_bp 20000000 --max_chunks 30
mkdir tmp
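The effect of --goal_bp and --max_chunks can be illustrated with a small Python sketch. This is not the code in split_input.py: the function name `assign_chunks` and the greedy largest-first strategy are assumptions made for illustration, and the sequence lengths are invented.

```python
def assign_chunks(lengths, goal_bp, max_chunks):
    """Group whole sequences into at most max_chunks chunks of roughly goal_bp.

    lengths: dict mapping sequence name -> length in base pairs.
    Hypothetical helper; split_input.py may use a different strategy.
    """
    chunks = []  # each entry is [total_bp, [sequence names]]
    # Place the longest sequences first so short ones can fill the gaps.
    for name, size in sorted(lengths.items(), key=lambda kv: -kv[1]):
        # Existing chunks that would stay within goal_bp after adding this one.
        fits = [c for c in chunks if c[0] + size <= goal_bp]
        if fits:
            target = max(fits, key=lambda c: c[0])  # fullest chunk that still fits
        elif len(chunks) < max_chunks:
            target = [0, []]  # open a new chunk, even if size exceeds goal_bp
            chunks.append(target)
        else:
            target = min(chunks, key=lambda c: c[0])  # no room left: emptiest chunk
        target[0] += size
        target[1].append(name)
    return [(total, names) for total, names in chunks]


# Invented lengths, chosen only to keep the arithmetic easy to follow.
example = {"chr1": 90, "chr2": 60, "chr3": 40, "chr4": 10}
for total, names in assign_chunks(example, goal_bp=100, max_chunks=3):
    print(total, names)
```

Because sequences are never split, a sequence longer than goal_bp always becomes an oversized chunk of its own in this sketch, which is why a goal far below the largest chromosome yields unevenly sized chunks.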
- List the available GPUs and select the UUIDs to run on using
nvidia-smi -L
- Run on two GPUs with 4 MPS processes per GPU (replace [GPU-UUID#] with the UUIDs printed by the command above)
Each KegAlign instance, with default settings, uses around 12 to 16 GiB of GPU memory. The chosen GPUs or MIG instances should each have enough GPU memory to run the number of KegAlign instances defined by the --MPS parameter.
python ./scripts/mps-mig/run_mig.py [GPU-UUID1],[GPU-UUID2] --MPS 4 --target ./target_split --query ./query_split --tmp_dir ./tmp/ --mps_pipe_dir ./tmp/ --output ./apples_oranges.maf --num_threads 64
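As a rough planning aid, the two manual steps above (collecting UUIDs and choosing --MPS) can be sketched in Python. The helper names `parse_uuids` and `max_mps_instances` are hypothetical, the sample listing is made up with placeholder UUIDs, and the 16 GiB per-instance figure is simply the upper end of the range quoted above.

```python
import re

# Assumption taken from the text above: one instance needs up to ~16 GiB.
PER_INSTANCE_GIB = 16


def parse_uuids(listing):
    """Extract GPU and MIG UUIDs from the text printed by `nvidia-smi -L`."""
    return re.findall(r"UUID:\s*((?:GPU|MIG)-[0-9A-Za-z/-]+)", listing)


def max_mps_instances(gpu_mem_gib, per_instance_gib=PER_INSTANCE_GIB):
    """Largest --MPS value whose instances fit in one GPU's memory."""
    return int(gpu_mem_gib // per_instance_gib)


# Made-up listing in the style of `nvidia-smi -L`; the UUIDs are placeholders.
sample = """\
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-66666666-7777-8888-9999-000000000000)
"""

uuids = parse_uuids(sample)
print(",".join(uuids))        # comma-separated list for run_mig.py
print(max_mps_instances(80))  # upper bound for --MPS on an 80 GiB GPU
```

The estimate is deliberately conservative; actual memory use depends on the inputs and settings, so it is worth checking with nvidia-smi during a run.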
<a name="scoring"></a>Scoring Options
By default, the HOXD70 substitution scores are used (from Chiaromonte et al. 2002).
bad_score = X:-1000 # used for sub['X'][*] and sub[*]['X']
fill_score = -100 # used when sub[*][*] is not defined
gap_open_penalty = 400
gap_extend_penalty = 30
       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91
A custom substitution matrix can be supplied to the --scoring parameter as a LASTZ scoring file. A substitution matrix can also be inferred from your data using another LASTZ-based tool (LASTZ_D: Infer substitution scores).
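To make the default scores concrete, here is a minimal Python sketch that scores an ungapped pair of equal-length sequences with the HOXD70 values listed in this section. It is an illustration, not KegAlign's or LASTZ's implementation: the function names are hypothetical, gaps are not handled, and treating lowercase bases like uppercase is an assumption.

```python
# HOXD70 substitution scores as listed in this section.
HOXD70 = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}
BAD_SCORE = -1000  # any pair involving 'X'
FILL_SCORE = -100  # any pair the matrix does not define


def pair_score(a, b):
    """Score one aligned pair of characters (hypothetical helper)."""
    a, b = a.upper(), b.upper()  # assumption: case does not affect the score
    if a == "X" or b == "X":
        return BAD_SCORE
    return HOXD70.get(a, {}).get(b, FILL_SCORE)


def ungapped_score(target, query):
    """Sum the pair scores of two equal-length, gap-free sequences."""
    if len(target) != len(query):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(target, query))


print(ungapped_score("ACGT", "ACGT"))  # 91 + 100 + 100 + 91 = 382
print(ungapped_score("ACGT", "AGGN"))  # 91 - 125 + 100 - 100 = -34
```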
<a name="output"></a>Output Options
The default output is a MAF alignment file. Other formats can be selected with the --format parameter. See the LASTZ manual for a description of the available formats.
<a name="cite_kegalign"></a> Citing KegAlign
B Gulhan, R Burhans, R Harris, M Kandemir, M Haeussler, A Nekrutenko. KegAlign: Optimizing pairwise alignments with diagonal partitioning. bioRxiv, 2024. doi: 10.1101/2024.09.02.610839