
<p align="center"> <a href="https://dl.circleci.com/status-badge/redirect/gh/facebookresearch/baspacho/tree/main"> <img src="https://dl.circleci.com/status-badge/img/gh/facebookresearch/baspacho/tree/main.svg?style=svg&circle-token=a463b80560b6b9db75f7810a171f3c65e8dcad7b" alt="CircleCI" height="20"> </a> <a href="https://github.com/facebookresearch/baspacho/blob/main/LICENSE"> <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License" height="20"> </a> <a href="https://isocpp.org/"> <img src="https://img.shields.io/badge/C++-17-blue.svg" alt="C++" height="20"> </a> <a href="https://github.com/pre-commit/pre-commit"> <img src="https://img.shields.io/badge/pre--commit-enabled-green?logo=pre-commit&logoColor=white" alt="pre-commit" height="20"> </a> <a href="https://github.com/facebookresearch/baspacho/blob/main/CONTRIBUTING.md"> <img src="https://img.shields.io/badge/PRs-welcome-green.svg" alt="PRs" height="20"> </a> </p>

# BaSpaCho

BaSpaCho (Batched Sparse Cholesky) is a state-of-the-art direct solver for symmetric positive-definite sparse matrices. It uses a supernodal Cholesky decomposition to achieve high performance, turning portions of the sparse matrix into dense blocks and invoking high-performance BLAS/LAPACK libraries. It is designed with Levenberg-Marquardt optimization libraries in mind, and aims to reduce part of their complexity by offering the best tool for the job. Compared to CHOLMOD from SuiteSparse, the library currently considered the state of the art, it supports:

Requirements:

Libraries fetched automatically by the build:

Optional libraries:

## Configure and Install

Configuring with a system BLAS (e.g. OpenBLAS):

```
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
```

Configuring with MKL:

```
. /opt/intel/oneapi/setvars.sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBLA_VENDOR=Intel10_64lp
```

Compiling and testing:

```
cmake --build build -- -j16 && ctest --test-dir build
```

Benchmarking all configurations (using CHOLMOD as baseline):

```
build/baspacho/benchmarking/bench -B 1_CHOLMOD
```

Benchmarking on a problem from Bundle Adjustment in the Large:

```
build/baspacho/benchmarking/BAL_bench -i ~/BAL/problem-871-527480-pre.txt
```

Showing the available tests:

```
ctest --test-dir build --show-only
```

## CUDA

CUDA support is controlled by the `BASPACHO_USE_CUBLAS` option (on by default); add `-DBASPACHO_USE_CUBLAS=0` to the configure step to disable it. You may have to add `-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc` so that the build can find the CUDA compiler. The CUDA architectures can be specified with e.g. `-DBASPACHO_CUDA_ARCHS="60;70;75"`; this option also supports the values `detect` (the default), which detects the architecture of the installed GPU, and `torch`, which fills in the architectures that are supported by PyTorch and >=60 (see below).
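For example, the following configure lines (all flags as described above; the compiler path and architecture list are just examples to adapt to your installation) build without CUDA, or with an explicit compiler and architecture set:

```
# CPU-only build, cuBLAS backend disabled
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBASPACHO_USE_CUBLAS=0

# CUDA build with explicit compiler path and architectures
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
      -DBASPACHO_CUDA_ARCHS="60;70;75"
```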

## BLAS

The BLAS library used is specified via the CMake variable `BLA_VENDOR`; a few possibilities are:

For the full list, check the CMake docs at: https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors
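For example, to configure against OpenBLAS explicitly (`OpenBLAS` is one of the vendor values documented by CMake's FindBLAS module, alongside the `Intel10_64lp` value used above for MKL):

```
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBLA_VENDOR=OpenBLAS
```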

## Reordering algorithm: Approximate Minimum Degree (AMD)

BaSpaCho can use either Eigen's implementation of AMD (the default) or the AMD library that is part of SuiteSparse. Add `-DBASPACHO_USE_SUITESPARSE_AMD=1` to the configure step to use the SuiteSparse implementation instead of Eigen's.
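A full configure line using the SuiteSparse implementation would then look like:

```
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBASPACHO_USE_SUITESPARSE_AMD=1
```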

## Benchmarking and fine-tuning

BaSpaCho uses timing models of the BLAS operations in order to decide on the best node merges in the supernodal algorithm. The trade-offs can change dramatically depending on the architecture, the number of cores, the BLAS implementation used, and CPU vs. GPU. The timings required to fit a model can be collected with the `bench` tool via the `-Z` command-line argument. A description of the provided benchmarking tools follows.
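For example, collecting timings while running the default `factor` benchmark (the `bench` tool and the statistics files that `-Z` produces are described in the next section):

```
build/baspacho/benchmarking/bench -Z
```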

### bench

The tool `build/baspacho/benchmarking/bench` can be used to compare the speed of different configurations on a varied set of synthetic problems. For instance, if CHOLMOD is installed you can use it as a reference sparse solver to compare against; the following compares the most time-consuming operation, `factor`, using CHOLMOD as the baseline:

```
build/baspacho/benchmarking/bench -B 1_CHOLMOD
```

(Other operations are `analysis` and `solve-X`, where X can be any number of right-hand sides.) Running with `-h` shows all available operations, solvers, and problems (and the arguments accepting regular expressions, which allow testing a subselection of solvers/problems). When the default `factor` operation is run with `-Z`, statistics are saved to files of the form `<config>_<op>.csv`, where `<config>` (e.g. `stats_cpu_f64`) identifies the configuration, and `<op>` is one of the underlying BLAS-like operations used in factorization (i.e. one of `potrf`, `trsm`, `syge`, `asmbl`, where `syge` stands for syrk/gemm for readers familiar with BLAS). The statistics generated with a specific configuration (e.g. `cpu_f64`) can be processed as

```
build/baspacho/examples/opt_comp_model -p stats_cpu_f64_potrf.csv -a stats_cpu_f64_asmbl.csv \
                                       -t stats_cpu_f64_trsm.csv -g stats_cpu_f64_syge.csv
```

in order to fit a computation model that can be provided as a setting to `createSolver`, fine-tuning it for your architecture.
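In code, the hand-off would look roughly like the sketch below. Everything here apart from `createSolver` itself is a hypothetical placeholder; the actual settings type, field name, and argument list should be looked up in BaSpaCho's headers.

```cpp
// Hypothetical sketch only: 'Settings', 'computationModel', and the argument
// list are assumptions -- consult BaSpaCho's headers for the real interface.
//
// Settings settings;
// settings.computationModel = fittedModel;  // coefficients from opt_comp_model
// auto solver = createSolver(settings, paramSizes, sparseStructure);
```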

### BAL_bench

It's possible to benchmark on a bundle-adjustment problem from Bundle Adjustment in the Large with:

```
build/baspacho/benchmarking/BAL_bench -i ~/BAL/problem-871-527480-pre.txt
```

i.e. the solver is tested on a linear problem of the kind arising in such bundle-adjustment optimization. Again, if installed, CHOLMOD is tested as a baseline, but only on the 'reduced' camera-camera problem; BaSpaCho is tested on both the point elimination and the reduced problem.

## Examples

A few examples are provided to illustrate what the library is capable of.
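As a taste of the workflow this kind of solver is built around (analyze the sparsity pattern once, then factor and solve repeatedly, e.g. once per Levenberg-Marquardt iteration), here is a minimal self-contained sketch. It deliberately uses Eigen's `SimplicialLLT` rather than BaSpaCho's own API, so that it compiles with no dependency beyond Eigen, which BaSpaCho itself uses:

```cpp
// Minimal sparse Cholesky analyze/factor/solve cycle, illustrated with
// Eigen's SimplicialLLT (NOT BaSpaCho's API -- see the baspacho headers
// and the provided examples for the library's own interface).
#include <Eigen/Sparse>
#include <iostream>

int main() {
  // A small symmetric positive-definite sparse matrix.
  Eigen::SparseMatrix<double> A(3, 3);
  A.insert(0, 0) = 4.0; A.insert(1, 1) = 5.0; A.insert(2, 2) = 6.0;
  A.insert(0, 1) = 1.0; A.insert(1, 0) = 1.0;  // symmetric off-diagonal pair
  A.makeCompressed();

  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt;
  llt.analyzePattern(A);  // symbolic step: done once for a fixed sparsity
  llt.factorize(A);       // numeric step: repeated as the values change

  Eigen::Vector3d b(1.0, 2.0, 3.0);
  Eigen::Vector3d x = llt.solve(b);  // back-substitution on the factor
  std::cout << x.transpose() << "\n";
  return 0;
}
```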

## Possible future improvements

## Caveats