Recipient of the 2023 James H. Wilkinson Prize for Numerical Software

Recipient of the 2020 SIAM Activity Group on Supercomputing Best Paper Prize

<img alt="Discord logo" title="Join us on Discord!" height="32px" src="docs/images/discord.svg" />

Introduction

BLIS is an award-winning portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available.

For a thorough presentation of our framework, please read our ACM Transactions on Mathematical Software (TOMS) journal article, "BLIS: A Framework for Rapidly Instantiating BLAS Functionality". For those who just want an executive summary, please see the Key Features section below.

In a follow-up article (also in ACM TOMS), "The BLIS Framework: Experiments in Portability", we investigate using BLIS to instantiate level-3 BLAS implementations on a variety of general-purpose, low-power, and multicore architectures.

An IPDPS'14 conference paper titled "Anatomy of High-Performance Many-Threaded Matrix Multiplication" systematically explores the opportunities for parallelism within the five loops that BLIS exposes in its matrix multiplication algorithm.

For other papers related to BLIS, please see the Citations section below.

It is our belief that BLIS offers substantial benefits in productivity when compared to conventional approaches to developing BLAS libraries, as well as a much-needed refinement of the BLAS interface, and thus constitutes a major advance in dense linear algebra computation. While BLIS remains a work-in-progress, we are excited to continue its development and further cultivate its use within the community.

The BLIS framework is primarily developed and maintained by individuals in the Science of High-Performance Computing (SHPC) group in the Oden Institute for Computational Engineering and Sciences at The University of Texas at Austin and in the Matthews Research Group at Southern Methodist University. Please visit the SHPC website for more information about our research group, such as a list of people and collaborators, funding sources, publications, and other educational projects (such as MOOCs).

Education and Learning

Want to understand what's under the hood? Many of the same concepts and principles employed when developing BLIS are introduced and taught in a basic pedagogical setting as part of LAFF-On Programming for High Performance (LAFF-On-PfHP), one of several massive open online courses (MOOCs) in the Linear Algebra: Foundations to Frontiers series, all of which are available for free via the edX platform.

What's New

What People Are Saying About BLIS

"I noticed a substantial increase in multithreaded performance on my own machine, which was extremely satisfying." ... "[I was] happy it worked so well!" (Justin Shea)

"This is an awesome library." ... "I want to thank you and the blis team for your efforts." (@Lephar)

"Any time somebody outside Intel beats MKL by a nontrivial amount, I report it to the MKL team. It is fantastic for any open-source project to get within 10% of MKL... [T]his is why Intel funds BLIS development." (@jeffhammond)

"So BLIS is now a part of Elk." ... "We have found that zgemm applied to a 15000x15000 matrix with multi-threaded BLIS on a 32-core Ryzen 2990WX processor is about twice as fast as MKL" ... "I'm starting to like this a lot." (@jdk2016)

"I [found] BLIS because I was looking for BLAS operations on C-ordered arrays for NumPy. BLIS has that, but even better is the fact that it's developed in the open using a more modern language than Fortran." (@nschloe)

"The specific reason to have BLIS included [in Linux distributions] is the KNL and SKX [AVX-512] BLAS support, which OpenBLAS doesn't have." (@loveshack)

"All tests pass without errors on OpenBSD. Thanks!" (@ararslan)

"Thank you very much for your great help!... Looking forward to benchmarking." (@mrader1248)

"Thanks for the beautiful work." (@mmrmo)

"[M]y software currently uses BLIS for its BLAS interface..." (@ShadenSmith)

"[T]hanks so much for your work on this! Excited to test." ... "[On AMD Excavator], BLIS is competitive to / slightly faster than OpenBLAS for dgemms in my tests." (@iotamudelta)

"BLIS provided the only viable option on KNL, whose ecosystem is at present dominated by blackbox toolchains. Thanks again. Keep on this great work." (@heroxbd)

"I want to definitely try this out..." (@ViralBShah)

Key Features

BLIS offers several advantages over traditional BLAS libraries:

How to Download BLIS

There are a few ways to download BLIS; the four most common are listed below. We highly recommend Option 1 or 2. Failing that, we recommend Option 3 over Option 4 so that your compiler can perform optimizations specific to your hardware.

  1. Download a source repository with git clone. Generally speaking, we prefer using git clone to clone a git repository. Having a repository allows the user to periodically pull in the latest changes, try out release candidates when they become available, switch to older versions easily, and quickly rebuild BLIS whenever they wish. (Note that a fresh clone checks out the master branch by default. As of 1.0, master is considered akin to a development branch and likely contains improvements made since the most recent release.)

    In order to clone a git repository of BLIS, please obtain a repository URL by clicking on the green button above the file/directory listing near the top of this page (as rendered by GitHub). Generally speaking, it will amount to executing the following command in your terminal shell:

    git clone https://github.com/flame/blis.git
    

    At this point, you will have the latest commit of the master branch checked out. If you wish to check out an official release version, say, 1.0, execute the following:

    git checkout 1.0
    

    git will then transform your working copy to match the state of the commit associated with version 1.0. You can view a list of official version tags at any time by executing:

    git tag --list
    

    Note that pre-release versions, such as release candidates, are actually branches rather than tags, and thus will not show up in the list of tagged versions.

  2. Download a source release via a tarball/zip file. If you would like to stick to the code that is included in official releases and don't need the convenience of pulling in the latest changes via git, you may download either a tarball or zip file of BLIS's latest release. (NOTE: Some older releases are only available as tagged commits. Also note that downloading release x.y.z is equivalent to downloading, or checking out, the git tag x.y.z.) We consider this option to be less than ideal for some people since you will not be able to update your code with a simple git pull command.

  3. Download a source repository via a zip file. If you are uncomfortable with using git but would still like the latest stable commits, we recommend that you download BLIS as a zip file.

    In order to download a zip file of the BLIS source distribution, please click on the green button above the file listing near the top of this page. This should reveal a link for downloading the zip file.

  4. Download a binary package specific to your OS. While we don't recommend this as the first choice for most users, we provide links to community members who generously maintain BLIS packages for various Linux distributions such as Debian Unstable and EPEL/Fedora. Please see the External Packages section below for more information.
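The difference between tagged releases and release-candidate branches mentioned under Option 1 can be seen in a throwaway local repository. The sketch below is only an illustration: the scratch repository, the `1.0` tag, and the `2.0-rc0` branch are hypothetical stand-ins created on the spot, not the actual BLIS history.

```shell
# Create a scratch repository (a hypothetical stand-in for a BLIS clone).
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1

# One empty commit so that tags and branches have something to point at.
git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q --allow-empty -m "initial commit"

git tag 1.0            # official releases are tags
git branch 2.0-rc0     # release candidates are branches (name is made up)

# Collect what each listing command reports.
tags=$(git tag --list)
rc_branches=$(git branch --list '*-rc*' --format='%(refname:short)')

echo "tags:     $tags"          # the release candidate is absent from this list
echo "branches: $rc_branches"   # the release candidate shows up here instead
```

In an actual clone, release-candidate branches live on the remote, so they would be listed with `git branch -r` rather than a plain `git branch`.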

Getting Started

NOTE: This section assumes you've either cloned a BLIS source code repository via git, downloaded the source code for a tagged version release, or downloaded the latest source code via a zip file (Options 1, 2, or 3, respectively, as discussed in the previous section).

If you just want to build a sequential (not parallelized) version of BLIS in a hurry and come back and explore other topics later, you can configure and build BLIS as follows:

$ ./configure auto
$ make [-j]

You can then verify your build by running BLAS- and BLIS-specific test drivers via make check:

$ make check [-j]

And if you would like to install BLIS to the directory specified to configure via the --prefix option, run the install target:

$ make install

Please read the output of ./configure --help for a full list of configure-time options. If/when you have time, we strongly encourage you to read the detailed walkthrough of the build system found in our Build System guide.
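A slightly fuller build might choose an install prefix and enable multithreading at configure time. The following is a sketch under stated assumptions: `$HOME/blis` is an arbitrary example prefix, `openmp` is just one possible threading model, and the `PREFIX` and `THREADING` shell variables are names introduced here for illustration. Please confirm the options against the output of ./configure --help for your version before relying on them.

```shell
# Illustrative defaults; override them from the environment if desired.
PREFIX="${PREFIX:-$HOME/blis}"       # where 'make install' will place BLIS
THREADING="${THREADING:-openmp}"     # threading model to enable (assumed choice)

if [ -x ./configure ]; then
    # We appear to be at the top level of a BLIS source tree: build for real.
    ./configure --prefix="$PREFIX" --enable-threading="$THREADING" auto
    make -j
    make check -j
    make install
else
    # Anywhere else, just show the configure command that would be used.
    echo "not in a BLIS source tree; would run:"
    echo "./configure --prefix=$PREFIX --enable-threading=$THREADING auto"
fi
```

Installing under a user-owned prefix such as the one assumed here avoids needing elevated privileges for the install step.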

If you are still having trouble, you are welcome to join us on Discord for further information and/or assistance.

Example Code

The BLIS source distribution provides example code in the examples directory. Example code focuses on using BLIS APIs (not BLAS or CBLAS), and resides in two subdirectories: examples/oapi (which demonstrates the object API) and examples/tapi (which demonstrates the typed API).

Each directory contains several files, each with various pieces of code that exercise core functionality of the BLIS API in question (object or typed). These example files should be thought of collectively as a tutorial, so we recommend starting from the beginning (the file whose name starts with 00).

You can build all of the examples by simply running make from either example subdirectory (examples/oapi or examples/tapi). (You can also run make clean.) The local Makefile assumes that you've already configured and built (but not necessarily installed) BLIS two directories up (that is, in ../..). If you have already installed BLIS to some permanent directory, you may refer to that installation by setting the environment variable BLIS_INSTALL_PATH prior to running make:

export BLIS_INSTALL_PATH=/usr/local; make

or by setting the same variable as part of the make command:

make BLIS_INSTALL_PATH=/usr/local
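To build both sets of examples in one pass from the top level of the source tree, a loop along the following lines may be convenient. This is a minimal sketch, assuming an installation under /usr/local (the same example path used above) and that the two example directories exist relative to the current directory; directories that are not found are skipped rather than treated as errors.

```shell
# Fall back to the example install location if the variable is not already set.
BLIS_INSTALL_PATH="${BLIS_INSTALL_PATH:-/usr/local}"

for dir in examples/oapi examples/tapi; do
    if [ -d "$dir" ]; then
        # Run the local Makefile in that directory against the installed BLIS.
        make -C "$dir" BLIS_INSTALL_PATH="$BLIS_INSTALL_PATH"
    else
        echo "skipping $dir (directory not found)"
    fi
done
```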

Once the executable files have been built, we recommend reading the code and the corresponding executable output side by side. This will help you see the effects of each section of code.

This tutorial is not exhaustive or complete; several object API functions were omitted (mostly for brevity's sake) and thus more examples could be written.

Documentation

We provide extensive documentation on the BLIS build system, APIs, test infrastructure, and other important topics. All documentation is formatted in markdown and included in the BLIS source distribution (usually in the docs directory). Slightly longer descriptions of each document may be found in the project's wiki section.

Documents for everyone:

Documents for GitHub contributors:

Documents for BLIS developers:

Performance

We provide graphs that report performance of several implementations across a range of hardware types, multithreading configurations, problem sizes, operations, and datatypes. These pages also document most of the details needed to reproduce these experiments.

External Packages

Generally speaking, we highly recommend building from source whenever possible using the latest git clone. (Tarballs of each tagged release are also available, but we consider them to be less ideal since they are not as easy to upgrade as git clones.)

That said, some users may prefer binary and/or source packages through their Linux distribution. Thanks to generous involvement/contributions from our community members, the following BLIS packages are now available:

Discussion

Most of the active discussions are now happening on our Discord server. Users and developers alike are welcome! Please see the BLIS Discord guide for a walkthrough of how to join us.

You can also still stay in touch by using either of the following mailing lists:

Contributing

For information on how to contribute to our project, including preferred coding conventions, please refer to the CONTRIBUTING file at the top level of the BLIS source distribution.

Citations

For those of you looking for the appropriate article to cite regarding BLIS, we recommend citing our first ACM TOMS journal paper (unofficial backup link):

@article{BLIS1,
   author      = {Field G. {V}an~{Z}ee and Robert A. {v}an~{d}e~{G}eijn},
   title       = {{BLIS}: A Framework for Rapidly Instantiating {BLAS} Functionality},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {41},
   number      = {3},
   pages       = {14:1--14:33},
   month       = {June},
   year        = {2015},
   issue_date  = {June 2015},
   url         = {https://doi.acm.org/10.1145/2764454},
}

You may also cite the second ACM TOMS journal paper (unofficial backup link):

@article{BLIS2,
   author      = {Field G. {V}an~{Z}ee and Tyler Smith and Francisco D. Igual and
                  Mikhail Smelyanskiy and Xianyi Zhang and Michael Kistler and Vernon Austel and
                  John Gunnels and Tze Meng Low and Bryan Marker and Lee Killough and
                  Robert A. {v}an~{d}e~{G}eijn},
   title       = {The {BLIS} Framework: Experiments in Portability},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {42},
   number      = {2},
   pages       = {12:1--12:19},
   month       = {June},
   year        = {2016},
   issue_date  = {June 2016},
   url         = {https://doi.acm.org/10.1145/2755561},
}

We also have a third paper, published at IPDPS 2014, on achieving multithreaded parallelism in BLIS (unofficial backup link):

@inproceedings{BLIS3,
   author      = {Tyler M. Smith and Robert A. {v}an~{d}e~{G}eijn and Mikhail Smelyanskiy and
                  Jeff R. Hammond and Field G. {V}an~{Z}ee},
   title       = {Anatomy of High-Performance Many-Threaded Matrix Multiplication},
   booktitle   = {28th IEEE International Parallel \& Distributed Processing Symposium
                  (IPDPS 2014)},
   year        = {2014},
   url         = {https://doi.org/10.1109/IPDPS.2014.110},
}

A fourth paper, published in ACM TOMS, proposes an analytical model for determining blocksize parameters in BLIS (unofficial backup link):

@article{BLIS4,
   author      = {Tze Meng Low and Francisco D. Igual and Tyler M. Smith and
                  Enrique S. Quintana-Ort\'{\i}},
   title       = {Analytical Modeling Is Enough for High-Performance {BLIS}},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {43},
   number      = {2},
   pages       = {12:1--12:18},
   month       = {August},
   year        = {2016},
   issue_date  = {August 2016},
   url         = {https://doi.acm.org/10.1145/2925987},
}

A fifth paper, published in ACM TOMS, begins the study of so-called induced methods for complex matrix multiplication (unofficial backup link):

@article{BLIS5,
   author      = {Field G. {V}an~{Z}ee and Tyler Smith},
   title       = {Implementing High-performance Complex Matrix Multiplication via the 3m and 4m Methods},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {44},
   number      = {1},
   pages       = {7:1--7:36},
   month       = {July},
   year        = {2017},
   issue_date  = {July 2017},
   url         = {https://doi.acm.org/10.1145/3086466},
}

A sixth paper, published in the SIAM Journal on Scientific Computing (SISC), revisits the topic of the previous article and derives a superior induced method (unofficial backup link):

@article{BLIS6,
   author      = {Field G. {V}an~{Z}ee},
   title       = {Implementing High-Performance Complex Matrix Multiplication via the 1m Method},
   journal     = {SIAM Journal on Scientific Computing},
   volume      = {42},
   number      = {5},
   pages       = {C221--C244},
   month       = {September},
   year        = {2020},
   issue_date  = {September 2020},
   url         = {https://doi.org/10.1137/19M1282040}
}

A seventh paper, published in ACM TOMS, explores the implementation of gemm for mixed-domain and/or mixed-precision operands (unofficial backup link):

@article{BLIS7,
   author      = {Field G. {V}an~{Z}ee and Devangi N. Parikh and Robert A. {v}an~{d}e~{G}eijn},
   title       = {Supporting Mixed-domain Mixed-precision Matrix Multiplication
                  within the {BLIS} Framework},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {47},
   number      = {2},
   pages       = {12:1--12:26},
   month       = {April},
   year        = {2021},
   issue_date  = {April 2021},
   url         = {https://doi.org/10.1145/3402225},
}

Awards

Funding

This project and its associated research were partially sponsored by grants from Microsoft, Intel, Texas Instruments, AMD, HPE, Oracle, Huawei, Facebook, and ARM, as well as grants from the National Science Foundation (Awards CCF-0917167, ACI-1148125/1340293, CCF-1320112, and ACI-1550493).

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).