Awesome
<img src="static/logo.svg" alt="Velox logo" width="50%" align="center" />Velox is a composable execution engine distributed as an open source C++ library. It provides reusable, extensible, and high-performance data processing components that can be (re-)used to build data management systems focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Velox was created by Meta and it is currently developed in partnership with IBM/Ahana, Intel, Voltron Data, Microsoft, ByteDance and many other companies.
In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Considering Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end-users; rather, it is mostly used by developers integrating and optimizing their compute engines.
Velox provides the following high-level components:
- Type: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, etc.
- Vector: an Arrow-compatible columnar memory layout module, providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
- Expression Eval: a fully vectorized expression evaluation engine that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
- Functions: sets of vectorized scalar, aggregates, and window functions implementations following the Presto and Spark semantic.
- Operators: implementation of relational operators such as scans, writes, projections, filtering, grouping, ordering, shuffle/exchange, hash, merge, and nested loop joins, unnest, and more.
- I/O: a connector interface for extensible data sources and sinks, supporting different file formats (ORC/DWRF, Parquet, Nimble), and storage adapters (S3, HDFS, GCS, ABFS, local files) to be used.
- Network Serializers: an interface where different wire protocols can be implemented, used for network communication, supporting PrestoPage and Spark's UnsafeRow.
- Resource Management: a collection of primitives for handling computational resources, such as memory arenas and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.
Velox is extensible and allows developers to define their own engine-specific specializations, including:
- Custom types
- Simple and vectorized functions
- Aggregate functions
- Window functions
- Operators
- File formats
- Storage adapters
- Network serializers
Examples
Examples of extensibility and integration with different component APIs can be found here
Documentation
Developer guides detailing many aspects of the library, in addition to the list of available functions can be found here.
Blog posts are available here.
Community
Velox is an open source project supported by a community of individual contributors and organizations. The project's technical governance mechanics is described in this document..
Project maintainers are listed here.
The main communication channel with the Velox OSS community is through the the Velox-OSS Slack workspace, github Issues, and Discussions.
For access to the Velox Slack workspace, please add a comment to this Discussion
Contributing
Check our contributing guide to learn about how to contribute to the project.
License
Velox is licensed under the Apache 2.0 License. A copy of the license can be found here.
Getting Started
Get the Velox Source
git clone https://github.com/facebookincubator/velox.git
cd velox
Once Velox is checked out, the first step is to install the dependencies. Details on the dependencies and how Velox manages some of them for you can be found here.
Velox also provides the following scripts to help developers setup and install Velox dependencies for a given platform.
Setting up dependencies
The following setup scripts use the DEPENDENCY_DIR
environment variable to set the
location to download and build packages. This defaults to deps-download
in the current
working directory.
Use INSTALL_PREFIX
to set the install directory of the packages. This defaults to
deps-install
in the current working directory on macOS and to the default install
location (eg. /usr/local
) on linux.
Using the default install location /usr/local
on macOS is discouraged since this
location is used by certain Homebrew versions.
Manually add the INSTALL_PREFIX
value in the IDE or bash environment,
say export INSTALL_PREFIX=/Users/$USERNAME/velox/deps-install
to ~/.zshrc
so that
subsequent Velox builds can use the installed packages.
You can reuse DEPENDENCY_INSTALL
and INSTALL_PREFIX
for Velox clients such as Prestissimo
by specifying a common shared directory.`
Setting up on macOS
On a macOS machine (either Intel or Apple silicon) you can setup and then build like so:
$ ./scripts/setup-macos.sh
$ make
With macOS 14.4 and XCode 15.3 where m4
is missing, you can either
- install
m4
viabrew
:
$ brew install m4
$ export PATH=/opt/homebrew/opt/m4/bin:$PATH
- or use
gm4
instead:
$ M4=/usr/bin/gm4 make
Setting up on Ubuntu (20.04 or later)
The supported architectures are x86_64 (avx, sse), and AArch64 (apple-m1+crc, neoverse-n1). You can build like so:
$ ./scripts/setup-ubuntu.sh
$ make
Setting up on Centos 9 Stream with adapters
Velox adapters include file-systems such as AWS S3, Google Cloud Storage, and Azure Blob File System. These adapters require installation of additional libraries. Once you have checked out Velox, you can setup and build like so:
$ ./scripts/setup-centos9.sh
$ ./scripts/setup-adapters.sh
$ make
Note that setup-adapters.sh
supports macOS and Ubuntu 20.04 or later.
Using Clang on Linux
Clang 15 can be additionally installed during the setup step for Ubuntu 22.04/24.04
and CentOS 9 by setting the USE_CLANG
environment variable prior to running the platform specific setup script.
$ export USE_CLANG=true
This will install and use Clang 15 to build the dependencies instead of using the default GCC compiler.
Once completed, and before running any make
command, set the compiler to be used:
$ export CC=/usr/bin/clang-15
$ export CXX=/usr/bin/clang++-15
$ make
Building Velox
Run make
in the root directory to compile the sources. For development, use
make debug
to build a non-optimized debug version, or make release
to build
an optimized version. Use make unittest
to build and run tests.
Note that,
- Velox requires a compiler at the minimum GCC 11.0 or Clang 15.0.
- Velox requires the CPU to support instruction sets:
- bmi
- bmi2
- f16c
- Velox tries to use the following (or equivalent) instruction sets where available:
- On Intel CPUs
- avx
- avx2
- sse
- On ARM
- Neon
- Neon64
- On Intel CPUs
Build metrics for Velox are published at https://facebookincubator.github.io/velox/bm-report/
Building Velox with docker-compose
If you don't want to install the system dependencies required to build Velox, you can also build and run tests for Velox on a docker container using docker-compose. Use the following commands:
$ docker-compose build ubuntu-cpp
$ docker-compose run --rm ubuntu-cpp
If you want to increase or decrease the number of threads used when building Velox
you can override the NUM_THREADS
environment variable by doing:
$ docker-compose run -e NUM_THREADS=<NUM_THREADS_TO_USE> --rm ubuntu-cpp