zsv+lib: the world's fastest (SIMD) CSV parser, with an extensible CLI

zsv+lib is a fast CSV parser library and extensible command-line utility. It achieves high performance using SIMD operations, efficient memory use and other optimization techniques. It can also parse generic-delimited and fixed-width formats, as well as multi-row-span headers.

The ZSV CLI can be compiled to virtually any target, including WebAssembly, and offers features including select, count, direct CSV sql, flatten, serialize, 2json conversion, 2db sqlite3 conversion, stack, pretty, 2tsv, compare, paste and more.

Pre-built CLI packages are available via brew and nuget.

A pre-built library package is available for Node (npm install zsv-lib). Please note, this package is still in alpha and currently only exposes a small subset of the zsv library capabilities. More to come.

If you like zsv+lib, do not forget to give it a star! 🌟

Performance

Preliminary performance results compare favorably with other CSV utilities (xsv, tsv-utils, csvkit, mlr (Miller), etc.). The results below are from a pre-M1 macOS MBA. On most platforms zsvlib was 2x faster, though in some cases the advantage was smaller (e.g. 15-25%). mlr is not shown because it was about 25x slower:

<img src="https://user-images.githubusercontent.com/26302468/146497899-48174114-3b18-49b0-97da-35754ab56e48.png" alt="count speed" height="150px"><img src="https://user-images.githubusercontent.com/26302468/146498211-afc77ce6-4229-4599-bf33-81bf00c725a8.png" alt="select speed" height="150px">

** See 12/19 update re M1 processor at https://github.com/liquidaty/zsv/blob/main/app/benchmark/README.md

Which "CSV"

"CSV" is an ambiguous term. This library uses the same definition as Excel. In addition, it provides a row-level (as well as cell-level) API and provides "normalized" CSV output (e.g. input of this"iscell1,"thisis,"cell2 becomes "this""iscell1","thisis,cell2"). Each of these three objectives (Excel compatibility, row-level API and normalized output) has a measurable performance impact; conversely, it is possible to achieve-- which a number of other CSV parsers do-- much faster parsing speeds if any of these requirements (especially Excel compatibility) are dropped.

Built-in and extensible features

zsv is an extensible CSV utility built on zsvlib. It handles tasks such as slicing and dicing, querying with SQL, combining, serializing, flattening, converting between CSV/JSON/sqlite3 and more.

zsv is streamlined for easy development of custom dynamic extensions.

zsvlib and zsv are written in C, but since zsvlib is a library, and zsv extensions are just shared libraries, you can extend zsv with your own code in any programming language, so long as it has been compiled into a shared library that implements the expected interface.

Key highlights

Installing

Packages

Download pre-built binaries and packages for macOS, Windows, Linux and BSD from the Releases page.

You can also download pre-built binaries and packages for the latest commits and PRs from Actions, but these are retained only for a limited number of days.

macOS

...via Homebrew:

brew tap liquidaty/zsv
brew install zsv

...via MacPorts:

sudo port install zsv

Linux

For Linux (Debian/Ubuntu - *.deb):

# Install
sudo apt install ./zsv-amd64-linux-gcc.deb

# Uninstall
sudo apt remove zsv

For Linux (RHEL/CentOS - *.rpm):

# Install
sudo yum install ./zsv-amd64-linux-gcc.rpm

# Uninstall
sudo yum remove zsv

Windows

For Windows (*.nupkg), install with nuget.exe:

# Install via nuget custom feed (requires absolute paths)
md nuget-feed
nuget.exe add zsv .\<path>\zsv-amd64-windows-mingw.nupkg -source <path>/nuget-feed
nuget.exe install zsv -version <version> -source <path>/nuget-feed

# Uninstall
nuget.exe delete zsv <version> -source <path>/nuget-feed

For Windows (*.nupkg), install with choco.exe:

# Install
choco.exe install zsv --pre -source <directory containing .nupkg file>

# Uninstall
choco.exe uninstall zsv

Node

The zsv parser library is available for Node:

npm install zsv-lib

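Please note: this package is still in alpha and currently exposes only a small subset of the zsv library capabilities.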

From source

See BUILD.md for more details.

Why another CSV parser/utility?

Our objectives, which we were unable to find in a pre-existing project, are:

There are several excellent tools that achieve high performance. Among those we considered were xsv and tsv-utils. While both met our performance objective, they were designed primarily as utilities rather than libraries. For our needs, they were not easy enough to customize, or to extend with modular customizations that could be maintained (or licensed) independently of the related project. They are also written in Rust and D, respectively, languages with which we lacked deep experience, especially for WebAssembly targeting.

Others we considered were Miller (mlr), csvkit and Go (csv module), which did not meet our performance objective. We also considered various other libraries using SIMD for CSV parsing, but none that we tried met the "real-world CSV" objective.

Hence, zsv was created as a library and a versatile application, both optimized for speed and for ease of extending and customizing to your needs.

Batteries included

zsv comes with several built-in commands:

Each of these can also be built as an independent executable named zsv_xxx where xxx is the command name.

Running the CLI

After installing, run zsv help to see usage details. The typical syntax is zsv <command> <parameters>, e.g.:

zsv sql my_population_data.csv "select * from data where population > 100000"

Using the API

Simple API usage examples include:

Pull parsing:

zsv_parser parser = zsv_new(...);
while(zsv_next_row(parser) == zsv_status_row) { // for each row
  // ...
  size_t cell_count = zsv_cell_count(parser);
  for(size_t i = 0; i < cell_count; i++) { // for each cell
    struct zsv_cell c = zsv_get_cell(parser, i);
    // %.*s expects an int precision, so cast the cell length
    fprintf(stderr, "Cell: %.*s\n", (int)c.len, c.str);
    // ...
  }
}

Push parsing:

static void my_row_handler(void *ctx) {
  zsv_parser p = ctx; // the context is the parser itself (set in main)
  size_t cell_count = zsv_cell_count(p);
  for(size_t i = 0; i < cell_count; i++) { // for each cell
    // ...
  }
}

int main(void) {
  zsv_parser p = zsv_new(NULL);
  zsv_set_row_handler(p, my_row_handler);
  zsv_set_context(p, p);
  while(zsv_parse_more(p) == zsv_status_ok); // parse until input is exhausted
  zsv_finish(p); // flush any final row
  zsv_delete(p); // free the parser
  return 0;
}
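
To illustrate the push model independently of the zsv headers, the following toy sketch shows the general pattern: the caller feeds chunks of input and a registered handler is invoked once per completed row. All names here (toy_parser, toy_push and so on) are hypothetical. This is not zsv code: it ignores quoting entirely and skips empty lines, and is only meant to show how control flows when rows are pushed to a callback.

```c
#include <assert.h>
#include <stddef.h>

/* Toy push-style row splitter, for illustration only. It splits on ',' and
 * '\n', does no quote handling, and is not how zsv parses. */
typedef void (*toy_row_handler)(void *ctx, size_t cell_count);

struct toy_parser {
  toy_row_handler on_row; /* called once per completed row */
  void *ctx;              /* passed back to the handler unchanged */
  size_t commas;          /* delimiters seen so far in the current row */
  int row_started;        /* nonzero once the current row has any bytes */
};

void toy_init(struct toy_parser *p, toy_row_handler on_row, void *ctx) {
  assert(p != NULL && on_row != NULL);
  p->on_row = on_row;
  p->ctx = ctx;
  p->commas = 0;
  p->row_started = 0;
}

/* Emit the current row (if any) and reset the per-row state. */
static void toy_end_row(struct toy_parser *p) {
  if (p->row_started)
    p->on_row(p->ctx, p->commas + 1);
  p->commas = 0;
  p->row_started = 0;
}

/* Push one chunk of input. A chunk may end in the middle of a row; the
 * partial row is carried over to the next call. */
void toy_push(struct toy_parser *p, const char *buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (buf[i] == '\n') {
      toy_end_row(p);
    } else {
      p->row_started = 1;
      if (buf[i] == ',')
        p->commas++;
    }
  }
}

/* Flush a final row that has no trailing newline. */
void toy_finish(struct toy_parser *p) {
  toy_end_row(p);
}

/* Example handler: tallies rows and cells into the context struct. */
struct toy_totals {
  size_t rows;
  size_t cells;
};

void toy_tally(void *ctx, size_t cell_count) {
  struct toy_totals *t = ctx;
  t->rows++;
  t->cells += cell_count;
}
```

The finish step matters in this pattern because the last row may arrive without a trailing newline, so the handler for that row can only fire once the caller signals that no more input is coming.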

Full application code examples can be found at examples/lib/README.md.

An example of using the API, compiled to wasm and called via JavaScript, is in examples/js/README.md.

For more sophisticated (but at this time, only sporadically commented/documented) use cases, see the various CLI C source files in the app directory such as app/serialize.c.

Creating your own extension

You can extend zsv by providing a pre-compiled shared or static library that defines the functions specified in extension_template.h and which zsv loads in one of three ways:

Example and template

You can build and run a sample extension by running make test from app/ext_example.

The easiest way to implement your own extension is to copy and customize the template files in app/ext_template.

Current release limitations

This release does not yet implement the full range of core features that are planned for implementation prior to beta release. If you are interested in helping, please post an issue.

Possible enhancements and related developments

Contribute

License

MIT