<p align="center"> <img height="75" src="img/logo.png" alt="csv2"/> </p>

CSV Reader

#include <csv2/reader.hpp>

int main() {
  csv2::Reader<csv2::delimiter<','>, 
               csv2::quote_character<'"'>, 
               csv2::first_row_is_header<true>,
               csv2::trim_policy::trim_whitespace> csv;
               
  if (csv.mmap("foo.csv")) {
    const auto header = csv.header();
    for (const auto row: csv) {
      for (const auto cell: row) {
        // Do something with cell value
        // std::string value;
        // cell.read_value(value);
      }
    }
  }
}

Performance Benchmark

This benchmark measures the average execution time, over 5 timed runs that follow 3 warmup runs, for csv2 to memory-map the input CSV file and iterate over every cell in it. See benchmark/main.cpp for more details.

cd benchmark
g++ -I../include -O3 -std=c++11 -o main main.cpp
./main <csv_file>
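
The "5 runs after 3 warmup runs" averaging described above can be sketched with a small timing helper. This is only an illustration of the methodology, not the code in benchmark/main.cpp; the name `average_seconds` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of csv2): calls `work` `warmup` times untimed,
// then `runs` times timed, and returns the mean wall-clock seconds per timed run.
template <typename Fn>
double average_seconds(Fn&& work, std::size_t warmup = 3, std::size_t runs = 5) {
  assert(runs > 0 && "need at least one timed run");

  // Warmup runs: executed but not timed (e.g. to populate the OS page cache).
  for (std::size_t i = 0; i < warmup; ++i) {
    work();
  }

  // Timed runs: accumulate elapsed time measured on a monotonic clock.
  std::chrono::duration<double> total{0.0};
  for (std::size_t i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    total += std::chrono::steady_clock::now() - start;
  }
  return total.count() / static_cast<double>(runs);
}
```

In the setting of this benchmark, `work` would be a callable that memory-maps the CSV file and visits every cell.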

System Details

| Type | Value |
| --- | --- |
| Processor | 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz 3.50 GHz |
| Installed RAM | 32.0 GB (31.9 GB usable) |
| SSD | ADATA SX8200PNP |
| OS | Ubuntu 20.04 LTS running on WSL in Windows 11 |
| C++ Compiler | g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0 |

Results (as of 23 SEP 2022)

| Dataset | File Size | Rows | Cols | Time |
| --- | --- | --- | --- | --- |
| Denver Crime Data | 111 MB | 479,100 | 19 | 0.102s |
| AirBnb Paris Listings | 196 MB | 141,730 | 96 | 0.170s |
| 2015 Flight Delays and Cancellations | 574 MB | 5,819,079 | 31 | 0.603s |
| StackLite: Stack Overflow questions | 870 MB | 17,203,824 | 7 | 0.911s |
| Used Cars Dataset | 1.4 GB | 539,768 | 25 | 0.947s |
| Title-Based Semantic Subject Indexing | 3.7 GB | 12,834,026 | 4 | 2.867s |
| Bitcoin tweets - 16M tweets | 4 GB | 47,478,748 | 9 | 3.290s |
| DDoS Balanced Dataset | 6.3 GB | 12,794,627 | 85 | 6.963s |
| Seattle Checkouts by Title | 7.1 GB | 34,892,623 | 11 | 7.698s |
| SHA-1 password hash dump | 11 GB | 262,974,241 | 2 | 10.775s |
| DOHUI NOH scaled_data | 16 GB | 496,782 | 3213 | 16.553s |

Reader API

Here is the public API available to you:

template <class delimiter = delimiter<','>, 
          class quote_character = quote_character<'"'>,
          class first_row_is_header = first_row_is_header<true>,
          class trim_policy = trim_policy::trim_whitespace>
class Reader {
public:
  
  // Use this if you'd like to mmap and read from file
  bool mmap(string_type filename);

  // Use this if you have the CSV contents in std::string already
  bool parse(string_type contents);

  // Shape
  size_t rows() const;
  size_t cols() const;
  
  // Row iterator
  // If first_row_is_header, row iteration will start
  // from the second row
  RowIterator begin() const;
  RowIterator end() const;

  // Access the first row of the CSV
  Row header() const;
};

Here's the Row class:

// Row class
class Row {
public:
  // Get raw contents of the row
  void read_raw_value(Container& value) const;
  
  // Cell iterator
  CellIterator begin() const;
  CellIterator end() const;
};

And here's the Cell class:

// Cell class
class Cell {
public:
  // Get raw contents of the cell
  void read_raw_value(Container& value) const;
  
  // Get converted contents of the cell
  // Handles escaped content, e.g., 
  // """foo""" => ""foo""
  void read_value(Container& value) const;
};
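
To make the difference between `read_raw_value` and `read_value` concrete, here is a minimal standalone sketch of the kind of conversion the comment above describes. It is an assumption inferred from the documented example, not csv2's actual implementation; `collapse_doubled_quotes` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of csv2): collapses each doubled quote
// character in `raw` into a single one and copies everything else unchanged.
std::string collapse_doubled_quotes(const std::string& raw, char quote = '"') {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    // On a quote immediately followed by another quote, skip the first
    // so that only one character of the pair is copied.
    if (raw[i] == quote && i + 1 < raw.size() && raw[i + 1] == quote) {
      ++i;
    }
    out.push_back(raw[i]);
  }
  return out;
}
```

Under this sketch, the raw text `"""foo"""` becomes `""foo""`, which matches the example in the `Cell` comment, and text without doubled quotes is returned unchanged.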

CSV Writer

This library also provides a basic csv2::Writer class that can be used to write CSV rows to a file. Basic usage:

#include <csv2/writer.hpp>
#include <fstream>
#include <string>
#include <vector>
using namespace csv2;

int main() {
    std::ofstream stream("foo.csv");
    Writer<delimiter<','>> writer(stream);

    std::vector<std::vector<std::string>> rows = 
        {
            {"a", "b", "c"},
            {"1", "2", "3"},
            {"4", "5", "6"}
        };

    writer.write_rows(rows);
    stream.close();
}
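
For readers who want to see what the example above is expected to put in foo.csv, the following standalone sketch mimics the output format without using csv2. It assumes each row is written as its cells joined by the delimiter and terminated by a newline, with no quoting or escaping added; that is an assumption about the output, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of csv2): writes one row as delimiter-separated
// cells followed by a newline. No quoting or escaping is performed.
void write_delimited_row(std::ostream& out, const std::vector<std::string>& row,
                         char delim = ',') {
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (i > 0) {
      out << delim;
    }
    out << row[i];
  }
  out << '\n';
}

// Hypothetical helper: renders all rows into a single string, one line per row.
std::string rows_to_text(const std::vector<std::vector<std::string>>& rows,
                         char delim = ',') {
  std::ostringstream out;
  for (const auto& row : rows) {
    write_delimited_row(out, row, delim);
  }
  return out.str();
}
```

Under these assumptions, the three rows from the example render as the lines `a,b,c`, `1,2,3`, and `4,5,6`.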

Writer API

Here is the public API available to you:

template <class delimiter = delimiter<','>>
class Writer {
public:
  
  // Construct using an std::ofstream
  Writer(output_file_stream stream);

  // Use this to write a single row to file
  void write_row(container_of_strings row);

  // Use this to write a list of rows to file
  void write_rows(container_of_rows rows);
};

Compiling Tests

mkdir build && cd build
cmake -DCSV2_BUILD_TESTS=ON ..
make
cd test
./csv2_test

Generating Single Header

python3 utils/amalgamate/amalgamate.py -c single_include.json -s .

Contributing

Contributions are welcome. Have a look at the CONTRIBUTING.md document for more information.

License

The project is available under the MIT license.