Home

Awesome

License: MIT Build and test

This repository aims not to implenet a full-fledge CSV parser for DuckDB but to demonstrate a minimum table function example for letting you know how to implement a custom scanner for your datasource. This CSV parser implementation relies on a DuckDB table function and, if you would like to see how it works, please see maropu/duckdb_extension_example.

Minimum Required Components

.
|-- CMakeLists.txt                  // Root CMake build file
|-- Makefile                        // Build script to wrap cmake
|-- data                            // Test data used in `test/sql/csv_scanner.test`
|   |-- random.csv
|   `-- test.csv
|-- duckdb                          // DuckDB source code that this extension depends on
|-- extension_config.cmake          // Extension configure file included by DuckDB's build system
|-- makefiles
|   `-- duckdb_extension.Makefile   // common build configuration to compile extention, copied from `duckdb/extension-ci-tools`
|-- src
|   |-- CMakeLists.txt              // CMake build file to list source files
|   |-- csv_scanner_extension.cpp   // CSV parser implmenetation
|   |-- include
|   |   `-- csv_scanner.hpp         // Header file for the CSV parser
|   `-- scan_csv.cpp                // Entrypoint where DuckDB loads this extension
|-- test
|   `-- sql
|       `-- csv_scanner.test        // Test code
`-- vcpkg.json                      // Dependency definition file

Specification of this toy CSV parser

Running this example

$ git submodule init
$ git submodule update
$ DUCKDB_GIT_VERSION=v1.1.3 make set_duckdb_version
$ make
mkdir -p build/release && \
	cmake  -DEXTENSION_STATIC_BUILD=1 -DDUCKDB_EXTENSION_NAMES="csv_scanner" -DDUCKDB_EXTENSION_CSV_SCANNER_PATH="./duckdb/duckdb_scanner_example/" -DDUCKDB_EXTENSION_CSV_SCANNER_SHOULD_LINK=0 -DDUCKDB_EXTENSION_CSV_SCANNER_LOAD_TESTS=1 -DDUCKDB_EXTENSION_CSV_SCANNER_TEST_PATH="./duckdb/duckdb_scanner_example/test/sql" -DOSX_BUILD_ARCH=  -DDUCKDB_EXPLICIT_PLATFORM='' -DCMAKE_BUILD_TYPE=Release -S ./duckdb/ -B build/release && \
	cmake --build build/release --config Release
-- git hash af39bd0dcf, version v1.1.3, extension folder v1.1.3
-- Extensions will be deployed to: ./duckdb/duckdb_scanner_example/build/release/repository
-- Load extension 'csv_scanner' from './duckdb/duckdb_scanner_example/'
-- Load extension 'parquet' from './duckdb/duckdb_scanner_example/duckdb/extensions' @ v1.1.3
-- Extensions linked into DuckDB: [parquet]
-- Extensions built but not linked: [csv_scanner]
-- Tests loaded for extensions: [csv_scanner]
-- Configuring done
-- Generating done
-- Build files have been written to: ./duckdb/duckdb_scanner_example/build/release
[  1%] Built target duckdb_platform_binary
[  1%] Built target duckdb_platform
...
[100%] Linking CXX executable unittest
[100%] Built target unittest
[100%] Built target imdb

$ make test
...
[1/1] (100%): test/sql/csv_scanner.test
===============================================================================
All tests passed (13 assertions in 1 test case)

$ cat data/test.csv
aaa,1,1.23
bbb,2,3.14
ccc,3,2.56

$ duckdb -unsigned
v1.1.3 19864453f7
Enter ".help" for usage hints.
D LOAD './build/release/extension/csv_scanner/csv_scanner.duckdb_extension';
D SELECT * FROM scan_csv_ex('data/test.csv', {'a': 'varchar', 'b': 'bigint', 'c': 'double'});
┌─────────┬───────┬────────┐
│    a    │   b   │   c    │
│ varchar │ int64 │ double │
├─────────┼───────┼────────┤
│ aaa     │     1 │   1.23 │
│ bbb     │     2 │   3.14 │
│ ccc     │     3 │   2.56 │
└─────────┴───────┴────────┘

External Resources

Any Question?

If you have any question, please feel free to leave it on Issues or Twitter (@maropu).