Home

Awesome

<a name="top"></a>

Let your DuckDB quack SAP!

Disclaimer: This extension is currently in an experimental state. Feel free to try it out, but be aware that minimal testing and benchmarking were done.

With ERPL, use Enterprise Data in your Data Science and ML pipelines within minutes!

Quick Install

Install with two lines of code in DuckDB (unsigned option must be set):

INSTALL 'erpl' FROM 'http://get.erpl.io';
LOAD 'erpl';

Quicklinks


★ What is the ERPL Extension?

The primary objective of this DuckDB extension is to facilitate seamless integration with the SAP data ecosystem. Our approach prioritizes:

We focus predominantly on two main use cases:

Please be aware that DataZoo GmbH is the independent developer of this extension and does not hold any affiliation with DuckDB Labs or the DuckDB Foundation. "DuckDB" is a trademark registered by the DuckDB Foundation.

Our development journey is underway, with a functioning prototype available that facilitates:

Transparency is our ethos, and in line with this, we are planning a commercial trajectory for the extension, structured as follows:

For inquiries, potential collaborations, or if your curiosity is piqued, connect with us at https://erpl.io.

Back to Top

âš™ Example Usage

A first example demonstrates how to join two SAP tables with an external table. We’ll be using the ABAP Flight Reference Scenario, specifically joining the SFLIGHT and SPFLI tables which contain flight and flight schedule details respectively, with an external table WEATHER that holds weather information. We will extract flight information and associated temperatures at departure and arrival cities.

To start with, we assume you have setup DuckDB and already installed the ERPL extension (see below) and have access to an SAP system having the ABAP Flight Reference Scenario data.

First we start DuckDB with the -unsigned - flag set. Then we load the extension and configure the necessary credentials to connect to the SAP system:

LOAD `erpl`;

SET sap_ashost = 'localhost';
SET sap_sysnr = '00';
SET sap_user = 'DEVELOPER';
SET sap_password = 'ABAPtr1909';
SET sap_client = '001';
SET sap_lang = 'EN';

In our case we use the docker based ABAP Platform Trial. The credentials are set by default, details can be found in the documentation of the docker image.

Now we explore the schema of the three tables tables:

<details> <summary> SELECT * FROM SAP_DESCRIBE_FIELDS('SFLIGHT'); </summary>
posis_keyfieldtextsap_typelengthdecimalscheck_tableref_tableref_fieldlanguage
0001XMANDTClientCLNT000003000000T000E
0002XCARRIDAirline CodeCHAR000003000000SCARRE
0003XCONNIDFlight Connection NumberNUMC000004000000SPFLIE
0004XFLDATEFlight dateDATS000008000000E
0005PRICEAirfareCURR000015000002SFLIGHTCURRENCYE
0006CURRENCYLocal currency of airlineCUKY000005000000SCURXE
0007PLANETYPEAircraft TypeCHAR000010000000SAPLANEE
0008SEATSMAXMaximum Capacity in Economy ClassINT4000010000000E
0009SEATSOCCOccupied Seats in Economy ClassINT4000010000000E
0010PAYMENTSUMTotal of current bookingsCURR000017000002SFLIGHTCURRENCYE
0011SEATSMAX_BMaximum Capacity in Business ClassINT4000010000000E
0012SEATSOCC_BOccupied Seats in Business ClassINT4000010000000E
0013SEATSMAX_FMaximum Capacity in First ClassINT4000010000000E
0014SEATSOCC_FOccupied Seats in First ClassINT4000010000000E
</details> <details> <summary> SELECT * FROM SAP_DESCRIBE_FIELDS('SPFLI'); </summary>
posis_keyfieldtextsap_typelengthdecimalscheck_tableref_tableref_fieldlanguage
0001XMANDTClientCLNT000003000000T000E
0002XCARRIDAirline CodeCHAR000003000000SCARRE
0003XCONNIDFlight Connection NumberNUMC000004000000E
0004COUNTRYFRCountry KeyCHAR000003000000SGEOCITYE
0005CITYFROMDeparture cityCHAR000020000000SGEOCITYE
0006AIRPFROMDeparture airportCHAR000003000000SAIRPORTE
0007COUNTRYTOCountry KeyCHAR000003000000SGEOCITYE
0008CITYTOArrival cityCHAR000020000000SGEOCITYE
0009AIRPTODestination airportCHAR000003000000SAIRPORTE
0010FLTIMEFlight timeINT4000010000000E
0011DEPTIMEDeparture timeTIMS000006000000E
0012ARRTIMEArrival timeTIMS000006000000E
0013DISTANCEDistanceQUAN000009000004SPFLIDISTIDE
0014DISTIDMass unit of distance (kms, miles)UNIT000003000000E
0015FLTYPEFlight typeCHAR000001000000E
0016PERIODArrival n day(s) laterINT1000003000000E
</details> <details> <summary> DESCRIBE SELECT * FROM 'WEATHER.csv'; </summary>
column_namecolumn_typenullkeydefaultextra
FLDATEDATEYES
COUNTRYVARCHARYES
CITYVARCHARYES
TEMPERATUREDOUBLEYES
CONDITIONVARCHARYES
</details>

The actual example joins this three tables:

SELECT 
  f.CARRID,
  f.CONNID,
  f.FLDATE,
  s.CITYFROM as CITY_FROM,
  ROUND(w_from.TEMPERATURE, 1) as TEMP_FROM,
  s.CITYTO as CITY_TO,
  ROUND(w_to.TEMPERATURE, 1) as TEMP_TO,
  FROM sap_read_table('SFLIGHT') AS f
  JOIN sap_read_table('SPFLI') AS s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
  JOIN "WEATHER.csv" AS w_from
      ON (f.FLDATE = w_from.FLDATE AND s.COUNTRYFR = w_from.COUNTRY AND s.CITYFROM = w_from.CITY)
  JOIN "WEATHER.csv" AS w_to
      ON (f.FLDATE = w_to.FLDATE AND s.COUNTRYTO = w_to.COUNTRY AND s.CITYTO = w_to.CITY)
  ORDER BY 1, 2, 3
  LIMIT 25

This SQL query performs the following operations:

The output of this query will provide a comprehensive view of the flights, including their departure and arrival cities, and the corresponding temperatures, thus offering valuable insights for flight operations analysis.

| CARRID   |   CONNID | FLDATE              | CITY_FROM   |   TEMP_FROM | CITY_TO       |   TEMP_TO |
|:---------|---------:|:--------------------|:------------|------------:|:--------------|----------:|
| AA       |     0017 | 2016-11-15 00:00:00 | NEW YORK    |        28.3 | SAN FRANCISCO |      19.8 |
| AA       |     0017 | 2017-02-03 00:00:00 | NEW YORK    |        18.9 | SAN FRANCISCO |      17.2 |
| AA       |     0017 | 2017-04-24 00:00:00 | NEW YORK    |        14.7 | SAN FRANCISCO |      16.2 |
| AA       |     0017 | 2017-07-13 00:00:00 | NEW YORK    |        16.8 | SAN FRANCISCO |      22.5 |
| AA       |     0017 | 2017-10-01 00:00:00 | NEW YORK    |        14.6 | SAN FRANCISCO |      28.3 |
| AA       |     0017 | 2017-12-20 00:00:00 | NEW YORK    |        13   | SAN FRANCISCO |      21.4 |
| AZ       |     0555 | 2016-11-15 00:00:00 | ROME        |        20.6 | FRANKFURT     |      24.2 |
| AZ       |     0555 | 2017-02-03 00:00:00 | ROME        |        13.1 | FRANKFURT     |      24.1 |
| AZ       |     0555 | 2017-04-24 00:00:00 | ROME        |        20.7 | FRANKFURT     |      24.6 |
| AZ       |     0555 | 2017-07-13 00:00:00 | ROME        |        14   | FRANKFURT     |      16.3 |
| AZ       |     0555 | 2017-10-01 00:00:00 | ROME        |        20.9 | FRANKFURT     |      15.3 |
| AZ       |     0555 | 2017-12-20 00:00:00 | ROME        |        23.8 | FRANKFURT     |      24.7 |
| AZ       |     0789 | 2016-11-15 00:00:00 | TOKYO       |        29.7 | ROME          |      20.6 |
| AZ       |     0789 | 2017-02-03 00:00:00 | TOKYO       |        23.4 | ROME          |      13.1 |
| AZ       |     0789 | 2017-04-24 00:00:00 | TOKYO       |        28.5 | ROME          |      20.7 |
| AZ       |     0789 | 2017-07-13 00:00:00 | TOKYO       |        19.4 | ROME          |      14   |
| AZ       |     0789 | 2017-10-01 00:00:00 | TOKYO       |        19.8 | ROME          |      20.9 |
| AZ       |     0789 | 2017-12-20 00:00:00 | TOKYO       |        15.6 | ROME          |      23.8 |
| DL       |     0106 | 2016-11-13 00:00:00 | NEW YORK    |         9.1 | FRANKFURT     |      13.8 |
| DL       |     0106 | 2017-02-01 00:00:00 | NEW YORK    |        24.3 | FRANKFURT     |      23.9 |
| DL       |     0106 | 2017-04-22 00:00:00 | NEW YORK    |        15.8 | FRANKFURT     |      23.5 |
| DL       |     0106 | 2017-07-11 00:00:00 | NEW YORK    |        20.7 | FRANKFURT     |      30.6 |
| DL       |     0106 | 2017-09-29 00:00:00 | NEW YORK    |        16.3 | FRANKFURT     |      19.1 |
| DL       |     0106 | 2017-12-18 00:00:00 | NEW YORK    |        14.4 | FRANKFURT     |      21.8 |
| JL       |     0407 | 2016-11-17 00:00:00 | TOKYO       |        20.3 | FRANKFURT     |      20.4 |

This example can also be found in its Python, R, or NODEJS version in the examples folder.

Back to Top

➜ Obtaining the ERPL Extension

Introduction

Building extensions for DuckDB can be challenging due to the varying C++ compiler and library ecosystem. This variability often leads to incompatibilities between locally built extensions and the centrally distributed DuckDB binary, primarily due to differences in the Application Binary Interface (ABI).

Recommended Approach

To ensure compatibility and ease of use, we follow a build process similar to that of the DuckDB team. Our advice is to start with the pre-compiled binaries available in our GitHub releases. For those interested in building the extension themselves, our development instructions provide detailed guidance.

Binary Selection

The assets in each release follow this naming convention:

erpl-${DUCKDB_VERSION}-extension-{OS}-{ARCH}.tar.gz

Choose the binary that matches your usage scenario. The table below summarizes the available binaries for various platforms and use cases:

UsageOperating SystemArchitecture
DuckDB CLILinuxamd64_gcc4
PythonLinuxamd64
RLinuxamd64
JuliaLinuxamd64
Node.jsLinuxamd64
DuckDB CLIWindowsamd64
PythonWindowsamd64
RWindowsamd64
JuliaWindowsamd64
Node.jsWindowsamd64
DuckDB CLIOSXamd64
PythonOSXamd64
ROSXamd64
JuliaOSXamd64
Node.jsOSXamd64
DuckDB CLIOSXarm64
PythonOSXarm64
ROSXarm64
JuliaOSXarm64
Node.jsOSXarm64

Back to Top

💻 Installing the ERPL Binaries

Introduction

Installation of the ERPL extension is straightforward. Please note that this extension is independent of the DuckDB Foundation and DuckDB Labs, meaning the binaries are unsigned. Consequently, DuckDB must be initiated with the -unsigned flag. Detailed instructions on this process can be found in the DuckDB documentation.

Installation Steps

  1. Enable Unsigned Extensions in DuckDB: Set the -unsigned flag as described in the DuckDB documentation.
  2. Install and Load the ERPL Extension:
    INSTALL 'path/to/erpl.duckdb_extension';
    LOAD 'erpl';
    

Confirmation of Successful Installation

Upon successful installation and loading, the extension will output the following message:

-- Loading ERPL Trampoline Extension. --
(Saves ERPL SAP dependencies to '/home/jr/.duckdb/extensions/v0.9.2/linux_amd64' and loads them)
ERPL extension saved and loaded from /home/jr/.duckdb/extensions/v0.9.2/linux_amd64/erpl_impl.duckdb_extension.
For usage instructions, visit https://erpl.io

Understanding the Extension Loading Process

The ERPL extension is composed of two parts:

  1. Trampoline Extension: Extracts SAP Netweaver RFC SDK and SAP Business Warehouse BICS libraries from the binary.
  2. Implementation Extension: The actual functional part of the extension.

The erpl_init function in the trampoline extension bundles and extracts dependencies into the DuckDB extension folder. Post-installation, the directory ~/.duckdb/extensions/v0.9.2/linux_amd64 should contain the following files:

-rw-r--r-- 1 jr jr 110M 26. Nov 10:23 erpl.duckdb_extension
-rw-r--r-- 1 jr jr  34M 26. Nov 10:35 erpl_impl.duckdb_extension
-rw-r--r-- 1 jr jr  20M 26. Nov 10:35 libicudata.so.50
-rw-r--r-- 1 jr jr  12M 26. Nov 10:35 libicui18n.so.50
-rw-r--r-- 1 jr jr 8,4M 26. Nov 10:35 libicuuc.so.50
-rw-r--r-- 1 jr jr 9,5M 26. Nov 10:35 libsapnwrfc.so
-rw-r--r-- 1 jr jr 1,1M 26. Nov 10:35 libsapucum.so

This revised section aims for a clearer, more structured presentation of the installation process, ensuring users can easily understand and follow the steps.

Back to Top

Tracking

Overview

Our extension automatically collects basic usage data to enhance its performance and gain insights into user engagement. We employ Posthog for data analysis, transmitting information securely to the European Posthog instance at https://eu.posthog.com via HTTPS.

Data Collected

Each transmitted request includes the following information:

Event Tracking

Data is transmitted under these circumstances:

User Configuration Options

Users can control tracking through these settings:

  1. Enable/Disable Tracking:

    SET erpl_telemetry_enabled = TRUE; -- Enabled by default; set to FALSE to disable tracking
    
  2. Posthog API Key Configuration (usually unchanged):

    SET erpl_telemetry_key = 'phc_XXX'; -- Pre-set to our key; customizable to your own key
    

This approach ensures transparency about data collection while offering users control over their privacy settings.

Back to Top

License

The ERPL extension is licensed under the Business Source License (BSL) Version 1.1. The BSL is a source-available license that gives you the following permissions:

Allowed:

  1. Copy, Modify, and Create Derivative Works: You have the right to copy the software, modify it, and create derivative works based on it.
  2. Redistribute and Non-Production Use: Redistribution and non-production use of the software is permitted.
  3. Limited Production Use: You can make production use of the software, but with limitations. Specifically, the software cannot be offered to third parties on a hosted or embedded basis.
  4. Change License Rights: After the Change Date (five years from the first publication of the Licensed Work), you gain rights under the terms of the Change License (MPL 2.0). This implies broader permissions after this date.

Not Allowed:

  1. Offering to Third Parties on Hosted/Embedded Basis: The Additional Use Grant restricts using the software in a manner that it is offered to third parties on a hosted or embedded basis.
  2. Violation of Current License Requirements: If your use does not comply with the requirements of the BSL, you must either purchase a commercial license or refrain from using the software.
  3. Trademark Usage: You don't have rights to any trademark or logo of Licensor or its affiliates, except as expressly required by the License.

Additional Points:

This summary is based on the provided license text and should be used as a guideline. For legal advice or clarification on specific points, consulting a legal professional is recommended, especially for commercial or complex use cases.

Back to Top