Home

Awesome

<a name="top"></a>

ERPL Web

ERPL Web is a DuckDB extension that allows you to make HTTP requests from within DuckDB SQL queries.

This is an experimental extension for DuckDB that adds basic support for making HTTP requests from within SQL queries. The extension is built on top of the httplib, which is already included in the DuckDB source code.

➜ Installation

The installation process is the same as for any other DuckDB extension. As we provide a thrid party extension, you have to start DuckDB with the -unsigned flag to load the extension. After that the following commands can be used to install and load the extension:

INSTALL 'erpl_web' FROM 'http://get.erpl.io';
LOAD 'erpl_web'; 

Back to Top

⚙ Example Usage

To run a HTTP GET request, use the http_get function. For example:

SELECT content FROM http_get('http://httpbun.com/ip');

You get back a table valued result with the following fields:

We have now a few examples, showing how combining the http_get with DuckDBs powerful JSON functions can be used to extract data from APIs.

A first request

SELECT method, status, content_type FROM http_get('https://httpbun.com/get');
┌─────────┬────────┬──────────────────┐
│ method  │ status │   content_type   │
│ varchar │ int32  │     varchar      │
├─────────┼────────┼──────────────────┤
│ GET     │    200 │ application/json │
└─────────┴────────┴──────────────────┘

Extracting JSON data from response

When it comes now the more interesting example of extracting content from an API, we use httpbin service. It returns typically the follwoing resusults

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 416
Content-Type: application/json
Date: Sat, 08 Jun 2024 14:32:32 GMT
Server: gunicorn/19.9.0

{
    "args": {
        "foo": "bar"
    },
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Host": "httpbin.org",
        "User-Agent": "HTTPie/3.2.2",
        "X-Amzn-Trace-Id": "Root=1-66646b80-6fea073c77b8279a6183c2e9"
    },
    "json": null,
    "method": "GET",
    "origin": "95.90.195.65",
    "url": "https://httpbin.org/anything?foo=bar"
}

The following query issues a GET request to the httpbin service and extracts the args.foo property, url and method from the JSON response. The details of the JSON extraction can be found in the DuckDB documentation.

SELECT status, 
	   content::JSON->'args'->>'foo' as args,
	   content::JSON->>'url' as url,
	   content::JSON->>'method' as method
FROM http_get('https://httpbin.org/anything?foo=bar');

This results into the following output:

┌────────┬─────────┬──────────────────────────────────────┬─────────┐
│ status │  args   │                 url                  │ method  │
│ int32  │ varchar │               varchar                │ varchar │
├────────┼─────────┼──────────────────────────────────────┼─────────┤
│    200 │ bar     │ https://httpbin.org/anything?foo=bar │ GET     │
└────────┴─────────┴──────────────────────────────────────┴─────────┘

Back to Top

💻 HTTP Function Reference

For simple HTTP interaction the extension currently provides the following two functions for making HTTP requests, which just retrieve the content of the response:

FunctionDescription
http_get(url:VARCHAR)Do an Http GET request, and retrieve the data
http_head(url:VARCHAR)Do an HTTP HEAD request.

We further provide the following functions, supporting mutating HTTP verbs:

FunctionDescription
http_post(url:VARCHAR, body::JSON)Do an HTTP POST request. The request body is of DuckDB's JSON type.
http_post(url:VARCHAR, body::VARCHAR, content_type::VARCHAR)Do an HTTP POST request. The function takes two arguments: The request body as well as the content_type.
http_put(url:VARCHAR, body::JSON)Do an HTTP PUT request. The request body is of DuckDB's JSON type.
http_put(url:VARCHAR, body::VARCHAR, content_type::VARCHAR)Do an HTTP PUT request. The function takes two arguments: The request body as well as the content_type.
http_patch(url:VARCHAR, body::JSON)Do an HTTP PATCH request. The request body is of DuckDB's JSON type.
http_patch(url:VARCHAR, body::VARCHAR, content_type::VARCHAR)Do an HTTP PATCH request. The function takes two arguments: The request body as well as the content_type.
http_delete(url:VARCHAR, body::JSON)Do an HTTP DELETE request. The request body is of DuckDB's JSON type.
http_delete(url:VARCHAR, body::VARCHAR, content_type::VARCHAR)Do an HTTP DELETE request. The function takes two arguments: The request body as well as the content_type.

This functions take the following named parameters parameters:

NameDescriptionTypeDefault
headersA DuckDB map containing key-value-pairs of HTTP headers.MAP(VARCHAR, VARCHAR){}
content_typeThe value of the HTTP content type header (in case of a mutating query).BOOLfalse
acceptThe value of the HTTP accept header.VARCHAR'application/json'
authThe authentication value which is put in the header. For auth_type == 'BASIC' pass in a username:password pair. For other auth types enter the respective tokenVARCHARNULL
auth_typeAn enum of different HTTP auth types with the possible values {'BASIC', 'DIGEST', 'BEARER'}.ENUMBASIC
timeoutThe request timeout in milliseconds.BIGINT60000

Back to Top

Tracking

Overview

Our extension automatically collects basic usage data to enhance its performance and gain insights into user engagement. We employ Posthog for data analysis, transmitting information securely to the European Posthog instance at https://eu.posthog.com via HTTPS.

Data Collected

Each transmitted request includes the following information:

Event Tracking

Data is transmitted under these circumstances:

User Configuration Options

Users can control tracking through these settings:

  1. Enable/Disable Tracking:

    SET erpl_telemetry_enabled = TRUE; -- Enabled by default; set to FALSE to disable tracking
    
  2. Posthog API Key Configuration (usually unchanged):

    SET erpl_telemetry_key = 'phc_XXX'; -- Pre-set to our key; customizable to your own key
    

This approach ensures transparency about data collection while offering users control over their privacy settings.

Back to Top

License

The ERPL extension is licensed under the Business Source License (BSL) Version 1.1. The BSL is a source-available license that gives you the following permissions:

Allowed:

  1. Copy, Modify, and Create Derivative Works: You have the right to copy the software, modify it, and create derivative works based on it.
  2. Redistribute and Non-Production Use: Redistribution and non-production use of the software is permitted.
  3. Limited Production Use: You can make production use of the software, but with limitations. Specifically, the software cannot be offered to third parties on a hosted or embedded basis.
  4. Change License Rights: After the Change Date (five years from the first publication of the Licensed Work), you gain rights under the terms of the Change License (MPL 2.0). This implies broader permissions after this date.

Not Allowed:

  1. Offering to Third Parties on Hosted/Embedded Basis: The Additional Use Grant restricts using the software in a manner that it is offered to third parties on a hosted or embedded basis.
  2. Violation of Current License Requirements: If your use does not comply with the requirements of the BSL, you must either purchase a commercial license or refrain from using the software.
  3. Trademark Usage: You don't have rights to any trademark or logo of Licensor or its affiliates, except as expressly required by the License.

Additional Points:

This summary is based on the provided license text and should be used as a guideline. For legal advice or clarification on specific points, consulting a legal professional is recommended, especially for commercial or complex use cases.

Back to Top