Awesome
DuckDB Athena Extension
WARNING This is a work in progress - things may or may not work as expected π§ββοΈ
Limitations
- Only the
default
database is supported - Not all data types are implemented yet
- 10,000 results are returned by default (use
maxrows=-1
to return everything) - Pushdown predicates are not supported
Getting started
The Athena extension is supported in DuckDB v0.7.0 and up. To install the extension, start duckdb with the unsigned
parameter.
> duckdb -unsigned
v0.7.1 b00b93f0b1
D
The first time you use the extension, you need to install it from a custom repository.
SET custom_extension_repository='d2j9pg7mqm9we6.cloudfront.net/athena/latest';
INSTALL athena;
Then LOAD the extension. You only need to run the INSTALL command once.
LOAD athena;
You can now extract data from tables in your default data catalog.
select * from athena_scan("noaa_gsod_pds", "s3://results-bucket/prefix");
Warning To prevent runaway queries, the extension only returns 10,000 rows by default. If you'd like to return everything, you can add
maxrows=-1
as a parameter inside the function.
select * from athena_scan("noaa_gsod_pds", "s3://results-bucket/prefix", maxrows=-1);
Filter pushdown is not yet supported so the extension will scan the entire table.
Note The extension uses your environment variables to figure out region and credentials. Make sure to have your access key/secret set.
Development
- Clone the repo with submodules
git clone https://github.com/dacort/duckdb-athena-extension.git --recurse-submodules
- Build
cd duckdb-athena-extension
make release
- Start up duckdb with the
-unsigned
parameter and your desired AWS_REGION
AWS_REGION=us-east-1 build/debug/duckdb -unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D
- Load the extension
load 'build/debug/extension/duckdb-athena-extension/athena.duckdb_extension';
- Query a single table, also providing where S3 results are written to
select * from athena_scan('table_name', 's3://<bucket>/athena-results/);
Warning: 10,000 results will be returned by default! Use
maxrows=-1
to return the entire table.
D select * from athena_scan("amazon_reviews_parquet");
Running Athena query, execution id: 152a20c7-ff32-4a19-bb71-ae0135373ca6
State: Queued, sleep 5 secs ...
Total execution time: 1307 millis
100% ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββ¬ββββββββββββββ¬βββββββββββββββββ¬βββββββββββββ¬βββββββββββββββββ¬ββββ¬ββββββββββ¬ββββββββββββββββββββ¬βββββββββββββββββββββββ¬βββββββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββ
β marketplace β customer_id β review_id β product_id β product_parent β β¦ β vine β verified_purchase β review_headline β review_body β review_date β year β
β varchar β varchar β varchar β varchar β varchar β β varchar β varchar β varchar β varchar β int64 β int32 β
βββββββββββββββΌββββββββββββββΌβββββββββββββββββΌβββββββββββββΌβββββββββββββββββΌββββΌββββββββββΌββββββββββββββββββββΌβββββββββββββββββββββββΌβββββββββββββββββββββββΌββββββββββββββββββΌββββββββ€
β US β 37441986 β R2H287L0BUP89U β B00CT780C2 β 473048287 β β¦ β N β Y β Perfect Gift β I love giving my sβ¦ β 140454171422720 β 0 β
β US β 20676035 β R1222MJHP5QWXE β B004LLILFA β 361255549 β β¦ β N β Y β Five Stars β Great gift for outβ¦ β 16170 β 2014 β
β US β 45090731 β R32ECJRNTB61K8 β B004LLIL4G β 307223063 β β¦ β N β Y β happy birthday card β gift cards from Amβ¦ β 140454171423232 β 0 β
β US β 2207141 β RLTEU3JZ1IJAA β B004LLILDM β 87389551 β β¦ β N β Y β Five Stars β gracias. β 16391 β 2014 β
β US β 15258 β R1ZAX1TN66QOU6 β B004LLIKVU β 473048287 β β¦ β N β Y β easy breezy β gift card was sentβ¦ β 140454171424000 β 0 β
β Β· β Β· β Β· β Β· β Β· β Β· β Β· β Β· β Β· β Β· β Β· β Β· β
βββββββββββββββ΄ββββββββββββββ΄βββββββββββββββββ΄βββββββββββββ΄βββββββββββββββββ΄ββββ΄ββββββββββ΄ββββββββββββββββββββ΄βββββββββββββββββββββββ΄βββββββββββββββββββββββ΄ββββββββββββββββββ΄ββββββββ€
β 999 rows (40 shown) 15 columns (11 shown) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Credits
- Initial rust DuckDB Extension Framework: https://github.com/Mause/duckdb-extension-framework
- Updated rust extension framework: https://github.com/eto-ai/lance/tree/main/integration/duckdb_lance