Home

Awesome

DuckDB Excel Extension

This extension adds support for the TEXT function for formatting numbers from Microsoft Excel.

Example usage:

SELECT text(1234567.897, '$#.##') AS result;
┌────────────┐
│   result   │
│  varchar   │
├────────────┤
│ $1234567.9 │
└────────────┘

Documentation

See the Excel page in the DuckDB documentation.

XLSX Files

Reading XLSX Files

.xlsx files can be read using the read_xlsx function. The following named parameters are supported.

Options:

OptionTypeDefaultDescription
headerBOOLEANautomatically inferredWhether to treat the first row as containing the names of the resulting columns
sheetVARCHARautomatically inferredThe name of the sheet in the xlsx file to read. Default is the first sheet.
all_varcharBOOLEANfalseWhether to read all cells as containing VARCHARs.
ignore_errorsBOOLEANfalseWhether to ignore errors and silently replace cells that cant be cast to the corresponding inferred column type with NULL's.
rangeVARCHARautomatically inferredThe range of cells to read. For example, A1:B2 reads the cells from A1 to B2. If not specified the resulting range will be inferred as rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns
stop_at_emptyBOOLEANfalse/trueWhether to stop reading the file when an empty row is encountered. If an explicit range option is provided, this is false by default, otherwise true
empty_as_varcharBOOLEANfalseWhether to treat empty cells as VARCHAR instead of DOUBLE when trying to automatically infer column types

Example usage:

SELECT * FROM read_xlsx('test.xlsx', header 'true');
----
┌────────┬────────┐
│   a    │   b    │
│ double │ double │
├────────┼────────┤
│    1.0 │    2.0 │
│    3.0 │    4.0 │
└────────┴────────┘

-- Alternatively, we can use a xlsx file as a "replacement scans" and select from it immediately
-- but without being able to pass options.

SELECT * FROM 'test.xlsx';
----
┌────────┬────────┐
│   a    │   b    │
│ double │ double │
├────────┼────────┤
│    1.0 │    2.0 │
│    3.0 │    4.0 │
└────────┴────────┘

Writing XLSX Files

Writing .xlsx files is supported using the COPY statement with XLSX given as the format. The following additional parameters are supported.

Options:

OptionTypeDefaultDescription
headerBOOLEANfalseWhether to write the column names as the first row in the sheet
sheetVARCHARSheet1The name of the sheet in the xlsx file to write.
sheet_row_limitINTEGER1048576The maximum number of rows in a sheet. An error is thrown if this limit is exceeded.

Example usage:

CREATE TABLE test AS SELECT * FROM (VALUES (1, 2), (3, 4)) AS t(a, b);
COPY test TO 'test.xlsx' (format 'xlsx', header 'true');

Type Conversions and Inference

Because XLSX files only really support storing strings and numbers, the equivalent of VARCHAR and DOUBLE, the following type conversions are applied when writing XLSX files.

When reading XLSX files, almost everything is read as either DOUBLE or VARCHAR depending on the Excel cell type. However, there are some caveats.

If the all_varchar option is set to true, none of the above applies and all cells are read as VARCHAR.

When no types are specified explicitly, (e.g. when using the read_xlsx function instead of COPY TO ... FROM '<file>.xlsx') the types of the resulting columns are inferred based on the first "data" row in the sheet, that is:

This can sometimes lead to issues if the first "data row" is not representative of the rest of the sheet (e.g. it contains empty cells) in which case the ignore_errors or empty_as_varchar options can be used to work around this. Alternatively, when the COPY TO ... FROM '<file>.xlsx' syntax is used, no type inference is done and the types of the resulting columns are determined by the types of the columns in the table being copied to. All cells will simply be converted by casting from DOUBLE or VARCHAR to the target column type.