Home

Awesome

ZetaSQL Toolkit

The ZetaSQL Toolkit is a library that helps users use ZetaSQL Java API to perform SQL analysis for multiple query engines.

This toolkit offers built-in support for:

It also includes tooling to understand column-level lineage in analyzed queries.

Quickstart for BigQuery

When analyzing queries using BigQuery semantics, you need to:

  1. Create a BigQueryCatalog and add resources to it. The BigQueryCatalog supports tables, views, functions, table-valued functions and procedures. Connections and models coming soon.
  2. Configure the ZetaSQL AnalyzerOptions using the BigQuery feature set.
  3. Use ZetaSQLToolkit.analyzeStatements() to perform analysis.

BigQuery dependency

<dependency>
  <groupId>com.google.zetasql.toolkit</groupId>
  <artifactId>zetasql-toolkit-bigquery</artifactId>
  <version>0.5.1</version>
</dependency>

Example

String query =
    "INSERT INTO `bigquery-public-data.samples.wikipedia` (title) VALUES ('random title');\n"
    + "SELECT title, language FROM `bigquery-public-data.samples.wikipedia` WHERE title = 'random title';";

// Create a BigQueryCatalog
// By default, it will use the BigQuery API with application-default credentials
// to fetch BigQuery resources.
BigQueryCatalog catalog = new BigQueryCatalog(/*bqProjectId=*/"bigquery-public-data");

// Add resources to the catalog
// After a resource is added, it will be available when ZetaSQL perform analysis
catalog.addTable("bigquery-public-data.samples.wikipedia");

// Configure the analyzer options using the BigQuery feature set
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(BigQueryLanguageOptions.get());

// Use the ZetaSQLToolkitAnalyzer to run the analyzer
// It results an iterator over the resulting AnalyzedStatements
ZetaSQLToolkitAnalyzer analyzer = new ZetaSQLToolkitAnalyzer(options);
Iterator<AnalyzedStatement> statementIterator = analyzer.analyzeStatements(query, catalog);

// Use the resulting AnalyzedStatements
statementIterator.forEachRemaining(analyzedStatement -> {
    analyzedStatement.getResolvedStatement().ifPresent(System.out::println);
});

Output

InsertStmt
+-table_scan=
| +-TableScan(table=bigquery-public-data.samples.wikipedia, ...)
+-insert_column_list=[bigquery-public-data.samples.wikipedia.title#1]
+-row_list=
| +-InsertRow
|   +-value_list=
|     +-DMLValue
|       +-value=
|         +-Literal(type=STRING, value=string_value: "random title")
+-column_access_list=...

QueryStmt
+-output_column_list=
| +-bigquery-public-data.samples.wikipedia.title#1 AS `title` [STRING]
| +-bigquery-public-data.samples.wikipedia.language#3 AS `language` [STRING]
+-query=
  +-ProjectScan
    +-column_list=bigquery-public-data.samples.wikipedia.[title#1, language#3]
    +-input_scan=
      +-FilterScan
        +-column_list=...
        +-input_scan=
        | +-TableScan(table=bigquery-public-data.samples.wikipedia, ...)
        +-filter_expr=
          +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL)
            +-ColumnRef(type=STRING, column=bigquery-public-data.samples.wikipedia.title#1)
            +-Literal(type=STRING, value=string_value: "random title")

Quickstart for Cloud Spanner

Similarly, when analyzing queries using Spanner semantics, you need to:

  1. Create a SpannerCatalog and add resources to it. The SpannerCatalog supports tables and views.
  2. Configure the ZetaSQL AnalyzerOptions using the Spanner feature set.
  3. Use ZetaSQLToolkit.analyzeStatements() to perform analysis

Spanner dependency

<dependency>
  <groupId>com.google.zetasql.toolkit</groupId>
  <artifactId>zetasql-toolkit-spanner</artifactId>
  <version>0.5.1</version>
</dependency>

Example

String query = "UPDATE MyTable SET column2 = 5 WHERE column1 = ''; SELECT * FROM MyTable;";

// Configure your Cloud Spanner project, instance and database
String spannerProjectId = "projectId";
String spannerInstanceName = "instanceName";
String spannerDatabaseName = "databaseName";

// Create your SpannerCatalog
// By default, it will use the Spanner database client with application-default 
// credentials to fetch resources.
SpannerCatalog catalog = new SpannerCatalog(
    spannerProjectId, spannerInstanceName, spannerDatabaseName
);

// Add your tables to the catalog
// After a resource is added, it will be available when ZetaSQL perform analysis
catalog.addAllTablesInDatabase();

// Configure the analyzer options
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(SpannerLanguageOptions.get());

// Use the ZetaSQLToolkitAnalyzer to run the analyzer
// It results an iterator over the resulting AnalyzedStatements
ZetaSQLToolkitAnalyzer analyzer = new ZetaSQLToolkitAnalyzer(options);
Iterator<AnalyzedStatement> statementIterator = analyzer.analyzeStatements(query, catalog);

// Use the resulting AnalyzedStatements
statementIterator.forEachRemaining(analyzedStatement -> {
    analyzedStatement.getResolvedStatement().ifPresent(System.out::println);
});

Output

UpdateStmt
+-table_scan=
| +-TableScan(table=MyTable, column_list=MyTable.[column1#1, column2#2])
+-column_access_list=READ,WRITE
+-where_expr=
| +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL)
|   +-ColumnRef(type=STRING, column=MyTable.column1#1)
|   +-Literal(type=STRING, value=string_value: "")
+-update_item_list=
  +-UpdateItem
    +-target=
    | +-ColumnRef(type=INT64, column=MyTable.column2#2)
    +-set_value=
      +-DMLValue
        +-value=
          +-Literal(type=INT64, value=int64_value: 5)

QueryStmt
+-output_column_list=
| +-MyTable.column1#1 AS `column1` [STRING]
| +-MyTable.column2#2 AS `column2` [INT64]
+-query=
  +-ProjectScan
    +-column_list=MyTable.[column1#1, column2#2]
    +-input_scan=
      +-TableScan(table=MyTable, column_list=MyTable.[column1#1, column2#2])

Examples

See a list of comprehensive usage examples here.

Disclaimer

This is not an officially supported Google product.