Home

Awesome

Getl

About

Groovy ETL (Getl) - open source project on Groovy, developed since 2012 to automate loading and processing data from different sources.

Links

When do you need Getl?

Supported RDBMS

IBM DB2, FireBird, H2 Database, Hadoop Hive, Cloudera Impala, MS SQLServer, MySql, IBM Netezza, NetSuite, Oracle, PostgreSql, SAP Hana, SQLite, Micro Focus Vertica.

Supported file sources

CSV, MS Excel, Json, XML, Yaml, DBF.

Supported cloud sources

Kafka, SalesForce, WebServices.

Supported file systems

Local file systems Windows and Linux, FTP, SFTP, Hadoop HDFS.

Operations with RDBMS data sources

Operations with file sources

Working with cloud sources

Working with file systems

Getl language operators

Examples

Registration of connections to Oracle and Vertica:

package packet1

import groovy.transform.BaseScript
import groovy.transform.Field
import getl.lang.Getl

@BaseScript Getl main

oracleConnection('ora', true) {
    connectHost = 'oracle-host'
    connectDatabase = 'oradb'
    login = 'user'
    password = 'password'
}

verticaConnection('ver', true) {
    connectHost = 'vertica-host1'
    connectDatabase = 'verdb'
    extended.backupservernode = 'vertica-host2,vertica-host3'
    login = 'user'
    password = 'password'
}

Creating a table in Vertica based on Oracle table:

oracleTable('ora:table1', true) {
    useConnection oracleConnection('ora')
    schemaName = 'user'
    tableName = 'table1'
}

verticaTable('ver.stage:table1', true) {
    useConnection verticaConnection('ver')
    schemaName = 'stage'
    tableName = 'table1'
    if (!exists) {
        setOracleFields oracleTable('ora:table1')
        create()
    }
}

verticaTable('ver.work:table1', true) {
    useConnection verticaConnection('ver')
    schemaName = 'public'
    tableName = 'table1'
    if (!exists)
        createLike verticaTable('ver.stage:table1')
}

Copying all rows from the Oracle table to the Vertica table by uploading to an temporary csv file and loading it through the COPY statement:

etl.copyRows(oracleTable('ora:table1'), verticaTable('ver.stage:table1')) { bulkLoad = true }

Copying table data from a staging area into a working one:

sql {
    useConnection verticaConnection('ver')
    exec '''
        /*:count_insert*/
        INSERT INTO public.table1 SELECT * FROM stage.table1;
        IF ({count_insert} > 0);
            ECHO Copied {count_insert} rows successfull.
        END IF;
        COMMIT;
    ''' 
}

Truncate staging table and purge working table:

verticaTable('ver.stage:table1') { truncate() }
verticaTable('ver.prod:table1') { purgeTable() }

Unloading rows from the Vertica table to a csv file according to a specified condition:

csv('file1') {
    useConnection csvConnection { path = '/tmp/unload' }
    fileName = 'data.table1'
    extenstion = 'csv'
    fieldDelimiter = '|'
    codePage = 'utf-8'
    header = true
}

verticaTable('ver.work:table1') {
    readOpts {
        where = 'field1 = CURRENT_DATE'
        order = ['field2', 'field3']
    }
}

etl.copyRows(verticaTable('ver.work:table1'), csv('file1'))

Copying a file via ssh to another server:

files('unload_files', true) {
    rootPath = '/tmp/unload'
}

sftp('ssh1', true) {
    host = 'ssh-host'
    login = 'user'
    password = 'password'
    rootPath = '/csv'
}

fileman.copier(files('unload_files'), sftp('ssh1')) {
    useSourcePath { mask = 'data.{table}.csv' }
}