Home

Awesome

NAME

Thrift::API::HiveClient2 - Perl to HiveServer2 Thrift API wrapper

VERSION

version 0.024

METHODS

new

Initialize the client object with the Hive server parameters

my $client = Thrift::API::HiveClient2->new(
    host    => $hive_host,
    port    => $hive_port,
    timeout => $seconds,
);

host

Host name or IP, defaults to localhost.

port

Hive port, defaults to 10000.

principal

Kerberos principal. Default is not set. See the "WARNING" section.

sasl

Enables authentication. Default is not set. See the "WARNING" section.

timeout

Seconds timeout, defaults to 1 hour.

connect

Open the connection on the server declared in the object's constructor.

 $client->connect() or die "Failed to connect";

execute

Run an HiveQl statement on an open connection.

my $rh = $client->execute( <HiveQL statement> );

fetch

Returns an array(ref) of arrayrefs, like DBI's fetchall_arrayref, and a boolean indicator telling wether or not a subsequent call to fetch() will return more rows.

my ($rv, $has_more_rows) = $client->fetch( $rh, <maximum records to retrieve> );

IMPORTANT: The version of HiveServer2 that we use for testing is the one bundled with CDH 4.2.1. The hasMoreRows method is currently broken, and always returns false. So the right way of obtaining the resultset is to keep using fetch() until it returns an empty array. For this reason the behaviour of fetch has been altered in scalar context (which becomes the current advised way of retrieving the data):

# $rv will be an arrayref is anything was fetched, and undef otherwise.
#
while (my $rv = $client->fetch( $rh, <maximum records to retrieve> )) {
    # ... do something with @$rv
}

This is the approach adopted in https://github.com/cloudera/hue/blob/master/apps/beeswax/src/beeswax/server/hive_server2_lib.py

Starting with version 0.12, we cache the operation handle and don't need it as a first parameter for the fetch() call. We want to be backward-compatible though, so depending on the type of the first parameter, we'll ignore it (since we cached it in the object and we can get it from there) or we'll use it as the number of rows to be retrieved if it looks like a positive integer:

 my $rv = $client->fetch( 10_000 );

fetch_hashref

Same use as above, but result is returned as an arrayref of hashes (which keys are the column names)

get_columns

Get the columns description for a table, returned in an array of hashrefs which keys are named after the result of an ODBC GetColumns call. "default" is used for the schema name is none is specified as 2nd argument. The hashes keys documentation can be found on https://msdn.microsoft.com/en-us/library/ms711683(v=vs.85).aspx for instance.

my $columns = $client->get_columns('<table name>'[, '<schema name>']);

get_tables

Get a list of tables. Optional table name pattern as a first argument (use undef or '%' to get all tables while defining a schema as a second argument), and optional schema second arg (default is "default")

my $tables = $client->get_tables(['<table pattern, SQL wildcards accepted>', ['<schema name>']]);

Returns an arrayref of hashes:

[...
{
    'REMARKS' => 'test comment', # table comment
    'TABLE_NAME' => 'foo_bar',   # table name
    'TABLE_SCHEM' => 'default',  # schema ("database")
    'TABLE_TYPE' => 'TABLE',     # TABLE, VIEW, etc
    'TABLE_CAT' => '',           # catalog (unused?)
}];

WARNING

Thrift in Perl originally did not support SASL, so authentication needed to be disabled on HiveServer2 by setting this property in your /etc/hive/conf/hive-site.xml. Although the property is documented, this *value* -which disables the SASL server transport- is not, AFAICT.

<property>
  <name>hive.server2.authentication</name>
  <value>NOSASL</value>
</property>

Starting with 0.014, support for secure clusters has been added thanks to Thrift::SASL::Transport. This behaviour is set by passing sasl => 1 to the constructor. It has been tested with hive.server2.authentication = KERBEROS. It of course requires a valid credentials cache (kinit) or keytab. With this, kerberos principal also should be provided as part of constructor, principal => hive/_HOST@REALM.COM this value will be under hive.server2.authentication.kerberos.principal in hive-site.xml

Starting with 0.015, other authentication methods are supported, and driven by the content of the sasl property. When built using sasl => 0 or sasl => 1, the behaviour is unchanged. When passed a hashref of arguments that follow the Authen::SASL syntax for object creation, it is passed directly to Authen::SASL, for instance:

{
  mechanism  => 'PLAIN',
  callback   => {
    canonuser => $USER, # not 'user', as I thought reading Authen::SASL's doc
    password  => "foobar",
  }
}

Note that a server configured with NONE will happily accept the PLAIN method.

DELEGATIONTOKEN

Sasl object need to be created specifically if hiveClient2 is used with delegation token. { mechanism => 'DIGEST-MD5', callback => { canonuser => <bas65 encoded identifier extracted from delegation token>, password => <bas65 encoded password extracted from delegation token>, realm => 'default' } } This is used when hiveclient is called from oozie, where keytabs cannot be used. Oozie requests delegationtoken on behalf of hive if specified. This token is used for further authentication purposes.

CAVEATS

The instance of hiveserver2 we have didn't return results encoded in UTF8, for the reason mentioned here: https://groups.google.com/a/cloudera.org/d/msg/cdh-user/AXeEuaFP0Ro/Txmn1OHleAsJ

So we had to change the init script for hive-server2 to make it behave, adding '-Dfile.encoding=UTF-8' to HADOOP_OPTS

REPOSITORY

https://github.com/dmorel/Thrift-API-HiveClient2

CONTRIBUTORS

Burak Gürsoy (BURAK)

Neil Bowers (NEILB)

AUTHOR

David Morel david.morel@amakuru.net

COPYRIGHT AND LICENSE

This software is Copyright (c) 2015 by David Morel & Booking.com. Portions are (c) R.Scaffidi, Thrift files are (c) Apache Software Foundation.

This is free software, licensed under:

The Apache License, Version 2.0, January 2004