Home


Query data stored in Accumulo tables directly with HiveQL.

Pertains to issue: https://issues.apache.org/jira/browse/ACCUMULO-143

Currently does not work with Hadoop 2.0/CDH4.

<a href="https://github.com/bfemiano/accumulo-hive-storage-manager/wiki/Basic-Tutorial">Getting Started Guide</a>

<a href="http://storage-handler-docs.s3.amazonaws.com/javadocs/index.html">Javadocs</a>

<a href="https://github.com/bfemiano/accumulo-hive-storage-manager/wiki/Iterator Predicate pushdown">How Iterator Predicate pushdown works</a>

<a href="https://github.com/bfemiano/accumulo-hive-storage-manager/wiki/Required-Aux-Jars">List of required AUX_JARS</a>

ACLED examples:

Prerequisites: $ACCUMULO_HOME/bin, $HADOOP_HOME/bin, and $HIVE_HOME/bin must be on the environment path, and either wget or curl must be installed.
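A quick way to confirm these prerequisites before running anything is a small shell check. The sketch below assumes the launcher scripts in those bin directories are named `accumulo`, `hadoop`, and `hive`; the function names are illustrative and not part of this project.

```shell
#!/bin/sh
# Print the name of each given command that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Succeed if at least one of the given commands is found on PATH.
any_command() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Check the prerequisites listed above.
# Assumption: the launchers are named accumulo, hadoop, and hive.
check_prerequisites() {
    missing=$(missing_commands accumulo hadoop hive)
    if [ -n "$missing" ]; then
        echo "Not on PATH: $missing" >&2
        return 1
    fi
    if ! any_command wget curl; then
        echo "Install either wget or curl" >&2
        return 1
    fi
    echo "All prerequisites found"
}
```

Calling `check_prerequisites` prints the missing commands to stderr and returns a non-zero status if anything is absent.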

The query examples use a cleaned-up version of the structured ACLED Nigeria dataset (http://www.acleddata.com/).

  1. Navigate to src/test/hql/acled and run ingest.sh. The script creates and loads both tables: the Hive table 'acled_nigeria' and the Accumulo table 'acled'. The ETL and data for both processes run standalone from the ingest directory.

  2. See query_acled.sql for a CREATE EXTERNAL TABLE example, the required aux jars, and several sample queries that use both the Hive and Accumulo tables. The number of Hive columns in the table definition must equal the number of entries in accumulo.column.mapping.

  3. Run query_acled.sh to see the different query results. Make sure to configure the -hiveconf variables for your local Accumulo instance.
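The column-count rule in step 2 is easy to get wrong when editing a table definition by hand. The sketch below checks it in shell, assuming the mapping is a comma-separated list with one entry per Hive column; that format, the sample mapping value, and the function names are assumptions for illustration, so check query_acled.sql for the actual syntax.

```shell
#!/bin/sh
# Count the comma-separated entries in a mapping string.
# Assumption: the mapping holds one comma-separated entry per Hive column.
count_entries() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Succeed only if the Hive column count ($1) equals the number of
# entries in the mapping string ($2).
mapping_matches() {
    [ "$1" -eq "$(count_entries "$2")" ]
}
```

For example, `mapping_matches 3 'cf|a,cf|b,cf|c'` succeeds with this hypothetical three-entry mapping, while `mapping_matches 2 'cf|a,cf|b,cf|c'` fails.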

Known limitations:

Future enhancements:

Usage

Licensed AS-IS under the Apache License 2.0.