<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

Apache Accumulo Examples

Introduction

The Accumulo-Examples repository contains a collection of examples for Accumulo versions 2.0 and greater. Examples within the main branch are designed to work with the version currently under development. Additional branches exist for previous releases of the Accumulo 2.x line. For example, the 2.0 branch contains examples specifically intended to work with that release version.

The Accumulo Tour also provides several simple introductory examples that may be of interest.

A collection of examples for Accumulo 1.10 can be found here.

Setup instructions

Follow the steps below to run the Accumulo examples:

  1. Clone this repository

     git clone https://github.com/apache/accumulo-examples.git
    
  2. Follow Accumulo's quickstart to install and run an Accumulo instance. The examples connect to your instance using the accumulo-client.properties file in Accumulo's conf/ directory, so make sure that file is configured.

  3. Review env.sh.example and accumulo-env.sh (within your Accumulo installation) to see if you need to customize them. If ACCUMULO_HOME and HADOOP_HOME are set in your shell, you may be able to skip this step. Make sure ACCUMULO_CLIENT_PROPS is set to the location of your accumulo-client.properties.

     cp conf/env.sh.example conf/env.sh
     vim conf/env.sh
    
  4. Build the examples repo and copy the examples jar to Accumulo's lib/ directory so that it is on Accumulo's classpath:

     ./bin/build
     cp target/accumulo-examples.jar /path/to/accumulo/lib/
    
  5. Each Accumulo example has its own documentation with instructions for running it; these are linked below.
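
Before running an example, it can help to confirm that the settings from step 3 are visible in your shell. The function below is a minimal sketch, not part of this repository: the name `check_examples_env` is hypothetical, and it assumes the three variables named in step 3 are exported in the shell where you run it.

```shell
# Hypothetical helper (not shipped with accumulo-examples): checks the
# environment variables that step 3 asks you to configure.
check_examples_env() {
  status=0
  for name in ACCUMULO_HOME HADOOP_HOME ACCUMULO_CLIENT_PROPS; do
    # Look up the variable named by $name; eval keeps this portable to POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    fi
  done
  # The examples read their connection settings from this file.
  if [ -n "${ACCUMULO_CLIENT_PROPS:-}" ] && [ ! -f "$ACCUMULO_CLIENT_PROPS" ]; then
    echo "ACCUMULO_CLIENT_PROPS does not point to a file: $ACCUMULO_CLIENT_PROPS" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_examples_env` returns a nonzero status and prints a message to standard error for each missing setting, so it can guard other commands, as in `check_examples_env && ./bin/build`.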

When running the examples, remember the tips below:

Available Examples

Each example below highlights a feature of Apache Accumulo.

| Example | Description |
|---------|-------------|
| batch | Using the batch writer and batch scanner. |
| bloom | Creating a bloom filter enabled table to increase query performance. |
| bulkIngest | Ingesting bulk data using MapReduce jobs on Hadoop. |
| classpath | Using per-table classpaths. |
| client | Using table operations, reading and writing data in Java. |
| combiner | Using the example StatsCombiner to find min, max, sum, and count. |
| compactionStrategy | Configuring a compaction strategy. |
| constraints | Using constraints with tables. Limiting the mutation size to avoid running out of memory. |
| deleteKeyValuePair | Deleting a key/value pair and verifying the deletion in an RFile. |
| dirlist | Storing filesystem information. |
| export | Exporting and importing tables. |
| filedata | Storing file data. |
| filter | Using the AgeOffFilter to remove records more than 30 seconds old. |
| helloworld | Inserting records both inside and outside MapReduce jobs, and reading records between two rows. |
| isolation | Using the isolated scanner to ensure partial changes are not seen. |
| regex | Using MapReduce and Accumulo to find data using regular expressions. |
| reservations | Using conditional mutations to implement a simple reservation system. |
| rgbalancer | Using a balancer to spread groups of tablets within a table evenly. |
| rowhash | Using MapReduce to read a table and write to a new column in the same table. |
| sample | Building and using sample data in Accumulo. |
| shard | Using the intersecting iterator with a term index partitioned by document. |
| spark | Using Accumulo as input and output for Apache Spark jobs. |
| tabletofile | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
| terasort | Generating random data and sorting it using Accumulo. |
| tracing | Generating trace data in a client application and Accumulo. |
| uniquecols | Using MapReduce to count unique columns in Accumulo. |
| visibility | Using visibilities (or combinations of authorizations). Also shows user permissions. |
| wordcount | Using MapReduce and Accumulo to do a word count on text files. |

Release Testing

This repository can be used to test Accumulo release candidates. See docs/release-testing.md.