Hadoop Ansible Playbook

Ansible playbook that installs a CDH 4.6.0 Hadoop cluster (running on Java 7, supported from CDH 4.4), with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.

Follow @analytically. Browse the CI build screenshots.

Requirements

Cloudera (CDH4) Hadoop Roles

If you're assembling your own Hadoop playbook, these roles are available for you to reuse:

Facebook Presto Roles

Configuration

Set the following variables using --extra-vars or by editing group_vars/all:

Required:

Optional:

Adding hosts

Edit the hosts file and list hosts per group (see Inventory for more examples):

[datanodes]
hslave010
hslave[090:252]
hadoop-slave-[a:f].example.com
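To see how many hosts a range pattern like the ones above covers, the following is a minimal Python sketch of the expansion, assuming Ansible's documented behaviour: ranges are inclusive, numeric ranges keep their leading zeros, and alphabetic ranges step through single letters. The helper names `expand_range` and `expand_hosts` are hypothetical and not part of this playbook or of Ansible.

```python
import re
from itertools import product

# Matches one bracketed range such as [090:252] or [a:f].
RANGE = re.compile(r"\[(\w+):(\w+)\]")


def expand_range(start, end):
    """Expand one inclusive range; numeric ranges keep the zero padding of `start`."""
    if start.isdigit() and end.isdigit():
        width = len(start)
        return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    # Otherwise treat it as a single-letter alphabetic range.
    return [chr(c) for c in range(ord(start), ord(end) + 1)]


def expand_hosts(pattern):
    """Return every hostname that a pattern with zero or more ranges stands for."""
    # re.split with two capture groups yields: literal, start, end, literal, ...
    parts = RANGE.split(pattern)
    literals = parts[0::3]
    ranges = [expand_range(s, e) for s, e in zip(parts[1::3], parts[2::3])]
    hosts = []
    for combo in product(*ranges):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += value + literal
        hosts.append(name)
    return hosts


if __name__ == "__main__":
    for line in ["hslave010", "hslave[090:252]", "hadoop-slave-[a:f].example.com"]:
        hosts = expand_hosts(line)
        print(f"{line}: {len(hosts)} host(s), first={hosts[0]}, last={hosts[-1]}")
```

Running a check like this before editing the hosts file is a cheap way to confirm that a range covers the machines you expect.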

Make sure that the zookeepers and journalnodes groups each contain an odd number of hosts, and at least 3.
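The reason for this rule is that both ZooKeeper and the HDFS JournalNodes rely on a majority quorum, so adding one host to an odd-sized group does not raise the number of failures it can survive. The sketch below illustrates the rule in Python. The function names and the example host names are hypothetical and are not part of this playbook.

```python
def tolerated_failures(n):
    """A majority quorum of n members survives floor((n - 1) / 2) failures."""
    return (n - 1) // 2


def check_quorum_group(name, hosts):
    """Return a list of problems; an empty list means the group size is acceptable."""
    problems = []
    if len(hosts) < 3:
        problems.append(f"{name}: needs at least 3 hosts, found {len(hosts)}")
    if len(hosts) % 2 == 0:
        problems.append(f"{name}: needs an odd number of hosts, found {len(hosts)}")
    return problems


if __name__ == "__main__":
    # Hypothetical group contents, for illustration only.
    groups = {
        "zookeepers": ["zk01", "zk02", "zk03"],
        "journalnodes": ["jn01", "jn02", "jn03", "jn04"],
    }
    for name, hosts in groups.items():
        issues = check_quorum_group(name, hosts)
        status = "ok" if not issues else "; ".join(issues)
        print(f"{name}: {status} (tolerates {tolerated_failures(len(hosts))} failure(s))")
```

Note that a group of 4 tolerates a single failure, the same as a group of 3, which is why even sizes add cost without adding resilience.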

Ganglia nodes

Since we're using unicast mode for Ganglia (which significantly reduces chatter), you may have to wait 60 seconds after a node starts up before it shows up in the web interface.

Installation

To run Ansible:

./site.sh

To install only one component, for example ZooKeeper, pass its tag as an argument (available tags: apache, bonding, configuration, elasticsearch, elasticsearch_curator, fluentd, ganglia, hadoop, hbase, hive, java, kibana, ntp, postfix, postgres, presto, rsyslog, tdagent, zookeeper):

./site.sh zookeeper

What else is installed?

URLs

After the installation, go here:

Performance testing

Instructions on how to test the performance of your CDH4 cluster.

TeraGen and TeraSort
DFSIO

Bootstrapping

Paste your public SSH RSA key into bootstrap/ansible_rsa.pub, then run bootstrap.sh to bootstrap the nodes listed in bootstrap/hosts. See bootstrap/bootstrap.yml for more information.

What about Pig, Flume, etc.?

You can manually install additional components after running this playbook. Follow the official CDH4 Installation Guide.

Screenshots

zookeeper

hmaster01

ganglia

kibana

smokeping

License

Licensed under the Apache License, Version 2.0.

Copyright 2013-2014 Mathias Bogaert.