Exhibitor Mesos Framework

Prerequisites

Typical Operations

Navigating the CLI

Prerequisites

Clone and build the project

# git clone https://github.com/elodina/exhibitor-mesos-framework.git
# cd exhibitor-mesos-framework
# ./gradlew jar

Build Exhibitor Standalone if necessary (NOTE: the version built with Gradle may be affected by this issue, so this example uses the Maven build):

# mkdir tmp-exhibitor && cd tmp-exhibitor
# wget https://raw.github.com/Netflix/exhibitor/master/exhibitor-standalone/src/main/resources/buildscripts/standalone/maven/pom.xml
# mvn clean package
# cp target/exhibitor-*.jar ..
# cd .. && rm -rf tmp-exhibitor

Download the Apache Zookeeper distribution if you don't have one (or place the archive in the working folder):

# wget http://apache.cp.if.ua/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

Download the Oracle JDK distribution (or place the archive in the working folder). NOTE: it MUST be the Oracle JDK (not OpenJDK and not a JRE), as Exhibitor relies on jps calls:

# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u45-b14/jdk-8u45-linux-x64.tar.gz

Environment Configuration

Before running ./exhibitor-mesos.sh, set the location of libmesos:

# export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so

If the host running the scheduler has several IP addresses, you may also need to set LIBPROCESS_IP:

# export LIBPROCESS_IP=<IP_ACCESSIBLE_FROM_MASTER>
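The environment setup above can be wrapped in a small pre-flight check. The following is a minimal sketch, assuming libmesos lives in one of a few common locations. The function name find_libmesos and the candidate paths are assumptions made for this example, not part of the framework.

```shell
#!/bin/sh
# Print the first existing file among the given candidate paths.
# Returns non-zero if none of them exists.
find_libmesos() {
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate locations are assumptions; adjust them to your installation.
if lib=$(find_libmesos /usr/local/lib/libmesos.so /usr/lib/libmesos.so); then
  export MESOS_NATIVE_JAVA_LIBRARY="$lib"
else
  echo "libmesos.so not found; set MESOS_NATIVE_JAVA_LIBRARY manually" >&2
fi
```

Failing early with a clear message is usually easier to debug than a scheduler that cannot load the native library at startup.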

Scheduler Configuration

The scheduler is configured through the command line.

The following options are available:

Usage: scheduler [options]

  -m <value> | --master <value>
        Mesos Master addresses. Required.
  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
  -u <value> | --user <value>
        Mesos user. Required.
  --framework-name <value>
        Mesos framework name. Defaults to exhibitor. Optional
  --framework-timeout <value>
        Mesos framework failover timeout. Allows to recover from failure before killing running tasks. Should be a parsable Scala Duration value. Defaults to 30 days. Optional
  --storage <value>
        Storage for cluster state. Examples: file:exhibitor-mesos.json; zk:master:2181/exhibitor-mesos. Required.
  --ensemble-modify-retries <value>
        Number of retries to modify (add/remove server) ensemble. Defaults to 60. Optional.
  --ensemble-modify-backoff <value>
        Backoff between retries to modify (add/remove server) ensemble in milliseconds. Defaults to 1000. Optional.
  -d <value> | --debug <value>
        Debug mode. Optional. Defaults to false.

Run the scheduler

Start the Exhibitor scheduler using this command:

# ./exhibitor-mesos.sh scheduler --master master:5050 --user root --api http://master:6666
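The option list above marks --storage as required, while the command above does not pass it. Whether the scheduler starts without it has not been verified here. The following is a minimal sketch of a launcher that always passes --master, --user, --api and --storage. EM_MASTER, EM_USER and EM_STORAGE are hypothetical variable names introduced for this sketch, as are the default values; only EM_API is read by the CLI itself. The function prints the command instead of running it.

```shell
#!/bin/sh
# Build the scheduler command line from environment variables.
# EM_MASTER, EM_USER and EM_STORAGE are hypothetical names used only here;
# the defaults are taken from the examples in this README.
scheduler_cmd() {
  printf './exhibitor-mesos.sh scheduler --master %s --user %s --api %s --storage %s\n' \
    "${EM_MASTER:-master:5050}" \
    "${EM_USER:-root}" \
    "${EM_API:-http://master:6666}" \
    "${EM_STORAGE:-file:exhibitor-mesos.json}"
}

# Dry run: print the command so it can be reviewed before use.
scheduler_cmd
```

Once the printed command looks right, it can be pasted into a shell or run with eval "$(scheduler_cmd)".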

Quick start

To avoid passing the API URL to each CLI call, export it as follows:

# export EM_API=http://master:6666

First, let's add one Exhibitor server with the default settings. Later sections of this README show how to change the defaults.

# ./exhibitor-mesos.sh add 0
Added servers 0

cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
    shared config overrides:
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

You now have a cluster with 1 server that is not started.

# ./exhibitor-mesos.sh status
cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
    shared config overrides:
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

Each server requires some basic configuration.

# ./exhibitor-mesos.sh config 0 --configtype zookeeper --zkconfigconnect 192.168.3.1:2181 --zkconfigzpath /exhibitor/config --zookeeper-install-directory /tmp/zookeeper --zookeeper-data-directory /tmp/zkdata
Updated configuration for servers 0

cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

Now let's start the server. This CLI call blocks until the server has actually started, but waits no longer than the configured timeout. The timeout is passed via the --timeout flag and defaults to 60s. If a timeout of 0ms is passed, the CLI does not wait for the servers to start at all and replies with a "Scheduled servers ..." message.

# ./exhibitor-mesos.sh start 0 --timeout 30s
Started servers 0

cluster:
  server:
    id: 0
    state: Running
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m, hostname:slave1
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto
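When start is called with a timeout of 0ms, the CLI returns before the server is up, so a script has to poll the status itself. The following is a minimal sketch that parses the status output shown in this README. The helper names server_state and wait_running and the EM_CLI override are assumptions made for this example, and the parsing relies on the id line preceding the state line, as it does in the output above.

```shell
#!/bin/sh
# EM_CLI is a hypothetical override for the CLI path, used only in this sketch.
EM_CLI=${EM_CLI:-./exhibitor-mesos.sh}

# Read `status` output on stdin and print the state of server $1.
server_state() {
  awk -v id="$1" '
    $1 == "id:"                { in_server = ($2 == id) }
    $1 == "state:" && in_server { print $2; exit }
  '
}

# Poll `status` once per second until server $1 is Running.
# Gives up after $2 attempts (default 30) and returns non-zero.
wait_running() {
  id=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if [ "$($EM_CLI status | server_state "$id")" = "Running" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be to run ./exhibitor-mesos.sh start 0 --timeout 0ms and then call wait_running 0 60.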

Since we don't yet know where the server is running, we can ask for the cluster status to find its endpoint.

# ./exhibitor-mesos.sh status
cluster:
  server:
    id: 0
    state: Running
    endpoint: http://slave0:31000/exhibitor/v1/ui/index.html
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m, hostname:slave1
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      port: 31000
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto
    exhibitor cluster view:
          [slave0, latent, 0, F]

(NOTE: the exhibitor cluster view section lets you reason about the underlying Exhibitor and Zookeeper ensemble. Because Exhibitor synchronises with some lag when a node is added or removed, different nodes may report different views of the cluster. That is why this section is shown under every node in the RUNNING state.)
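Scripts that need the endpoint can pull it out of the status output. The following is a minimal sketch, assuming the output layout shown above. server_field is a hypothetical helper name, and it only handles single-word keys such as state or endpoint. For a key that appears more than once under a server (for example port), it returns the first occurrence.

```shell
#!/bin/sh
# Read `status` output on stdin and print the value of field $2 for server $1.
# Only single-word keys are handled; the first match after the id line wins.
server_field() {
  awk -v id="$1" -v key="$2:" '
    $1 == "id:"            { in_server = ($2 == id) }
    in_server && $1 == key { print $2; exit }
  '
}

# Sample input copied from the status output above (abridged).
# Prints http://slave0:31000/exhibitor/v1/ui/index.html
server_field 0 endpoint <<'EOF'
cluster:
  server:
    id: 0
    state: Running
    endpoint: http://slave0:31000/exhibitor/v1/ui/index.html
    constraints: hostname=unique
EOF
```

Against a live cluster the same helper would be used as ./exhibitor-mesos.sh status | server_field 0 endpoint.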

By now you should have a single Exhibitor instance running. Here's how you stop it:

# ./exhibitor-mesos.sh stop 0
Stopped servers 0

If you want to remove the server from the cluster completely, you may skip the stop step and call remove directly (this calls stop under the hood anyway):

# ./exhibitor-mesos.sh remove 0
Removed servers 0

Typical Operations

Changing the location of Zookeeper data

# ./exhibitor-mesos.sh stop 0
Stopped servers 0

# ./exhibitor-mesos.sh config 0 --zookeeper-data-directory /tmp/exhibitor_zkdata
Updated configuration for servers 0

cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/exhibitor_zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

Shutting down framework

Although the scheduler has a shutdown hook, the hook does not actually tear down the framework. To shut the framework down completely (i.e. unregister it from Mesos), send a POST request to /teardown on the Mesos master, specifying the ID of the framework to shut down:

# curl -d frameworkId=20150807-094500-84125888-5050-14187-0005 -X POST http://master:5050/teardown
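Because a teardown cannot be undone, it can help to wrap the request in a function that refuses empty arguments and supports a dry run. The following is a minimal sketch of that idea. teardown_framework and the DRY_RUN variable are conventions of this example only; the curl invocation is the one shown above.

```shell
#!/bin/sh
# Send the teardown request for framework $2 to the Mesos master at $1 (host:port).
# When DRY_RUN is non-empty (a convention of this sketch only),
# the curl command is printed instead of executed.
teardown_framework() {
  master=$1
  framework_id=$2
  if [ -z "$master" ] || [ -z "$framework_id" ]; then
    echo "usage: teardown_framework <master host:port> <framework id>" >&2
    return 2
  fi
  ${DRY_RUN:+echo} curl -d "frameworkId=$framework_id" -X POST "http://$master/teardown"
}

# Dry run with the framework id from the example above.
DRY_RUN=1
teardown_framework master:5050 20150807-094500-84125888-5050-14187-0005
```

Unset DRY_RUN to send the request for real.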

Navigating the CLI

Requesting help

# ./exhibitor-mesos.sh help
Usage: <command>

Commands:
  help       - print this message.
  help [cmd] - print command-specific help.
  scheduler  - start scheduler.
  status     - print cluster status.
  add        - add servers to cluster.
  config     - configure servers in cluster.
  start      - start servers in cluster.
  stop       - stop servers in cluster.
  remove     - remove servers in cluster.

Adding servers to the cluster

# ./exhibitor-mesos.sh help add
Usage: add <id> [options]

  -c <value> | --cpu <value>
        CPUs for server. Optional.
  -m <value> | --mem <value>
        Memory for server. Optional.
  --constraints <value>
        Constraints (hostname=like:master,rack=like:1.*). See below. Defaults to 'hostname=unique'. Optional.
  -b <value> | --configchangebackoff <value>
        Backoff between checks whether the shared configuration changed in milliseconds. Defaults to 10000. Optional.
  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
  --port <value>
        Port ranges to accept, when offer is issued. Optional
  --docker <value>
        Use Docker to run executor. Allows running multiple instances per host. Optional and defaults to false

constraint examples:
  like:slave0    - value equals 'slave0'
  unlike:slave0  - value is not equal to 'slave0'
  like:slave.*   - value starts with 'slave'
  unique         - all values are unique
  cluster        - all values are the same
  cluster:slave0 - value equals 'slave0'
  groupBy        - all values are the same
  groupBy:3      - all values are within 3 different groups
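To make the like and unlike examples above concrete, here is a minimal sketch that treats the part after the colon as a regular expression that must match the whole value. This is an illustration of the semantics as described in the help text, not the framework's own implementation: it uses grep's extended regular expressions, which may differ in detail from the regex dialect the scheduler uses, and it does not cover unique, cluster or groupBy, which depend on the values of other servers. The function name constraint_matches is hypothetical.

```shell
#!/bin/sh
# Return 0 if value $2 satisfies the like:/unlike: constraint $1.
# The pattern must match the whole value (grep -x).
constraint_matches() {
  constraint=$1
  value=$2
  case $constraint in
    like:*)
      printf '%s\n' "$value" | grep -Eqx -- "${constraint#like:}" ;;
    unlike:*)
      ! printf '%s\n' "$value" | grep -Eqx -- "${constraint#unlike:}" ;;
    *)
      echo "unsupported in this sketch: $constraint" >&2
      return 2 ;;
  esac
}

# Check a few hostnames against one of the examples above.
for host in slave0 slave12 master; do
  if constraint_matches 'like:slave.*' "$host"; then
    echo "$host matches"
  else
    echo "$host does not match"
  fi
done
```

Under these assumptions, slave0 and slave12 match like:slave.* and master does not.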

Configuring servers in the cluster

NOTE: this section is not final and some configurations may change.

# ./exhibitor-mesos.sh help config
Usage: config <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
  --stickiness-period <value>
        Stickiness period to preserve same node for Exhibitor server (5m, 10m, 1h).
  --failover-delay <value>
        Failover delay (10s, 5m, 3h).
  --failover-max-delay <value>
        Max failover delay. See failoverDelay.
  --failover-max-tries <value>
        Max failover tries. Default - none
  --configtype <value>
        Config type to use: s3 or zookeeper. Optional.
  --configcheckms <value>
        Period (ms) to check for shared config updates. Optional.
  --defaultconfig <value>
        Full path to a file that contains initial/default values for Exhibitor/ZooKeeper config values. The file is a standard property file. Optional.
  --headingtext <value>
        Extra text to display in UI header. Optional.
  --hostname <value>
        Hostname to use for this JVM. Optional.
  --jquerystyle <value>
        Styling used for the JQuery-based UI. Optional.
  --loglines <value>
        Max lines of logging to keep in memory for display. Default is 1000. Optional.
  --nodemodification <value>
        If true, the Explorer UI will allow nodes to be modified (use with caution). Default is true. Optional.
  --prefspath <value>
        Certain values (such as Control Panel values) are stored in a preferences file. By default, Preferences.userRoot() is used. Optional.
  --servo <value>
        true/false (default is false). If enabled, ZooKeeper will be queried once a minute for its state via the 'mntr' four letter word (this requires ZooKeeper 3.4.x+). Servo will be used to publish this data via JMX. Optional.
  --timeout <value>
        Connection timeout (ms) for ZK connections. Default is 30000. Optional.
  --s3credentials <value>
        Credentials to use for s3backup or s3config. Optional.
  --s3region <value>
        Region for S3 calls (e.g. "eu-west-1"). Optional.
  --s3config <value>
        The bucket name and key to store the config (s3credentials may be provided as well). Argument is [bucket name]:[key]. Optional.
  --s3configprefix <value>
        When using AWS S3 shared config files, the prefix to use for values such as locks. Optional.
  --zkconfigconnect <value>
        The initial connection string for ZooKeeper shared config storage. E.g: host1:2181,host2:2181... Optional.
  --zkconfigexhibitorpath <value>
        Used if the ZooKeeper shared config is also running Exhibitor. This is the URI path for the REST call. The default is: /. Optional.
  --zkconfigexhibitorport <value>
        Used if the ZooKeeper shared config is also running Exhibitor. This is the port that Exhibitor is listening on. IMPORTANT: if this value is not set it implies that Exhibitor is not being used on the ZooKeeper shared config. Optional.
  --zkconfigpollms <value>
        The period in ms to check for changes in the config ensemble. The default is: 10000. Optional.
  --zkconfigretry <value>
        The retry values to use in the form sleep-ms:retry-qty. The default is: 1000:3. Optional.
  --zkconfigzpath <value>
        The base ZPath that Exhibitor should use. E.g: /exhibitor/config. Optional.
  --filesystembackup <value>
        If true, enables file system backup of ZooKeeper log files. Optional.
  --s3backup <value>
        If true, enables AWS S3 backup of ZooKeeper log files (s3credentials may be provided as well). Optional.
  --aclid <value>
        Enable ACL for Exhibitor's internal ZooKeeper connection. This sets the ACL's ID. Optional.
  --aclperms <value>
        Enable ACL for Exhibitor's internal ZooKeeper connection. This sets the ACL's Permissions - a comma list of possible permissions. If this isn't specified the permission is set to ALL. Values: read, write, create, delete, admin. Optional.
  --aclscheme <value>
        Enable ACL for Exhibitor's internal ZooKeeper connection. This sets the ACL's Scheme. Optional.
  --log-index-directory <value>
        The directory where indexed Zookeeper logs should be kept. Optional.
  --zookeeper-install-directory <value>
        The directory where the Zookeeper server is installed. Optional.
  --zookeeper-data-directory <value>
        The directory where Zookeeper snapshot data is stored. Optional.
  --zookeeper-log-directory <value>
        The directory where Zookeeper transaction log data is stored. Optional.
  --backup-extra <value>
        Backup extra shared config. Optional.
  --zoo-cfg-extra <value>
        Any additional properties to be added to the zoo.cfg file in form: key1\\=value1&key2\\=value2. Optional.
  --java-environment <value>
        Script to write as the 'java.env' file which gets executed as a part of Zookeeper start script. Optional.
  --log4j-properties <value>
        Contents of the log4j.properties file. Optional.
  --client-port <value>
        The port that clients use to connect to Zookeeper. Defaults to 2181. Optional.
  --connect-port <value>
        The port that other Zookeeper instances use to connect to Zookeeper. Defaults to 2888. Optional.
  --election-port <value>
        The port that other Zookeeper instances use for election. Defaults to 3888. Optional.
  --check-ms <value>
        The number of milliseconds between live-ness checks on Zookeeper server. Defaults to 30000. Optional.
  --cleanup-period-ms <value>
        The number of milliseconds between Zookeeper log file cleanups. Defaults to 43200000. Optional.
  --cleanup-max-files <value>
        The max number of Zookeeper log files to keep when cleaning up. Defaults to 3. Optional.
  --backup-max-store-ms <value>
        Backup max store ms shared config. Optional.
  --backup-period-ms <value>
        Backup period ms shared config. Optional.
  --port <value>
        Port ranges to accept, when offer is issued. Optional

Starting servers in the cluster

# ./exhibitor-mesos.sh help start
Usage: start <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.

Stopping servers in the cluster

# ./exhibitor-mesos.sh help stop
Usage: stop <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.

Removing servers from the cluster

# ./exhibitor-mesos.sh help remove
Usage: remove <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.

Open issues here.