Awesome

Overview

Test setup and docs for using Redis Sentinel (http://redis.io/topics/sentinel) with the Jedis Java client, proxied through an F5 BIG-IP load balancer.

Setup

Ensure you have the very latest Redis (built from the master branch) compiled and installed.

Clone this repo, then grab some dependencies:

cd redis-sentinel-tests
wget https://github.com/downloads/xetorthio/jedis/jedis-2.1.0.jar
wget http://www.fightrice.com/mirrors/apache//commons/pool/binaries/commons-pool-1.6-bin.tar.gz
tar -zxf commons-pool-1.6-bin.tar.gz commons-pool-1.6/commons-pool-1.6.jar
mv commons-pool-1.6/commons-pool-1.6.jar ./
rm commons-pool-1.6-bin.tar.gz
rmdir commons-pool-1.6

Open a bunch of shells and fire up the Redis instances and sentinels, one per shell:

redis-server redis-master.conf
redis-server redis-slave1.conf

redis-sentinel sent1.conf
redis-sentinel sent2.conf
redis-sentinel sent3.conf

You should see the slave connect to the master and sync, and the sentinels connect to the master, discover one another, and find the slave.

I've included a sample F5 load-balancer config for this setup. It works well as long as you don't accidentally bring up 2 instances in the master role.

There's also a sample HAProxy config, but HAProxy lacks the ability to do a custom TCP req/reply health check, so it does not work properly.
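
To make the "custom TCP req/reply health check" concrete, here is a rough Java sketch of the kind of probe the load balancer needs to run: send INFO over a plain socket and mark the instance up only if the reply contains role:master. This is not the F5 monitor from this repo. The class name RoleCheck, the timeout, and the use of "INFO replication" (rather than a bare INFO) are assumptions made for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this repo.
final class RoleCheck {

    // True if an INFO reply reports the instance as a master.
    static boolean isMaster(String infoReply) {
        for (String line : infoReply.split("\r?\n")) {
            if (line.trim().equals("role:master")) {
                return true;
            }
        }
        return false;
    }

    // Sends an inline INFO command and returns the body of the bulk reply.
    static String fetchInfo(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("INFO replication\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // A bulk reply starts with a "$<length>\r\n" header line.
            InputStream in = socket.getInputStream();
            StringBuilder header = new StringBuilder();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                header.append((char) b);
            }
            String h = header.toString().trim();
            if (!h.startsWith("$")) {
                throw new IOException("unexpected reply: " + h);
            }
            int length = Integer.parseInt(h.substring(1));
            if (length < 0) {
                throw new IOException("empty reply");
            }

            // Read exactly <length> bytes of payload.
            byte[] body = new byte[length];
            int offset = 0;
            while (offset < length) {
                int n = in.read(body, offset, length - offset);
                if (n < 0) {
                    throw new EOFException("connection closed mid-reply");
                }
                offset += n;
            }
            return new String(body, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean up = isMaster(fetchInfo(host, port, 1000));
        System.out.println(host + ":" + port + " " + (up ? "UP (master)" : "DOWN (not master)"));
    }
}
```

Something like `java RoleCheck 127.0.0.1 6380` would probe slave1. A check like this only looks at one instance at a time, which is why two instances that both claim the master role will both be marked up.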

Run some tests

Now we can run our test client, connecting to the load-balancer virtual IP:

export CLASSPATH=$CLASSPATH:commons-pool-1.6.jar:jedis-2.1.0.jar
javac Test.java
java Test myvirtualip 100

The second argument is the delay in ms between Redis write commands. The client will tick happily along, and now you can start creating mayhem.

Kill the master (kill -9 or ctrl-c)
Watch sentinel recognize it and promote the slave to master
Watch your load balancer detect the failed master and the newly "up" slave
Watch the test code spit out x's for a while during the transition, indicating failed redis commands
Watch the test code recover and output a line about how long it was blocked and total failed commands
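
For readers who want to see how that x / recovery output can be produced, here is a minimal sketch of the bookkeeping. It is not the Test.java in this repo. The OutageTracker class, the message format, and the use of a generic Callable in place of a real Jedis write are all assumptions for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch, not the repo's Test.java.
final class OutageTracker {
    private long outageStartMs = -1;   // -1 means "not currently in an outage"
    private int failedInOutage = 0;
    private int totalFailed = 0;

    // Records one command result. Returns a summary line when an outage
    // ends (first success after one or more failures), otherwise null.
    String record(boolean ok, long nowMs) {
        if (!ok) {
            if (outageStartMs < 0) {
                outageStartMs = nowMs;
            }
            failedInOutage++;
            totalFailed++;
            return null;
        }
        if (outageStartMs < 0) {
            return null;
        }
        String msg = "recovered after " + (nowMs - outageStartMs) + " ms, "
                + failedInOutage + " failed commands";
        outageStartMs = -1;
        failedInOutage = 0;
        return msg;
    }

    int totalFailed() {
        return totalFailed;
    }

    // Runs `write` repeatedly with a fixed delay, printing "x" for each
    // failure and a summary line when writes start succeeding again.
    static void run(Callable<?> write, long delayMs, int iterations) throws InterruptedException {
        OutageTracker tracker = new OutageTracker();
        for (int i = 0; i < iterations; i++) {
            boolean ok;
            try {
                write.call();
                ok = true;
            } catch (Exception e) {
                ok = false;
            }
            if (!ok) {
                System.out.print("x");
            }
            String msg = tracker.record(ok, System.currentTimeMillis());
            if (msg != null) {
                System.out.println();
                System.out.println(msg);
            }
            Thread.sleep(delayMs);
        }
    }
}
```

In a real client the Callable would wrap a write against the virtual IP, for example a Jedis set call on a connection borrowed from a pool. Failed connections would need to be discarded rather than reused, which this sketch leaves out.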

Now edit redis-master.conf, set it to slaveof the new master (slave1, port 6380), and restart the original master instance.

Watch as the sentinels detect the new slave
Kill the new master (port 6380)
Watch sentinels elect the original master again
Watch client reconnect
Declare victory!

TODO/Issues

Killing both instances can result in a state that sentinels will never recover from
What's a safe setting for down-after-milliseconds? Will setting it too low cause false positives when there are slow operations (e.g. a big SORT)?
(fixed) starting a downed master back up w/o reconfig'ing as slave results in sentinel confusion
Since the LB is only checking "role:master" on each instance, a poorly config'd instance can result in 2 masters being "up"
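
One possible guard against the two-masters case is to look at all instances together instead of one at a time, and only route traffic when exactly one reports role:master. The sketch below shows that idea in Java. It is an illustration only, not something the F5 config in this repo does, and the MasterCount class and its method names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a cross-instance check, not part of this repo.
final class MasterCount {

    // Returns the names of all instances whose INFO output says role:master.
    static List<String> masters(Map<String, String> infoByInstance) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, String> entry : infoByInstance.entrySet()) {
            for (String line : entry.getValue().split("\r?\n")) {
                if (line.trim().equals("role:master")) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    // Only safe to send writes when exactly one instance claims the master role.
    static boolean safeToRoute(Map<String, String> infoByInstance) {
        return masters(infoByInstance).size() == 1;
    }

    public static void main(String[] args) {
        // Sample INFO snippets; instance names and ports are illustrative.
        Map<String, String> info = new LinkedHashMap<String, String>();
        info.put("master:6379", "role:master\r\nconnected_slaves:0\r\n");
        info.put("slave1:6380", "role:master\r\nconnected_slaves:0\r\n");
        System.out.println("masters: " + masters(info));
        System.out.println("safe to route: " + safeToRoute(info));
    }
}
```

Whether a check like this can be expressed in a given load balancer is a separate question. The point is only that the per-instance role:master test cannot see the conflict by itself.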