Overview
Test setup and docs for using Redis Sentinel (http://redis.io/topics/sentinel) with the Jedis Java client, proxied through an F5 BIG-IP load balancer.
Setup
Ensure you have the very latest Redis (built from the master branch) compiled and installed.
Clone this repo, then grab some dependencies:
cd redis-sentinel-tests
wget https://github.com/downloads/xetorthio/jedis/jedis-2.1.0.jar
wget http://www.fightrice.com/mirrors/apache//commons/pool/binaries/commons-pool-1.6-bin.tar.gz
tar -zxf commons-pool-1.6-bin.tar.gz commons-pool-1.6/commons-pool-1.6.jar
mv commons-pool-1.6/commons-pool-1.6.jar ./
rm commons-pool-1.6-bin.tar.gz
rmdir commons-pool-1.6
Open a bunch of shells and fire up redis instances:
redis-server redis-master.conf
redis-server redis-slave1.conf
redis-sentinel sent1.conf
redis-sentinel sent2.conf
redis-sentinel sent3.conf
You should see the slave connect to the master and sync, and the sentinels connect to the master, discover one another, and find the slave.
I've included a sample F5 load-balancer config for this setup. It works well as long as you don't accidentally bring up 2 instances in the master role.
There's also a sample HAProxy config, but HAProxy lacks the ability to do a custom TCP req/reply health check, so it does not work properly.
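If you want to experiment with that kind of TCP req/reply check outside the load balancer, here is a minimal Java sketch. It is not the F5 monitor itself: it assumes the check sends an inline INFO command and looks for a line reading role:master, and the names RoleCheck, reportsMaster, and probe are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this repo or of the F5 config.
class RoleCheck {
    /** True if the INFO reply contains a line that is exactly "role:master". */
    static boolean reportsMaster(String infoReply) {
        for (String line : infoReply.split("\r?\n")) {
            if (line.trim().equals("role:master")) {
                return true;
            }
        }
        return false;
    }

    /** Sends an inline INFO command; any I/O error or timeout counts as "down". */
    static boolean probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("INFO\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            StringBuilder reply = new StringBuilder();
            byte[] buf = new byte[4096];
            int n;
            // Read until a complete "role:" line has arrived (or 64 KB, as a safety cap).
            while (reply.length() < 65536 && (n = in.read(buf)) != -1) {
                reply.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
                int i = reply.indexOf("\nrole:");
                if (i >= 0 && reply.indexOf("\n", i + 1) >= 0) {
                    break;
                }
            }
            return reportsMaster(reply.toString());
        } catch (IOException e) {
            return false;
        }
    }
}
```

Calling RoleCheck.probe(host, port, 500) against each instance should return true only for the one currently in the master role, which is the same signal the load balancer uses to decide where to send traffic.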
Run some tests
Now we can run our test client, connecting to the load-balancer virtual IP:
export CLASSPATH=$CLASSPATH:commons-pool-1.6.jar:jedis-2.1.0.jar
javac Test.java
java Test myvirtualip 100
The second argument is the delay in ms between redis write commands. That will tick happily along and now you can start creating mayhem.
- Kill the master (kill -9 or ctrl-c)
- Watch sentinel recognize it and promote the slave to master
- Watch your load balancer detect the failed master and the newly "up" slave
- Watch the test code spit out x's for a while during the transition, indicating failed redis commands
- Watch the test code recover and output a line about how long it was blocked and total failed commands
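The x's and the recovery line described above boil down to a little bookkeeping around each write. Here's a rough sketch of that logic, not a copy of Test.java: the class name OutageTracker, the output wording, and the choice to count failures per outage are all assumptions.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the client-side bookkeeping; Test.java may differ.
class OutageTracker {
    private long failedCommands = 0;
    private long blockedSinceMs = -1; // -1 means no outage in progress

    /** Records one command result and returns the text to print for it. */
    String record(boolean succeeded, long nowMs) {
        if (!succeeded) {
            if (blockedSinceMs < 0) {
                blockedSinceMs = nowMs; // first failure starts the outage clock
            }
            failedCommands++;
            return "x";
        }
        if (blockedSinceMs >= 0) {
            // First success after an outage: summarize it and reset.
            String summary = "\nblocked for " + (nowMs - blockedSinceMs)
                    + " ms, " + failedCommands + " failed commands\n";
            blockedSinceMs = -1;
            failedCommands = 0;
            return summary;
        }
        return ""; // normal success, nothing to report
    }

    /** Runs a command repeatedly with a fixed delay, printing the tracker output. */
    static void run(BooleanSupplier command, long delayMs, int iterations)
            throws InterruptedException {
        OutageTracker tracker = new OutageTracker();
        for (int i = 0; i < iterations; i++) {
            System.out.print(tracker.record(command.getAsBoolean(), System.currentTimeMillis()));
            Thread.sleep(delayMs);
        }
    }
}
```

The command passed to run would wrap a single Redis write through Jedis and return false if it throws, so the tracker itself stays independent of the client library.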
Now edit redis-master.conf, add a slaveof directive pointing at the new master (slave1, port 6380), and restart the original master instance.
- Watch as the sentinels detect the new slave
- Kill the new master (port 6380)
- Watch sentinels elect the original master again
- Watch client reconnect
- Declare victory!
TODO/Issues
- Killing both instances can result in a state that sentinels will never recover from
- What's a safe setting for down-after-milliseconds? Will setting this too low cause false positives when there are slow operations (e.g. a big SORT)?
- (fixed) starting a downed master back up w/o reconfig'ing as slave results in sentinel confusion
- Since the LB is only checking "role:master" on each instance, a poorly config'd instance can result in 2 masters being "up"
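One way to catch the two-masters case would be to compare roles across all instances instead of checking each one in isolation. A small sketch, assuming you've already collected each instance's INFO output into a map (the class MasterCount and the instance labels are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical cross-instance check; nothing like this exists in the LB config.
class MasterCount {
    /** Returns the labels of all instances whose INFO output has a "role:master" line. */
    static List<String> masters(Map<String, String> infoByInstance) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : infoByInstance.entrySet()) {
            for (String line : entry.getValue().split("\r?\n")) {
                if (line.trim().equals("role:master")) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    /** The setup is only safe to route to when exactly one instance claims the master role. */
    static boolean exactlyOneMaster(Map<String, String> infoByInstance) {
        return masters(infoByInstance).size() == 1;
    }
}
```

If exactlyOneMaster returns false, either no instance is writable or more than one is, and in the second case the list from masters tells you which instances need reconfiguring.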