I discovered this when trying to deploy some dedicated Redis servers in Windows Azure. Building single node instances was easy enough, but when trying to build a redundant pair, I ran into the issue where Redis doesn’t currently support a Master-Master type deployment, though they are planning to implement such features in the future.
After a bit of research, I found that Redis does support a Master-Slave deployment, and that an extension to Redis (Redis Sentinel) can be used to allow a slave to take on the role of the master should the original master fail. The final trick was to ensure that when the user or application calls the service, they always get through to the master node. Sadly, the documentation I could find was either incomplete, contained errors, or was too vague, so I hope this guide will be a more complete solution.
Essentially, we’re going to build something similar to this:
The diagram above shows the link between the different applications rather than the servers themselves.
Preparing the environment
Firstly, we’re going to build a new Cloud Service in Windows Azure, and in it we’re going to build three CentOS Linux virtual machines. Two of these will be medium-sized (you can make these whatever size you think you’ll need); these will host our Redis servers, Redis Sentinel, and HAProxy. The third server only needs to be an extra small instance (because, let’s face it, hosted VMs are expensive!). It will be used purely as an additional Redis Sentinel server, essentially a quorum server. On the two main servers, you’ll need to set up a load-balanced endpoint on port 6379. For this guide, let’s say the servers have the following IPs:
Quorum: 10.0.0.2
Redis 01: 10.0.0.3
Redis 02: 10.0.0.4
Once Azure has finished provisioning the servers, log in and configure them for your environment.
Installing the software
Now we’re going to install Redis on all three boxes. For Redis Sentinel to function properly, the Redis project recommends using Redis 2.8. This is not included in the CentOS repositories, but it is available from the Remi repository (the commands below also add the EPEL repository). Do the following to install it:
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm
yum --enablerepo=remi install redis -y
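The Redis project recommends 2.8 for Sentinel, so it’s worth confirming which version yum actually installed before going further. Here’s a minimal sketch. It assumes redis-server --version prints a banner like “Redis server v=2.8.19 sha=...”, which the 2.8 series does to my knowledge. The function names are hypothetical helpers, not part of Redis.

```shell
#!/bin/sh
# Sketch: check that the installed Redis is at least 2.8 (recommended for Sentinel).
# Assumption: "redis-server --version" prints e.g. "Redis server v=2.8.19 sha=...".

# Pull the "v=X.Y.Z" number out of a redis-server --version banner.
redis_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/^Redis server v=\([0-9][0-9.]*\).*/\1/p'
}

# Succeed (exit status 0) if dotted version $1 >= dotted version $2.
# Missing fields count as 0, so "2.8" and "2.8.0" compare as equal;
# an empty version compares as 0 and therefore fails.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "[.]"); nb = split(b, y, "[.]")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Report the installed version; return non-zero if it is missing or older than 2.8.
check_redis_version() {
  banner=$(redis-server --version 2>/dev/null) || {
    echo "redis-server not found" >&2
    return 2
  }
  ver=$(redis_version_from_banner "$banner")
  if version_ge "$ver" 2.8; then
    echo "OK: Redis $ver"
  else
    echo "Redis '$ver' is older than 2.8 (or could not be parsed)" >&2
    return 1
  fi
}
```

Source the functions and call check_redis_version on each box. It prints the version it found and returns non-zero if that version is older than 2.8.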
Once Redis has been installed, you’ll also want to install HAProxy onto the two main boxes. We need features of HAProxy that are only available in version 1.5-dev20 or above. As far as I’m aware, this isn’t available in any repository, so I’ve compiled version 1.5-dev22 (the latest version at the time of writing) and provided the packages here:
i386
i386-debuginfo
x86_64
x86_64-debuginfo
If you’d rather use a more up-to-date version than my precompiled packages, follow this guide:
A Recipe for a haproxy 1.5 development version (v22) RPM on CentOS
Install the RPM file(s) you’ve downloaded:
rpm -Uhv haproxy-1.5-dev22*.rpm
Configuration
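Firstly, on both main boxes, we need to edit the /etc/redis.conf file. Change the port Redis listens on (default is 6379), because HAProxy will be listening on 6379 instead:
port 6380
Make sure the bind line is commented out, so that Redis accepts connections on all interfaces rather than only from localhost:
# bind 127.0.0.1
On the second box (Redis 02) only, make Redis a slave of the first box:
slaveof 10.0.0.3 6380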
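The redis.conf edits above can also be scripted. The following is a rough sketch, not part of the original setup. It assumes GNU sed (for sed -i, as shipped with CentOS) and backs the file up to redis.conf.bak first. The function name is hypothetical, and you should review the resulting file before starting Redis.

```shell
#!/bin/sh
# Sketch: apply the redis.conf edits described above (hypothetical helper).
#   $1 = path to redis.conf (e.g. /etc/redis.conf)
#   $2 = master "IP PORT" on the slave box; leave empty on the master box
# Assumption: GNU sed; the file has a "port" line, bind/slaveof lines are optional.
configure_redis_conf() {
  conf=$1
  master=$2
  cp "$conf" "$conf.bak" || return 1   # keep a copy of the original file

  # Move Redis to 6380 so HAProxy can take the default port 6379.
  sed -i 's/^port .*/port 6380/' "$conf"

  # Comment out any active bind line so Redis listens on all interfaces.
  sed -i 's/^bind /# bind /' "$conf"

  # Remove any existing slaveof line, then add one only on the slave box.
  sed -i '/^slaveof /d' "$conf"
  if [ -n "$master" ]; then
    echo "slaveof $master" >> "$conf"
  fi
}
```

On Redis 01 you would run configure_redis_conf /etc/redis.conf, and on Redis 02 configure_redis_conf /etc/redis.conf "10.0.0.3 6380".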
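Once /etc/redis.conf has been updated, start Redis on both boxes and set it to start at boot:
service redis start
chkconfig redis on
To check that replication is working, connect to Redis on each box with redis-cli:
redis-cli -p 6380
and run:
info replication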
On the first box, you should see something like this:
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.4,port=6380,state=online,offset=108786437,lag=0
master_repl_offset:108786437
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:107737862
repl_backlog_histlen:1048576
On the second, you’ll see something like this:
# Replication
role:slave
master_host:10.0.0.3
master_port:6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:558173651
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
Once you’ve confirmed these are running correctly, we can configure Redis Sentinel to allow for automatic failover.
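Checking those two outputs by eye is fine for a pair of servers, but the check can also be scripted. This small sketch assumes the field:value layout shown above. INFO output piped from redis-cli normally ends each line with a carriage return, so the sketch strips it. The function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: read "field:value" lines of INFO replication output on stdin and
# print the value of the requested field (hypothetical helper).
# Usage: redis-cli -p 6380 info replication | replication_field role
replication_field() {
  # INFO lines end in \r\n, so remove the \r before matching.
  tr -d '\r' | awk -v key="$1" '
    index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }'
}

# Succeed if the INFO replication output on stdin reports role:master.
is_master() {
  [ "$(replication_field role)" = "master" ]
}
```

For example, redis-cli -p 6380 info replication | replication_field connected_slaves should print 1 on the first box.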
On all three boxes, edit the /etc/redis-sentinel.conf file and add the following:
port 26379
logfile "/var/log/redis/sentinel.log"

sentinel monitor mymaster 10.0.0.3 6380 2
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
The 2 at the end of the monitor line is the quorum: at least two of the three Sentinels must agree that the master is unreachable before a failover is started, which is why we need the third (quorum) server. With down-after-milliseconds set to 10000, a Sentinel considers the master down after 10 seconds without a valid reply.
Then start Redis Sentinel on all three boxes and set it to start at boot:
service redis-sentinel start
chkconfig redis-sentinel on
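Once the Sentinels are running, you can ask any of them which server they currently consider the master. SENTINEL get-master-addr-by-name is a Redis Sentinel command. The wrapper below is only a sketch. It assumes that redis-cli, when its output is piped, prints the reply’s IP and port on two separate lines. The sentinel_master_addr name and the REDIS_CLI override are hypothetical, added so the command can be swapped out for testing.

```shell
#!/bin/sh
# Sketch: ask a Sentinel which node it currently considers the master.
# REDIS_CLI is a hypothetical override hook; it defaults to the real redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# $1 = Sentinel host (default 127.0.0.1), $2 = master group name (default mymaster)
# Prints "IP:PORT"; fails if the Sentinel gives no usable answer.
sentinel_master_addr() {
  host=${1:-127.0.0.1}
  name=${2:-mymaster}
  # Piped redis-cli output puts each element of the reply on its own line.
  reply=$($REDIS_CLI -h "$host" -p 26379 sentinel get-master-addr-by-name "$name") || return 1
  ip=$(printf '%s\n' "$reply" | sed -n 1p)
  port=$(printf '%s\n' "$reply" | sed -n 2p)
  [ -n "$ip" ] && [ -n "$port" ] || return 1
  echo "$ip:$port"
}
```

Before any failover, sentinel_master_addr 10.0.0.2 should print 10.0.0.3:6380 on any of the three boxes.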
On both of the main boxes, edit the /etc/haproxy/haproxy.cfg file.
defaults REDIS
  mode tcp
  timeout connect 4s
  timeout server 15s
  timeout client 15s

frontend ft_redis
  bind *:6379 name redis
  default_backend bk_redis

backend bk_redis
  option tcp-check
  tcp-check connect
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  server R1 10.0.0.3:6380 check inter 1s
  server R2 10.0.0.4:6380 check inter 1s
The first section tells HAProxy that the connections will be TCP, to give up on a connection attempt if it takes longer than 4 seconds, and to disconnect if the server or the client stays silent for 15 seconds. You may need to adjust these timeouts, for example if your application keeps idle connections open for longer than that.
The next section makes HAProxy listen on port 6379, which is Redis’s default port. From the outside, it will look like a normal Redis server. It is then told to use the backend named “bk_redis” to deal with the requests.
The final section is where the magic happens. We’ll be using the new tcp-check features (from HAProxy 1.5-dev20 and above) to determine which server is the master. First, we tell HAProxy to use the tcp-check option and to open a TCP connection. Next, we send a simple PING to the Redis server to see if it’s alive, and tell HAProxy that the expected reply is “+PONG”. It then sends the string “info replication” and looks for the line “role:master”. Finally, it sends the “QUIT” command and expects “+OK”, so that the connection is closed cleanly. HAProxy runs this check against both servers every second (check inter 1s). Any server that fails a step is marked as down, so only the server reporting “role:master” receives traffic.
Save the file and start HAProxy on both servers and set them to start at boot:
service haproxy start
chkconfig haproxy on
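Before testing through the endpoint, you can run roughly the same sequence as HAProxy’s tcp-check yourself against each backend, to see which node it should pick. This is a sketch of the check’s logic rather than HAProxy’s own implementation. It uses redis-cli, which prints PONG rather than the raw +PONG that HAProxy sees on the wire. The names find_master and passes_check and the REDIS_CLI override are hypothetical.

```shell
#!/bin/sh
# Sketch: mimic the tcp-check sequence (PING, then INFO replication looking for
# role:master) against each backend and print the backend(s) that would pass.
# Not HAProxy's implementation; REDIS_CLI is a hypothetical override hook.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# $1 = host, $2 = port; succeeds only if the node answers PING and is a master.
passes_check() {
  [ "$($REDIS_CLI -h "$1" -p "$2" ping 2>/dev/null)" = "PONG" ] || return 1
  $REDIS_CLI -h "$1" -p "$2" info replication 2>/dev/null |
    tr -d '\r' | grep -qx 'role:master'
}

# Print each "host:port" argument whose node passes the check.
find_master() {
  for backend in "$@"; do
    host=${backend%:*}
    port=${backend#*:}
    if passes_check "$host" "$port"; then
      echo "$backend"
    fi
  done
}
```

Running find_master 10.0.0.3:6380 10.0.0.4:6380 from one of the boxes should print exactly one address. Two lines would mean both nodes currently think they are the master.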
Now let’s check that the service is reachable through the Azure load-balanced endpoint. The easiest way of doing this is to use Telnet. Run the following:
telnet your-redis-service.cloudapp.net 6379
You should see something like this:
Trying 11.22.33.44...
Connected to your-redis-service.cloudapp.net.
Escape character is '^]'.
If it appears to hang here, this is a good sign. Let’s see if it responds. Start by typing PING followed by Enter. You should get a response from Redis saying +PONG.
Now for the moment of truth: type info replication followed by Enter. This should return information about the server’s replication status. The line we’re interested in is role:master; if this is present, we’re hitting the correct server. Now type QUIT followed by Enter to close the connection. Repeat this a few times to make sure you’re always connecting to the master.
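Repeating the telnet test by hand gets tedious, and redis-cli can do it for you. Here’s a rough sketch. It assumes you run it from a machine with redis-cli installed that the firewall allows, and it uses the placeholder hostname from above. Each call opens a fresh connection, just as a new telnet session would. The count_master_hits name and the REDIS_CLI override are hypothetical.

```shell
#!/bin/sh
# Sketch: open N separate connections through the load-balanced endpoint and
# count how many reach a node that reports role:master.
# REDIS_CLI is a hypothetical override hook; it defaults to the real redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# $1 = endpoint host, $2 = port (default 6379), $3 = attempts (default 10)
count_master_hits() {
  endpoint=$1
  port=${2:-6379}
  attempts=${3:-10}
  hits=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Each redis-cli invocation is a new TCP connection through the endpoint.
    if $REDIS_CLI -h "$endpoint" -p "$port" info replication 2>/dev/null |
        tr -d '\r' | grep -qx 'role:master'; then
      hits=$((hits + 1))
    fi
    i=$((i + 1))
  done
  echo "$hits/$attempts"
}
```

For example, count_master_hits your-redis-service.cloudapp.net 6379 20 should print 20/20. Anything less means some connections reached a slave or failed.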
Testing failover
Now that Redis appears to be functioning properly, let’s break it! If this guide has been followed correctly, the first box should be the master and the second the slave. Shut down the first box (yes, a full “shutdown -h now”). This simulates what happens when Microsoft updates its platform and shuts down parts of your cloud services whilst maintaining uptime. After about a minute, try telnetting into the Redis service again. This time, when you run info replication, it will still say role:master, but there will be 0 slaves connected. This tells us that only one Redis server is running, and that the slave has been promoted by the two remaining Redis Sentinel services.
Now start the first box again from the Azure portal. After a couple of minutes, telnet into the service again. This time, info replication should still say role:master, but it will list the machine 10.0.0.3 as its slave. This tells us that the second server has kept its master status, and the original master is now our failover machine.
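For the failover test, rather than waiting a fixed minute, you can poll one of the surviving Sentinels (for example the quorum box, 10.0.0.2) until it reports a new master. This is a sketch that assumes the mymaster group name from the Sentinel config. It also assumes that redis-cli, when piped, prints the SENTINEL get-master-addr-by-name reply as the IP on the first line and the port on the second. The wait_for_new_master name, the REDIS_CLI override and the default poll settings are hypothetical.

```shell
#!/bin/sh
# Sketch: poll a Sentinel until the "mymaster" group points at a different IP.
# REDIS_CLI is a hypothetical override hook; it defaults to the real redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# $1 = Sentinel host, $2 = old master IP,
# $3 = maximum number of polls (default 24), $4 = seconds between polls (default 5).
# Prints the new master's IP and returns 0, or returns 1 if it never changes.
wait_for_new_master() {
  sentinel=$1
  old_ip=$2
  max_polls=${3:-24}
  interval=${4:-5}
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    # First line of the reply is the IP Sentinel currently holds for mymaster.
    ip=$($REDIS_CLI -h "$sentinel" -p 26379 \
        sentinel get-master-addr-by-name mymaster 2>/dev/null | sed -n 1p)
    if [ -n "$ip" ] && [ "$ip" != "$old_ip" ]; then
      echo "$ip"
      return 0
    fi
    polls=$((polls + 1))
    # Only sleep if another poll will follow.
    [ "$polls" -lt "$max_polls" ] && sleep "$interval"
  done
  echo "no new master after $max_polls polls" >&2
  return 1
}
```

After shutting down the first box, wait_for_new_master 10.0.0.2 10.0.0.3 should print 10.0.0.4 once the slave has been promoted.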
Locking it down
Ok, now we’ve got Redis working with failover, we’re probably not going to want to leave it open to the world, so we’re going to lock it down with iptables. Log into your boxes, and open the /etc/sysconfig/iptables file. Something like this will do the job.
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT

### Azure Load Balancer - allow all for port checking
-A INPUT -s 168.63.129.16/32 -j ACCEPT

# Allow the machines in the cloudapp
-A INPUT -m iprange --src-range 10.0.0.2-10.0.0.4 -j ACCEPT

# Offices
-A INPUT -s 87.65.43.21/32 -m tcp -p tcp --dport 6379 -j ACCEPT

# Application
-A INPUT -s 55.66.77.88/32 -m tcp -p tcp --dport 6379 -j ACCEPT

-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
There are many good iptables tutorials out there; this is only meant as a guide.
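If you have more than a couple of office or application addresses to allow, generating the port 6379 rules avoids copy-and-paste mistakes. This small sketch prints lines in the same format as the file above, ready to paste in before the final REJECT rules. The function name is hypothetical, and the addresses in the usage example are the placeholder ones from this guide.

```shell
#!/bin/sh
# Sketch: print an iptables ACCEPT rule for Redis (port 6379) per source address,
# in the same style as /etc/sysconfig/iptables above (hypothetical helper).
# Each argument is "LABEL=ADDRESS"; a bare IP gets /32 appended, CIDR is kept.
redis_allow_rules() {
  for entry in "$@"; do
    label=${entry%%=*}
    addr=${entry#*=}
    case "$addr" in
      */*) ;;                 # already has a prefix length
      *) addr="$addr/32" ;;   # single host
    esac
    echo "# $label"
    echo "-A INPUT -s $addr -m tcp -p tcp --dport 6379 -j ACCEPT"
  done
}
```

For example, redis_allow_rules Offices=87.65.43.21 Application=55.66.77.88 reproduces the Offices and Application lines shown above.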
Of course, once you’ve configured iptables, start it and set it to run at boot:
service iptables start
chkconfig iptables on
Conclusion
So, hopefully this is a definitive guide to deploying a high-availability Redis service. As you can see, due to the lack of Master-Master support, other workarounds were needed, but so far this seems to work. I hope this helps anyone in a similar situation.
Source: https://robertianhawdon.me.uk/2014/02/11/sysops-installing-a-high-availability-redis-service-on-centos-6-x-in-windows-azure/