Over the last couple of months I decided to finally, really learn Docker (and eventually Kubernetes). I started by converting my home network services to Docker via docker-compose, then quickly figured out that Docker Swarm was where I needed to be, since that is where I could get some decent fault tolerance. I got that running, but experienced some instability with swarm mode where, for some reason, I sometimes couldn’t access my services through the swarm mesh on some nodes. This posed a problem mainly because the port forwarding on my router pointed to a single IP. I needed to “fix” that.
This is where I figured I needed some sort of HA, and keepalived seemed to be the perfect solution. With it, two nodes would share a VIP, and if one went down the VIP would fail over to the other. However, unlike most other configs I found on the web, I didn’t want just an IP failover: when my server had its issues I could still connect to it via SSH, but the services I was publishing from Docker were not available. Because of that, I wanted something to monitor the service, not just the IP.
The following config sets up keepalived (master & slave) on 2 nodes in the cluster, monitoring the HTTP port on each node using netcat (nc). If the check fails, the VIP fails over to the other node. Before I share the actual configs, I have to give credit to the many sites from which I was able to sew bits and pieces together to make this possible. This article is hopefully a single place where a fully working solution can reside.
First, I created a private image (I might publish it if I feel it could help others). My base image is from https://github.com/angelnu/docker-keepalived. However, there were a few things in that image that didn’t work for me. One, it was built for ARM, while I was working with a standard server, not a Raspberry Pi. Also, I wanted to use netcat to monitor TCP ports, and that was not in their image. So I copied their Dockerfile and modified it to look like this:
ARG BASE=alpine
FROM $BASE
RUN echo "http://nl.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories && \
    apk --update -t add keepalived iproute2 grep bash tcpdump sed perl netcat-openbsd && \
    rm -f /var/cache/apk/* /tmp/*
COPY run.sh /run.sh
COPY keepalived.conf /etc/keepalived/keepalived.conf
ENTRYPOINT [ "/run.sh" ]
I removed the lines relating to the ARM architecture and added “netcat-openbsd”. With that out of the way, we can build our own custom image to use in our containers.
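Building and pushing the image looks something like this; this is a sketch assuming the Dockerfile and run.sh are in the current directory, and that repo-01:5000 is your private registry (that’s the registry name used in the compose file below — adjust it to match yours):

```shell
# Build the custom keepalived image from the modified Dockerfile above,
# tagging it for the private registry so every swarm node can pull it.
docker build -t repo-01:5000/keepha:latest .

# Push it to the registry.
docker push repo-01:5000/keepha:latest
```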
Now, using Docker Swarm mode, we create 2 services: one restricted to the manager node and the other to a different node (I thought it best that it run on a worker). To accomplish that you’ll need the following files.
dkr-01:~$ cat keepalived/docker-compose.yml
version: "3.7"
services:
  keepalived-master:
    image: repo-01:5000/keepha:latest
    cap_add:
      - NET_ADMIN
      - NET_RAW
      - NET_BROADCAST
    volumes:
      - ./keepalived-master:/etc/keepalived/
    networks:
      - host
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == manager]
  keepalived-slave:
    image: repo-01:5000/keepha:latest
    cap_add:
      - NET_ADMIN
      - NET_RAW
      - NET_BROADCAST
    volumes:
      - ./keepalived-slave:/etc/keepalived/
    networks:
      - host
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == worker]
networks:
  host:
    external: true
The important piece in the block above, which seems to be missing (or not needed) when implementing keepalived on the base OS instead of in Docker, is the “cap_add” section. These are Linux kernel capabilities that Docker containers don’t normally have. (Reference) I almost gave up trying to get this service running in dockerized fashion until I made “one last try” and added that block. It seems that without it, the service can’t assign the VIP to the interface.
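For completeness, deploying the compose file onto the swarm is a single command run from a manager node; “keepalived” here is just an arbitrary stack name of my choosing:

```shell
# Deploy both keepalived services to the swarm.
docker stack deploy -c keepalived/docker-compose.yml keepalived

# Confirm that both replicas (master and slave) came up.
docker service ls
```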
The other thing I added, as mentioned before, was the port check, in the form of a “script” embedded in the config file on each node.
dkr-01:~$ cat keepalived/keepalived-master/keepalived.conf
vrrp_script keepalived_check {
    script "nc -zvw1 node-01 80"
    interval 5
    timeout 5
    rise 3
    fall 3
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345
    }
    virtual_ipaddress {
        192.168.14.24
    }
    track_script {
        keepalived_check
    }
}
dkr-01:~$ cat keepalived/keepalived-slave/keepalived.conf
vrrp_script keepalived_check {
    script "nc -zvw1 node-02 80"
    interval 5
    timeout 5
    rise 3
    fall 3
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345
    }
    virtual_ipaddress {
        192.168.14.24
    }
    track_script {
        keepalived_check
    }
}
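To verify the setup, you can run the same check keepalived runs and watch the VIP move between nodes. A rough sketch; the node names, the eth0 interface, and the VIP 192.168.14.24 all come from the configs above, so substitute your own:

```shell
# Run the same TCP check keepalived uses (exit code 0 = port 80 is open).
nc -zvw1 node-01 80

# On the current master, confirm it holds the VIP.
ip addr show eth0 | grep 192.168.14.24

# Simulate a failure by stopping the published service on node-01.
# After "fall 3" checks at a 5-second interval (roughly 15 seconds),
# the same grep run on node-02 should now show the VIP there.
ip addr show eth0 | grep 192.168.14.24
```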