High availability with keepalived
High availability (HA) keeps a system running even if some components fail. In an active-passive HA setup, two servers work together:
- The active server handles all requests.
- The passive server stays on standby and takes over if the active server fails.
This guide shows how to configure HA for NGINX Instance Manager using keepalived. This setup includes:
- A virtual IP address (VIP)
- A shared Network File System (NFS)
- Automated health checks to detect failures and trigger failover
Before setting up high availability (HA) for NGINX Instance Manager, make sure you have:
- Two physical servers with NGINX Instance Manager installed
- A reserved virtual IP address (VIP) that always points to the active instance
- An NFS share that both servers can access
- Permissions to manage IP addresses at the operating system level
- keepalived installed on both servers
Some cloud platforms don’t allow direct IP management with keepalived. If you’re using a cloud environment, check whether it supports VIP assignment.
This HA setup has the following restrictions:
- This setup supports only two nodes — one active and one passive. Configurations with three or more nodes are not supported.
- Active/active HA is not supported. This configuration works only in an active-passive setup.
- Do not modify the keepalived setup beyond what this guide documents. Other changes may cause failures.
- OpenID Connect (OIDC) authentication is not supported when NGINX Instance Manager is running in forward-proxy mode. OIDC is configured on the NGINX Plus layer and cannot pass authentication requests through a forward proxy.
A virtual IP address (VIP) ensures that users always connect to the active server. During failover, keepalived automatically moves the VIP from the primary to the secondary server.
- Choose an unused IP address in your network to serve as the VIP.
- Ensure that the IP address does not conflict with existing devices.
- Configure firewalls and security rules to allow traffic to and from the VIP.
- Note the VIP address, as you will reference it in the keepalived.conf file.
Replace <VIRTUAL_IP_ADDRESS> with this IP when configuring keepalived.
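Before committing to an address, it can help to sanity-check it. The following is a minimal sketch and not part of NGINX Instance Manager: the function names is_valid_ipv4 and vip_responds are hypothetical, and a missing ping reply only suggests that the address is free, since firewalls may drop ICMP.

```shell
# Hypothetical helpers for sanity-checking a candidate VIP.

# Succeeds if $1 is a dotted-quad IPv4 address with octets in 0-255.
is_valid_ipv4() {
  # Reject anything other than digits and dots, and any empty octet.
  case $1 in
    ''|*[!0-9.]*|*..*|.*|*.) return 1 ;;
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into octets
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] || return 1
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Succeeds if something answers a single ping at $1 (Linux iputils flags).
vip_responds() {
  ping -c 1 -W 1 "$1" >/dev/null 2>&1
}
```

For example, is_valid_ipv4 192.0.2.10 && ! vip_responds 192.0.2.10 passes for a well-formed address that nothing currently answers on.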
keepalived is a Linux tool that monitors system health and assigns a virtual IP (VIP) to the active server in an HA setup.
Install keepalived on both servers.
- For Debian-based systems (Ubuntu, Debian):
sudo apt update
sudo apt install keepalived -y
- For RHEL-based systems (CentOS, RHEL):
sudo yum install keepalived -y
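If you script the installation, the package manager can be chosen from the ID field of /etc/os-release. This is an illustrative sketch only: keepalived_install_cmd is a hypothetical helper, and it prints the command instead of running it.

```shell
# Hypothetical helper: print the keepalived install command for a distro ID
# (the ID value from /etc/os-release, for example "ubuntu" or "rhel").
keepalived_install_cmd() {
  case $1 in
    ubuntu|debian) echo "sudo apt update && sudo apt install keepalived -y" ;;
    centos|rhel)   echo "sudo yum install keepalived -y" ;;
    *)             echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}
```

A possible use is . /etc/os-release && keepalived_install_cmd "$ID", which prints the matching command for the current server.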
keepalived monitors specific services to determine if a node is operational. Update /etc/nms/scripts/nms-notify-keepalived.sh to include the services you want to monitor.
check_nms_services=(
  "clickhouse-server"
  "nginx"
  "nms-core"
  "nms-dpm"
  "nms-integrations"
  "nms-ingestion"
)
Update nms.conf on both nodes when changing mode of operation
If you switch between connected and disconnected modes, you must update /etc/nms/nms.conf on both the primary and secondary nodes if nms-integrations is included in check_nms_services. NGINX Instance Manager runs in connected mode by default. For instructions on changing the mode, see the installation guide for disconnected environments.
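The contents of the NGINX Instance Manager scripts are not reproduced in this guide. As a rough illustration of the idea only, and assuming the monitored services are systemd units, a health check could walk the service list and fail on the first inactive service. The function names below are hypothetical.

```shell
# Hypothetical sketch of a service health check; not the shipped script.

# Succeeds if the systemd unit $1 is active.
service_is_active() {
  systemctl is-active --quiet "$1"
}

# Succeeds only if every service passed as an argument is active.
all_services_active() {
  for svc in "$@"; do
    if ! service_is_active "$svc"; then
      echo "inactive: $svc" >&2
      return 1
    fi
  done
  return 0
}
```

Called as all_services_active clickhouse-server nginx nms-core nms-dpm nms-integrations nms-ingestion, it returns a nonzero status as soon as one service is down.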
Edit /etc/keepalived/keepalived.conf on both servers and replace the placeholders with your actual network details.
vrrp_script nms_check_keepalived {
    script "/etc/nms/scripts/nms-check-keepalived.sh"
    interval 10
    weight 10
}
vrrp_instance VI_28 {
    state MASTER   # Set to BACKUP on the secondary server
    interface <NETWORK_INTERFACE>   # Replace with the correct network interface
    priority 100
    virtual_router_id 251
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <AUTH_PASSWORD>   # Replace with a secure password
    }
    virtual_ipaddress {
        <VIRTUAL_IP_ADDRESS>   # Replace with your reserved VIP
    }
    track_script {
        nms_check_keepalived
    }
    notify /etc/nms/scripts/nms-notify-keepalived.sh
}
Replace:
- <NETWORK_INTERFACE> with your actual network interface (for example, ens32).
- <AUTH_PASSWORD> with a secure authentication password.
- <VIRTUAL_IP_ADDRESS> with your reserved VIP.
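To avoid hand-editing the file on both servers, the placeholders can be filled in with sed. This is a minimal sketch under assumptions: render_keepalived_conf is a hypothetical helper, the template is read from standard input, and the values must not contain the | delimiter, an ampersand, or a backslash.

```shell
# Hypothetical helper: substitute the three placeholders in a keepalived.conf
# template read from stdin and write the result to stdout.
#   $1 = network interface, $2 = authentication password, $3 = VIP
render_keepalived_conf() {
  sed -e "s|<NETWORK_INTERFACE>|$1|g" \
      -e "s|<AUTH_PASSWORD>|$2|g" \
      -e "s|<VIRTUAL_IP_ADDRESS>|$3|g"
}
```

For example, render_keepalived_conf ens32 "$AUTH_PASSWORD" 192.0.2.10 < keepalived.conf.template > keepalived.conf, where the template file name and the AUTH_PASSWORD variable are placeholders of this sketch.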
Ensure the configuration is identical on both servers, except for the state value:
- Set MASTER on the primary server.
- Set BACKUP on the secondary server.
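Because both servers share priority 100, the weight of the tracked script is what separates a healthy node from an unhealthy one. Assuming keepalived's documented weight semantics (a positive weight is added while the script succeeds, a negative weight is applied while it fails), the effective priority can be worked out as follows. effective_priority is a hypothetical helper for illustration, not a keepalived command.

```shell
# Hypothetical helper: compute the priority a node would advertise.
#   $1 = base priority, $2 = script weight, $3 = check exit status (0 = healthy)
effective_priority() {
  base=$1 weight=$2 status=$3
  if [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
    # Positive weight: bonus while the check succeeds.
    echo $((base + weight))
  elif [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
    # Negative weight: penalty while the check fails.
    echo $((base + weight))
  else
    echo "$base"
  fi
}
```

With the values in this guide, effective_priority 100 10 0 gives 110 for a node whose check passes and effective_priority 100 10 1 gives 100 for a node whose check fails, so the healthy node should win the election.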
Restart keepalived to apply the configuration:
sudo systemctl restart keepalived
NGINX Instance Manager requires shared storage for configuration files and logs.
Replace <NFS_SERVER_IP> with the actual IP address of your NFS server in the following commands.
# Mount the ClickHouse data directory from the NFS share.
# The -o value must be a single comma-separated string with no spaces.
sudo mount -t nfs4 \
  -o rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys \
  <NFS_SERVER_IP>:/mnt/nfs_share/clickhouse \
  /var/lib/clickhouse
# Mount the NGINX Instance Manager data directory from the NFS share.
sudo mount -t nfs4 \
  -o rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys \
  <NFS_SERVER_IP>:/mnt/nfs_share/nms \
  /var/lib/nms
Add the following lines to /etc/fstab on both servers, replacing <NFS_SERVER_IP> with your actual NFS server’s IP address.
<NFS_SERVER_IP>:/mnt/nfs_share/clickhouse /var/lib/clickhouse nfs defaults 0 0
<NFS_SERVER_IP>:/mnt/nfs_share/nms /var/lib/nms nfs defaults 0 0
Run these commands to confirm that the NFS mounts are working:
sudo mount -a
df -h
ls -lart /mnt/nfs_share/clickhouse
ls -lart /var/lib/nms
sudo ls -lart /var/lib/clickhouse
telnet <NFS_SERVER_IP> 2049
rpcinfo -p <NFS_SERVER_IP>
sudo showmount -e <NFS_SERVER_IP>
dmesg | grep nfs
Failover can be tested by simulating a failure on the active server.
- Restart keepalived:
sudo systemctl restart keepalived
- Stop a monitored service:
sudo systemctl stop clickhouse-server
- Reboot the active server:
sudo reboot
- Simulate a network failure by disconnecting the active server.
To check if the passive server has taken over, run the following command on the backup server:
ip a | grep <VIRTUAL_IP_ADDRESS>
The VIP should now be assigned to the secondary server.
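A plain grep can also match longer addresses that merely contain the VIP (for example, 192.0.2.100 when the VIP is 192.0.2.10). A stricter check, sketched here with a hypothetical holds_vip helper, compares the address field of ip -o -4 addr show exactly.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and succeed
# only if one interface carries exactly the address given as $1.
holds_vip() {
  awk -v vip="$1" '
    { split($4, addr, "/") }          # field 4 looks like 192.0.2.10/24
    addr[1] == vip { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Run it as ip -o -4 addr show | holds_vip <VIRTUAL_IP_ADDRESS>; an exit status of 0 means this server currently holds the VIP.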
If failover does not work as expected, check the following:
- Ensure keepalived is running:
systemctl status keepalived
- Check logs for errors:
journalctl -u keepalived --no-pager | tail -50
- Verify that NFS mount points are accessible:
df -h
- Review the keepalived configuration for syntax errors:
cat /etc/keepalived/keepalived.conf
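Because df -h lists every file system, a missing NFS mount is easy to overlook. As a sketch, the hypothetical is_nfs_mounted helper below checks a mount table (by default /proc/mounts) for an NFS entry at a given mount point.

```shell
# Hypothetical helper: succeed if $1 is mounted with an NFS file system type.
#   $1 = mount point, $2 = mount table to read (defaults to /proc/mounts)
is_nfs_mounted() {
  awk -v mp="$1" '
    $2 == mp && $3 ~ /^nfs/ { found = 1 }   # matches both nfs and nfs4
    END { exit (found ? 0 : 1) }
  ' "${2:-/proc/mounts}"
}
```

For this guide's layout, is_nfs_mounted /var/lib/nms && is_nfs_mounted /var/lib/clickhouse succeeds only when both shares are mounted.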
For additional support, visit the F5 Support Portal.