High availability with keepalived
High availability (HA) keeps a system running even if some components fail. In an active-passive HA setup, two servers work together:
- The active server handles all requests.
- The passive server stays on standby and takes over if the active server fails.
 
This guide shows how to configure HA for NGINX Instance Manager using keepalived. This setup includes:
- A virtual IP address (VIP)
- A shared Network File System (NFS)
- Automated health checks to detect failures and trigger failover
 
Before setting up high availability (HA) for NGINX Instance Manager, make sure you have:
- Two physical servers with NGINX Instance Manager installed
- A reserved virtual IP address (VIP) that always points to the active instance
- An NFS share that both servers can access
- Permissions to manage IP addresses at the operating system level
- keepalived installed on both servers
Some cloud platforms don’t allow direct IP management with keepalived. If you’re using a cloud environment, check whether it supports VIP assignment.
This HA setup has the following restrictions:
- This setup supports only two nodes — one active and one passive. Configurations with three or more nodes are not supported.
- Active/active HA is not supported. This configuration works only in an active-passive setup.
- Do not modify keepalived. Changes beyond what is documented may cause failures.
- OpenID Connect (OIDC) authentication is not supported when NGINX Instance Manager is running in forward-proxy mode. OIDC is configured on the NGINX Plus layer and cannot pass authentication requests through a forward proxy.
 
A virtual IP address (VIP) ensures that users always connect to the active server. During failover, keepalived automatically moves the VIP from the primary to the secondary server.
- Choose an unused IP address in your network to serve as the VIP.
- Ensure that the IP address does not conflict with existing devices.
- Configure firewalls and security rules to allow traffic to and from the VIP.
- Note the VIP address, as you will reference it in the keepalived.conf file.
Replace <VIRTUAL_IP_ADDRESS> with this IP when configuring keepalived.
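One way to sanity-check that the chosen address is actually unused is to probe it before assigning it. This is a minimal sketch with standard tools (arping may need to be installed separately); a reply means the address is already taken:

ping -c 3 <VIRTUAL_IP_ADDRESS>                                  # replies indicate a conflict
sudo arping -c 3 -I <NETWORK_INTERFACE> <VIRTUAL_IP_ADDRESS>    # ARP-level probe on the local segment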
keepalived is a Linux tool that monitors system health and assigns a virtual IP (VIP) to the active server in an HA setup.
Install keepalived on both servers.
- For Debian-based systems (Ubuntu, Debian):

  sudo apt update
  sudo apt install keepalived -y

- For RHEL-based systems (CentOS, RHEL):

  sudo yum install keepalived -y
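After installing, you can confirm the package and make sure the service starts at boot; a quick check using standard systemd commands:

keepalived --version               # confirm the installed version
sudo systemctl enable keepalived   # start at boot; you restart it later after configuring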
keepalived monitors specific services to determine if a node is operational. Update /etc/nms/scripts/nms-notify-keepalived.sh to include the services you want to monitor.
check_nms_services=(
  "clickhouse-server"
  "nginx"
  "nms-core"
  "nms-dpm"
  "nms-integrations"
  "nms-ingestion"
)

Note: Update nms.conf on both nodes when changing the mode of operation. If you switch between connected and disconnected modes, you must update /etc/nms/nms.conf on both the primary and secondary nodes if nms-integrations is included in check_nms_services. NGINX Instance Manager runs in connected mode by default. For instructions on changing the mode, see the installation guide for disconnected environments.
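Conceptually, the health check passes only while every service in check_nms_services is active. The loop below is only an illustration of that idea using the service names above; it is not one of the scripts shipped with NGINX Instance Manager:

#!/usr/bin/env bash
# Illustration only: report whether each monitored service is active.
check_nms_services=(
  "clickhouse-server"
  "nginx"
  "nms-core"
  "nms-dpm"
  "nms-integrations"
  "nms-ingestion"
)
for svc in "${check_nms_services[@]}"; do
  if systemctl is-active --quiet "$svc"; then
    echo "$svc: active"
  else
    echo "$svc: DOWN"
  fi
done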
Edit /etc/keepalived/keepalived.conf on both servers and replace the placeholders with your actual network details.
vrrp_script nms_check_keepalived {
    script "/etc/nms/scripts/nms-check-keepalived.sh"
    interval 10
    weight 10
}
vrrp_instance VI_28 {
    state MASTER   # Set to BACKUP on the secondary server
    interface <NETWORK_INTERFACE>   # Replace with the correct network interface
    priority 100
    virtual_router_id 251
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <AUTH_PASSWORD>   # Replace with a secure password
    }
    virtual_ipaddress {
        <VIRTUAL_IP_ADDRESS>   # Replace with your reserved VIP
    }
    track_script {
        nms_check_keepalived
    }
    notify /etc/nms/scripts/nms-notify-keepalived.sh
}

Replace:
- <NETWORK_INTERFACE> with your actual network interface (for example, ens32).
- <AUTH_PASSWORD> with a secure authentication password.
- <VIRTUAL_IP_ADDRESS> with your reserved VIP.
Ensure the configuration is identical on both servers, except for the state value:
- Set MASTER on the primary server.
- Set BACKUP on the secondary server.
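Before restarting, you can optionally ask keepalived to validate the file. Recent keepalived releases (roughly 2.x and later) include a configuration-test mode; older packages may not support this flag:

sudo keepalived --config-test   # prints errors and exits non-zero if the file does not parse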
Restart keepalived to apply the configuration:
sudo systemctl restart keepalived

NGINX Instance Manager requires shared storage for configuration files and logs.
Replace <NFS_SERVER_IP> with the actual IP address of your NFS server in the following commands.
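Depending on your rollout, you may want to stop the NGINX Instance Manager services before mounting over /var/lib/clickhouse and /var/lib/nms, so nothing writes to the local directories during the switch. This is a hedged sketch using the service names from the monitored-services list above; check whether this step applies to your deployment, and start the services again once the mounts are in place:

# Hypothetical pre-mount step: stop the platform services, then start
# them again after the NFS mounts are active.
sudo systemctl stop nms-core nms-dpm nms-ingestion nms-integrations clickhouse-server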
sudo mount -t nfs4 \
  -o rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys \
  <NFS_SERVER_IP>:/mnt/nfs_share/clickhouse \
  /var/lib/clickhouse
sudo mount -t nfs4 \
  -o rw,relatime,vers=4.2,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys \
  <NFS_SERVER_IP>:/mnt/nfs_share/nms \
  /var/lib/nms

Add the following lines to /etc/fstab on both servers, replacing <NFS_SERVER_IP> with your actual NFS server's IP.
<NFS_SERVER_IP>:/mnt/nfs_share/clickhouse /var/lib/clickhouse nfs defaults 0 0
<NFS_SERVER_IP>:/mnt/nfs_share/nms /var/lib/nms nfs defaults 0 0

Run these commands to confirm that the NFS mounts are working:
sudo mount -a                        # mount everything listed in /etc/fstab
df -h                                # confirm the NFS file systems appear
ls -lart /mnt/nfs_share/clickhouse   # list the share contents
ls -lart /var/lib/nms                # confirm the nms mount is readable
sudo ls -lart /var/lib/clickhouse    # confirm the clickhouse mount is readable
telnet <NFS_SERVER_IP> 2049          # check that the NFS port is reachable
rpcinfo -p <NFS_SERVER_IP>           # list the RPC services registered on the NFS server
sudo showmount -e <NFS_SERVER_IP>    # show the exports the NFS server offers
dmesg | grep nfs                     # scan the kernel log for NFS errors

To test failover, simulate a failure on the active server in any of the following ways:
- Restart keepalived:

  sudo systemctl restart keepalived

- Stop a monitored service:

  sudo systemctl stop clickhouse-server

- Reboot the active server:

  sudo reboot

- Simulate a network failure by disconnecting the active server.
 
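While you trigger any of these failures, you can watch the failover live from the backup server. A simple sketch with standard tools:

sudo journalctl -u keepalived -f                 # follow keepalived state transitions
watch -n 1 "ip a | grep <VIRTUAL_IP_ADDRESS>"    # re-checks every second until the VIP appears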
To check if the passive server has taken over, run the following command on the backup server:
ip a | grep <VIRTUAL_IP_ADDRESS>

The VIP should now be assigned to the secondary server.
If failover does not work as expected, check the following:
- Ensure keepalived is running:

  systemctl status keepalived

- Check logs for errors:

  journalctl -u keepalived --no-pager | tail -50

- Verify that NFS mount points are accessible:

  df -h

- Check the keepalived configuration for syntax errors:

  cat /etc/keepalived/keepalived.conf
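If you repeat these checks often, a small script can gather them in one pass. This is only a convenience sketch built from the commands above; adjust service names and paths to match your environment:

#!/usr/bin/env bash
# Quick HA health report: keepalived state, recent logs, and NFS mounts.
echo "== keepalived =="
systemctl is-active keepalived

echo "== last 20 keepalived log lines =="
journalctl -u keepalived --no-pager | tail -20

echo "== NFS mounts =="
for dir in /var/lib/clickhouse /var/lib/nms; do
  if mountpoint -q "$dir"; then
    echo "$dir: mounted"
  else
    echo "$dir: NOT mounted"
  fi
done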
For additional support, visit the F5 Support Portal.