Bug #16860
Open
gwlb: OpenVPN gateway monitor route uses gateway IP, breaks monitoring/failover when two tunnels share a subnet
Description
Affects: 2.8.1-RELEASE
setup_gateways_monitor() in src/etc/inc/gwlb.inc adds each gateway's monitor /32 static route via the gateway IP for all non-PPP gateways (inet ~L597, inet6 ~L638). An OpenVPN tun gateway's IP lies inside the tunnel's own subnet, so the route resolves through whichever interface owns that connected subnet.
When two OpenVPN client gateways get addresses in the same subnet (common with commercial VPN providers that allocate tunnel IPs from a shared pool), both monitor routes bind to the single interface that owns the subnet. dpinger for the second gateway then probes through the wrong tunnel, so its reported loss/latency reflects a different tunnel. Gateway-group failover breaks in one of two ways: a dead primary can read "online" because its probe egresses a healthy sibling, or a healthy backup reads "down" together with the primary, so the group fails closed instead of failing over.
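The collision can be illustrated with a small standalone model. This is only a sketch of the lookup behaviour described above, not code from gwlb.inc: the gateway names, gateway IPs, monitor IPs, the `igb0` entry and the choice of ovpnc1 as the subnet owner are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical connected-route table: (interface, connected subnet).
# Both tunnels draw addresses from the same /23, but only one interface
# can own the connected route; this model assumes it is ovpnc1.
CONNECTED = [
    ("ovpnc1", ipaddress.ip_network("172.21.92.0/23")),
    ("igb0", ipaddress.ip_network("192.0.2.0/24")),
]

# Hypothetical gateways; all addresses are illustrative.
GATEWAYS = {
    "VPN1": {"interface": "ovpnc1", "gateway": "172.21.92.1", "monitor": "198.51.100.1"},
    "VPN2": {"interface": "ovpnc2", "gateway": "172.21.93.1", "monitor": "198.51.100.2"},
}


def egress_for_next_hop(next_hop):
    """Return the interface whose connected subnet contains next_hop.

    The longest matching prefix wins; None means no connected route matches.
    """
    addr = ipaddress.ip_address(next_hop)
    matches = [(net.prefixlen, ifname) for ifname, net in CONNECTED if addr in net]
    return max(matches)[1] if matches else None


def monitor_egress(gw):
    """A monitor route 'via gateway IP' egresses wherever that IP resolves."""
    return egress_for_next_hop(gw["gateway"])


for name, gw in GATEWAYS.items():
    print(name, "expected", gw["interface"], "actual", monitor_egress(gw))
# VPN1 expected ovpnc1 actual ovpnc1
# VPN2 expected ovpnc2 actual ovpnc1
```

In this model the second gateway's probe leaves through ovpnc1 even though the gateway itself lives on ovpnc2, which is the mismatch the report describes.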
Steps to reproduce:
1. Two OpenVPN client gateways whose tunnel IPs fall in the same subnet (e.g. both in 172.21.92.0/23).
2. Distinct monitor IP on each; place them in a failover gateway group.
3. `route -n get <gw2-monitor>` resolves to gw1's interface, not gw2's.
Expected: each monitor probe egresses its own tunnel.
Actual: the non-owning gateway's probe egresses the owning gateway's tunnel; killing the owning tunnel marks both gateways down and the group fails closed.
Fix: extend the existing interface-bound monitor-route branch (already used for PPP-type gateways) to OpenVPN tun interfaces — route_add_or_change(monitor, '', interface) pins egress to the correct interface regardless of subnet overlap. Verified on 2.8.1-RELEASE: each monitor route then binds to its own ovpncN, and a failover group correctly switches to the backup when the primary tunnel is killed.
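The branch selection proposed here can be sketched as a small decision function. This is a Python model under stated assumptions, not the PHP in gwlb.inc: the `link_type` labels, the dictionary layout and the example addresses are hypothetical, and the returned tuple only mirrors the (monitor, gateway, interface) argument order of the route_add_or_change() call quoted above.

```python
# Hypothetical link-type labels; the real code classifies gateways differently.
INTERFACE_BOUND = frozenset({"ppp", "openvpn_tun"})


def monitor_route(gw, interface_bound=INTERFACE_BOUND):
    """Return (destination, next_hop, interface) for the monitor host route.

    Interface-bound types pin the route to the gateway's own interface and
    leave the next hop empty; every other type routes via the gateway IP.
    """
    if gw["link_type"] in interface_bound:
        return (gw["monitor"], "", gw["interface"])
    return (gw["monitor"], gw["gateway"], "")


# Illustrative gateway on the non-owning tunnel.
vpn2 = {
    "link_type": "openvpn_tun",
    "interface": "ovpnc2",
    "gateway": "172.21.93.1",
    "monitor": "198.51.100.2",
}

# Current behaviour (only PPP is interface-bound): route via the gateway IP.
print(monitor_route(vpn2, interface_bound=frozenset({"ppp"})))
# Proposed behaviour: route bound to ovpnc2, independent of subnet ownership.
print(monitor_route(vpn2))
```

Because the interface-bound form carries no next hop, the lookup never consults the shared connected subnet, so overlapping tunnel addresses no longer influence which tunnel the probe uses.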
Updated by Ivaylo Hubanov 1 day ago