Regression #16221
openCARP demotion of other interfaces for advskew broken for WAN interfaces causing split brain operation
Description
Expected behavior:
When an interface on the primary is unplugged and its CARP VIP goes from MASTER to INIT, the advskew of all other interfaces should be raised from 0 to 240 so that the secondary can take over.
This works normally for non-WAN interfaces. However, when the unplugged interface is the WAN, a hotplug event appears to prevent advskew from staying at 240. It goes from 0 to 240 as expected, but after the hotplug notification there is a "vhid removed" event and it reverts to 0. The result is a split-brain situation: the WAN VIP becomes MASTER on the secondary, but the inside interfaces stay BACKUP there because the primary firewall never gives up the MASTER role on those interfaces.
Here are the relevant log events (newest first):
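The effect of the two demotion events can be illustrated with a small model. This is only a sketch under stated assumptions: the advertised skew is taken to be the configured skew plus the demotion counter, clamped to 0..240, and the node with the lower skew is taken to win MASTER (advbase is ignored). The function names and the secondary's base skew of 100 are hypothetical and do not come from this report.

```python
CARP_MAXSKEW = 240  # cap assumed for this sketch; matches the 240 in the report


def effective_advskew(base_skew: int, demotion: int) -> int:
    """Configured skew plus demotion counter, clamped to 0..CARP_MAXSKEW."""
    return max(0, min(base_skew + demotion, CARP_MAXSKEW))


def lan_master(primary_demotion: int, primary_base: int = 0,
               secondary_base: int = 100) -> str:
    """Return which node would hold MASTER on a healthy LAN-side VIP.

    Simplification: the lower advertised skew wins; ties are not modelled.
    secondary_base=100 is a hypothetical value for the secondary node.
    """
    primary = effective_advskew(primary_base, primary_demotion)
    secondary = effective_advskew(secondary_base, 0)
    return "primary" if primary < secondary else "secondary"


demotion = 0
demotion += 240    # "demoted by 240 to 240 (interface down)"
print(lan_master(demotion))   # secondary: the failover the report expects

demotion += -240   # "demoted by -240 to 0 (vhid removed)"
print(lan_master(demotion))   # primary: LAN stays on the primary, split brain
```

In this model, once the counter drops back to 0 the primary again advertises the lower skew on the inside interfaces while its WAN is still down, which is the split-brain state described above.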
May 30 21:45:59 kernel carp: 2@vtnet1: BACKUP -> MASTER (preempting a slower master)
May 30 21:45:59 check_reload_status 657 Carp master event
May 30 21:45:59 kernel vtnet0: promiscuous mode disabled
May 30 21:45:59 kernel carp: demoted by -240 to 0 (vhid removed)
May 30 21:45:58 php-fpm 604 /rc.carpbackup: HA cluster member "(172.21.92.4@vtnet0): (WAN)" has resumed CARP state "BACKUP" for vhid 1
May 30 21:45:58 php-fpm 11539 /rc.linkup: DEVD Ethernet detached event for wan
May 30 21:45:58 php-fpm 11539 /rc.linkup: Hotplug event detected for WAN dynamic IP address (4: 172.21.92.2, 6: dhcp6)
May 30 21:45:58 php-fpm 30767 /rc.carpbackup: HA cluster member "(192.168.1.1@vtnet1): (LAN)" has resumed CARP state "BACKUP" for vhid 2
May 30 21:45:58 kernel carp: 2@vtnet1: MASTER -> BACKUP (more frequent advertisement received)
May 30 21:45:58 kernel vtnet0: link state changed to DOWN
May 30 21:45:58 kernel carp: demoted by 240 to 240 (interface down)
May 30 21:45:58 kernel carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
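To spot this pattern in a longer log, a small script can track which vhids are in INIT and flag any point where the demotion counter returns to 0 while one of them is still down. This is a minimal sketch: the regular expressions are written against the kernel lines quoted above only, and the function name is hypothetical.

```python
import re

# Patterns derived from the kernel lines quoted in this report.
DEMOTE_RE = re.compile(r"carp: demoted by (-?\d+) to (-?\d+) \((.+)\)")
STATE_RE = re.compile(r"carp: (\d+)@(\S+): (\w+) -> (\w+)")


def find_premature_resets(lines):
    """Return (line, vhids_in_init) for each demotion reset to 0 that
    happens while at least one vhid is still in INIT.

    `lines` must be in chronological order (oldest first).
    """
    in_init = set()
    findings = []
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            vhid, ifname, _old, new = m.groups()
            key = f"{vhid}@{ifname}"
            if new == "INIT":
                in_init.add(key)
            else:
                in_init.discard(key)
            continue
        m = DEMOTE_RE.search(line)
        if m and int(m.group(2)) == 0 and in_init:
            findings.append((line, sorted(in_init)))
    return findings


# Kernel lines copied from the report, in the order shown there (newest first).
LOG_NEWEST_FIRST = """\
May 30 21:45:59 kernel carp: 2@vtnet1: BACKUP -> MASTER (preempting a slower master)
May 30 21:45:59 kernel carp: demoted by -240 to 0 (vhid removed)
May 30 21:45:58 kernel carp: 2@vtnet1: MASTER -> BACKUP (more frequent advertisement received)
May 30 21:45:58 kernel carp: demoted by 240 to 240 (interface down)
May 30 21:45:58 kernel carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
"""

# Reverse the excerpt so it is processed oldest first.
chronological = list(reversed(LOG_NEWEST_FIRST.strip().splitlines()))
for line, vhids in find_premature_resets(chronological):
    print(f"demotion reset while {vhids} still in INIT: {line}")
```

Run against the excerpt above, the script flags the "vhid removed" line, because 1@vtnet0 has not left INIT when the counter drops back to 0.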
Attached are screenshots of the Status > CARP page:
PrimaryWANInterfaceUnplugged-NotWorking.png - CARP status on the Primary when the Primary's WAN is unplugged and the cluster is in a split-brain state
SecondaryWithPrimaryWANUnplugged-NotWorking.png - CARP status on the Secondary when the Primary's WAN is unplugged and the cluster is in a split-brain state
PrimaryLANInterfaceUnplugged-Working.png - The normal, expected behavior on the Primary when its LAN is disconnected
SecondaryWithPrimaryLANUnpluged-Working.png - The same normal, expected behavior on the Secondary when the LAN is disconnected on the Primary
Tested on 24.11 and 25.03. The behavior is the same on both versions.
Updated by Christopher Cope 1 day ago
- Status changed from New to Confirmed
I can confirm this on
24.11-RELEASE (amd64) built on Fri Nov 22 4:34:00 UTC 2024 FreeBSD 15.0-CURRENT
Sometimes the WAN changes to INIT and sometimes it appears to lose the CARP VIP altogether, but the end result, split brain, is the same.
The most notable thing I found while testing is that this only affects the original WAN interface, the one that cannot be removed. I assume some code is hardcoded to treat this interface specially, and that this is what causes the issue.
In my case, I assigned WAN to an unused interface, set its address to none, and disabled it. I then configured OPT1 as the WAN actually in use, and everything worked as expected, with no CARP split brain.