
Bug #16221

open

Other interfaces are not demoted if a CARP interface uses DHCP resulting in split-brain operation

Added by Kris Phillips 4 days ago. Updated about 19 hours ago.

Status: Pull Request Review
Priority: Very High
Assignee:
Category: CARP
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version:
Affected Architecture: All

Description

Expected behavior:

When you unplug an interface on the primary and its vhid goes from MASTER to INIT, all other CARP interfaces should be demoted, raising their advertised advskew from 0 to 240, so that the secondary can take over.
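To show why the demotion to 240 hands MASTER to the secondary, here is a minimal sketch of the CARP election. It assumes FreeBSD's rule that the global demotion counter is added to each vhid's configured advskew and clamped to 0..240, that preemption is enabled, and that the secondary is configured with advskew 100 (a common HA setup, not stated in this report). The function names are hypothetical; the part taken from CARP itself is that the advertisement interval is advbase + advskew/256 and the node that advertises more frequently wins.

```python
# Sketch of CARP master election with demotion. Assumptions: the FreeBSD rule
# that the global demotion counter is added to advskew and clamped to 0..240,
# preemption enabled, and a secondary with advskew 100. Names are hypothetical.
CARP_MAXSKEW = 240  # largest advskew CARP will advertise


def effective_advskew(advskew: int, demotion: int) -> int:
    """Skew actually advertised: configured advskew plus demotion, clamped."""
    return max(0, min(advskew + demotion, CARP_MAXSKEW))


def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


def elect_master(nodes: dict[str, tuple[int, int, int]]) -> str:
    """Pick the node that advertises most often (shortest interval).

    nodes maps node name -> (advbase, advskew, demotion).
    """
    return min(
        nodes,
        key=lambda n: advert_interval(
            nodes[n][0], effective_advskew(nodes[n][1], nodes[n][2])
        ),
    )


healthy = {"primary": (1, 0, 0), "secondary": (1, 100, 0)}
demoted = {"primary": (1, 0, 240), "secondary": (1, 100, 0)}
print(elect_master(healthy))  # primary
print(elect_master(demoted))  # secondary
```

When the demotion counter reverts to 0, as in the log below, the primary's inside vhids are back in the `healthy` case, so the primary keeps or retakes MASTER on them while the secondary holds MASTER for WAN.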

This works normally for non-WAN interfaces. However, when the unplugged interface is the WAN, a hotplug event appears to prevent the advskew from staying at 240. It goes from 0 to 240 as expected, but after the hotplug notification the kernel logs a "vhid removed" event and the demotion reverts to 0. This results in a split-brain situation: WAN becomes MASTER on the secondary, but the inside interfaces stay BACKUP on the secondary because the primary never gives up the MASTER role on them.

Here are the relevant log events (newest first):

May 30 21:45:59 kernel carp: 2@vtnet1: BACKUP -> MASTER (preempting a slower master)
May 30 21:45:59 check_reload_status 657 Carp master event
May 30 21:45:59 kernel vtnet0: promiscuous mode disabled
May 30 21:45:59 kernel carp: demoted by -240 to 0 (vhid removed)
May 30 21:45:58 php-fpm 604 /rc.carpbackup: HA cluster member "(172.21.92.4@vtnet0): (WAN)" has resumed CARP state "BACKUP" for vhid 1
May 30 21:45:58 php-fpm 11539 /rc.linkup: DEVD Ethernet detached event for wan
May 30 21:45:58 php-fpm 11539 /rc.linkup: Hotplug event detected for WAN dynamic IP address (4: 172.21.92.2, 6: dhcp6)
May 30 21:45:58 php-fpm 30767 /rc.carpbackup: HA cluster member "(192.168.1.1@vtnet1): (LAN)" has resumed CARP state "BACKUP" for vhid 2
May 30 21:45:58 kernel carp: 2@vtnet1: MASTER -> BACKUP (more frequent advertisement received)
May 30 21:45:58 kernel vtnet0: link state changed to DOWN
May 30 21:45:58 kernel carp: demoted by 240 to 240 (interface down)
May 30 21:45:58 kernel carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
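The log above is listed newest first. The sketch below replays the kernel lines oldest first and flags any point where the CARP demotion counter returns to 0 while an interface is still reported link DOWN, which is exactly the "vhid removed" step seen here. The regular expressions match only the message formats shown in this report; the function name is hypothetical and this is a diagnostic aid, not part of pfSense.

```python
import re

# Message formats copied from the kernel lines in this report (assumption:
# other FreeBSD versions may word them differently).
DEMOTE_RE = re.compile(r"carp: demoted by (-?\d+) to (-?\d+) \((.+)\)")
LINK_RE = re.compile(r"kernel (\S+): link state changed to (UP|DOWN)")


def find_premature_undemotion(lines_newest_first):
    """Return (reason, down_ifaces) for each time the demotion counter
    returned to 0 while some interface was still reported link DOWN."""
    link = {}  # interface -> last reported link state
    problems = []
    # The log view is newest first, so replay it oldest first.
    for line in reversed(lines_newest_first):
        if m := LINK_RE.search(line):
            link[m.group(1)] = m.group(2)
        elif m := DEMOTE_RE.search(line):
            counter, reason = int(m.group(2)), m.group(3)
            down = sorted(i for i, s in link.items() if s == "DOWN")
            if counter == 0 and down:
                problems.append((reason, down))
    return problems


LOG = [
    "May 30 21:45:59 kernel carp: 2@vtnet1: BACKUP -> MASTER (preempting a slower master)",
    "May 30 21:45:59 kernel vtnet0: promiscuous mode disabled",
    "May 30 21:45:59 kernel carp: demoted by -240 to 0 (vhid removed)",
    "May 30 21:45:58 kernel carp: 2@vtnet1: MASTER -> BACKUP (more frequent advertisement received)",
    "May 30 21:45:58 kernel vtnet0: link state changed to DOWN",
    "May 30 21:45:58 kernel carp: demoted by 240 to 240 (interface down)",
    "May 30 21:45:58 kernel carp: 1@vtnet0: MASTER -> INIT (hardware interface down)",
]
print(find_premature_undemotion(LOG))  # [('vhid removed', ['vtnet0'])]
```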

Attached are screenshots of the Status > CARP page:

PrimaryWANInterfaceUnplugged-NotWorking.png - CARP status on the Primary when the Primary's WAN is unplugged (split-brain state)
SecondaryWithPrimaryWANUnplugged-NotWorking.png - CARP status on the Secondary when the Primary's WAN is unplugged (split-brain state)
PrimaryLANInterfaceUnplugged-Working.png - Normal, expected behavior on the Primary when its LAN is disconnected
SecondaryWithPrimaryLANUnpluged-Working.png - The same normal, expected behavior on the Secondary when the Primary's LAN is disconnected

Tested on 24.11 and 25.03. The behavior is the same on both versions.



Note #1

Updated by Christopher Cope 3 days ago

  • Status changed from New to Confirmed

I can confirm this on

24.11-RELEASE (amd64)
built on Fri Nov 22 4:34:00 UTC 2024
FreeBSD 15.0-CURRENT

Sometimes the WAN will change to INIT and sometimes it appears to lose the CARP VIP altogether, but the end result of split brain is the same.

The most notable thing I saw while testing is that this only affects the original WAN interface, which cannot be removed. I assume some code is hard-coded to treat this interface specially and is causing the issue.

In my case, I assigned WAN to an unused interface, set its address to none, and disabled it. I then configured OPT1 as the actual in-use WAN, and everything worked as expected with no CARP split brain.

Note #2

Updated by Marcos M about 19 hours ago

  • Tracker changed from Regression to Bug
  • Status changed from Confirmed to Pull Request Review
  • Assignee set to Marcos M
  • Affected Plus Version changed from 25.03 to 23.01

This has been an issue since at least 23.01. The issue was not reproducible with the second WAN in #note-1 because it requires the interface to use DHCP for either IPv4 or IPv6; if both are static, or only one is configured and it is static, the issue does not occur.
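The trigger condition above can be written as a small predicate. This is a sketch assuming the interface's address types are available as the strings pfSense stores in config.xml: `dhcp` for `ipaddr`, `dhcp6` for `ipaddrv6`, an address for static, and empty when the family is unconfigured. The function name is hypothetical and does not exist in pfSense.

```python
def triggers_split_brain(ipaddr: str = "", ipaddrv6: str = "") -> bool:
    """True if the interface uses DHCP for IPv4 or IPv6, the condition
    reported to trigger the missed demotion. Assumes pfSense-style values:
    "dhcp" / "dhcp6" for DHCP, an address for static, "" if unconfigured."""
    return ipaddr == "dhcp" or ipaddrv6 == "dhcp6"


print(triggers_split_brain("172.21.92.2", "dhcp6"))  # True (values as in the hotplug log line)
print(triggers_split_brain("172.21.92.2", ""))       # False: only IPv4, static
print(triggers_split_brain("dhcp"))                  # True: DHCP on IPv4
```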

The following patch should fix the issue:

https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1224

Note #3

Updated by Marcos M about 19 hours ago

  • Project changed from pfSense Plus to pfSense
  • Subject changed from CARP demotion of other interfaces for advskew broken for WAN interfaces causing split brain operation to Other interfaces are not demoted if a CARP interface uses DHCP resulting in split-brain operation
  • Category changed from CARP to CARP
  • Affected Plus Version deleted (23.01)