Bug #4845
open
CARP preemption doesn't switch to backup where connectivity between systems is lost but not NIC link
Added by Chris Buechler over 9 years ago.
Updated over 9 years ago.
Description
Take a basic WAN and LAN setup, one CARP IP on each interface. If WAN's NIC loses link, the secondary system takes over master status on both CARP IPs, and the primary switches to backup. Instead of losing link, sever connectivity between the two while retaining the NIC's link. The secondary sees that and takes master status on all CARP IPs. But the primary doesn't switch to backup status, so you're left with dual master, and the resulting brokenness that entails.
This is the same behavior as FreeBSD 8.3, 10.1, 11, and OpenBSD 5.7, so just a general issue with CARP. Problematic especially in virtualization scenarios because the VM won't lose link when the hypervisor does, leaving the network broken upon loss of connectivity on one network.
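The three cases described above can be laid out in a toy model. This is only a sketch of the observed behavior, not CARP's actual state machine; the function name `carp_states` and its two boolean inputs are made up for illustration, and timers, advskew and demotion counters are deliberately ignored.

```python
def carp_states(primary_link_up, path_ok):
    """Toy model: return (primary_state, secondary_state) once timers settle.

    primary_link_up: the primary's NIC reports link.
    path_ok: frames can actually travel between the two systems.
    """
    # Advertisements reach the secondary only if the link is up AND the path works.
    ads_delivered = primary_link_up and path_ok
    # The secondary takes over as soon as it stops hearing the primary.
    secondary = "BACKUP" if ads_delivered else "MASTER"
    # The primary steps down only on a local signal: its own NIC losing link.
    primary = "MASTER" if primary_link_up else "BACKUP"
    return primary, secondary


# Healthy, NIC link lost, and path severed with link retained.
for link_up, path in [(True, True), (False, False), (True, False)]:
    print(link_up, path, carp_states(link_up, path))
```

The last case is the one this ticket is about: the primary has no local signal to react to, so both units end up as master.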
We noticed this at one point back in 2012 or so and I swear we already had a ticket open but couldn't find it. It's referenced in the book in the HA chapter. I recall hitting it with a customer back then and they ended up scripting some commands in ESX to bring down the vswitch when the physical NIC lost link.
Outside of a non-CARP heartbeat/failure notification I'm not sure how we might solve this. From the perspective of both units, they are doing the right thing. It's a layer 2/hypervisor issue. There isn't any way for the primary to know for certain that it has a problem in this case.
If that's the case, you are right. The only way I can see this working is both units sending their 'status' via the SYNC interface, so that the secondary can tell the primary that it hasn't received a CARP packet in time and has taken over.
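A rough sketch of what that status comparison might look like, assuming each unit can send the other a map of VHID to CARP state over the sync interface. Nothing like this exists in pfSense as far as this ticket shows; the function names and the message shape are hypothetical.

```python
def contested_vhids(local_states, peer_states):
    """Return the VHIDs that both units currently claim as MASTER."""
    return sorted(
        vhid
        for vhid, state in local_states.items()
        if state == "MASTER" and peer_states.get(vhid) == "MASTER"
    )


def primary_should_step_down(local_states, peer_states):
    """True if the peer's report over the sync link reveals a dual master."""
    return bool(contested_vhids(local_states, peer_states))


# Hypothetical report: the secondary took over VHIDs 1 (WAN) and 2 (LAN).
local = {1: "MASTER", 2: "MASTER"}
peer_report = {1: "MASTER", 2: "MASTER"}
print(contested_vhids(local, peer_report))
print(primary_should_step_down(local, peer_report))
```

This only helps while the sync link itself is still passing traffic; if that is the path that failed, neither side learns anything new.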
I assume if this issue was around in 2012, it isn't going to be fixed any time soon? So I've ordered some new servers to be physical pfSense machines.
This is the previous bug I found on the subject but the comment was that it was simply a config issue.
https://redmine.pfsense.org/issues/1248#note-1
That other ticket ended up not being related to this, it was a different issue. In that case the "link" was lost from the perspective of the VM but not all VIPs were moving.
In this case the problem could be worked around a couple different ways. The best way would be for the hypervisor hosts to have physical redundancy at the network level, with NICs going to multiple switches from both hypervisor units so that this can't happen.
Fixing this particular edge case is non-trivial and most real-world cases where it would be a problem have other redundancies that negate the issue, so the amount of work for covering this case is quite high compared to the benefit. We may yet come up with a proper solution, however.
The issue's been around since the inception of CARP in 2003, so yeah not likely this is going to change in the near future.
In this circumstance, CARP could see that its peer has preempted (though as things currently stand it would have to infer that from the peer's advertisements) and back down. That opens up other possibilities for failure, though, when the peer hasn't actually preempted yet is still advertising for some reason.
FreeVRRPD is something we're looking to as a possible future CARP replacement in v3.0, though it has the same behavior in this regard.
Something that bears more investigation. It's an unusual real world failure scenario outside the VM case.
Upon further testing, this issue seems to cause further problems described below when using certain switches that take about 30 seconds to negotiate before sending traffic.
pfSense HA setup with two switches on the WAN side, two pfSense machines and two switches on the LAN side.
When the primary WAN cable is disconnected, pings continue as expected and both the WAN and LAN CARP IPs are transferred to the backup pfSense machine.
When the primary WAN is reconnected, because CARP is looking at the state and not the connectivity, the LAN role is moved back to the primary, but the WAN role remains with the secondary for about 30 seconds. I assume this is due to the switch not sending traffic instantly.
pfSense should have its own hardcoded link between WAN and LAN, so that it is impossible for them to move independently.
That's still a switch configuration problem. Unless you have a bridge on pfSense involved, you should have "portfast" or equivalent on your end node switch ports so they do not waste time blocking on link up.