Bug #2218
Closed
CARP VIPs can become master too early at boot time
100%
Description
On some systems that have packages installed, the package start time is long enough that it leaves certain services down for up to a minute during boot time because CARP switches back to master on the primary too early. Would be nice to have an option to delay CARP taking master status (but still having the CARP IPs there so services can bind to them) for a defined period of time.
Updated by Chris Buechler about 9 years ago
- Assignee deleted (Ermal Luçi)
- Target version deleted (2.3)
Updated by Jim Pingle over 7 years ago
It's a non-issue if you put a node into maintenance mode from Status > CARP before updating or rebooting.
Updated by Louis Hather over 7 years ago
Sure, but I don't reboot my firewalls - they crash. See the issue?
Updated by Jim Pingle over 7 years ago
Then focus on fixing the source of the crashes if they happen that often -- The avoidable cases are already avoidable.
Updated by Louis Hather over 7 years ago
While true, it'll still fail at some point. I'm not sure this can be reasonably described as a non-issue with such severe service interruption. We're talking no packet movement for 30-90 seconds. I assume configuring pfSense to not fail back could be done, and that would be acceptable, but if I do that I'd simply be working around the issue.
Updated by Seb A over 7 years ago
Jim, what about if you have a power failure on the master firewall (and you have each firewall connected to different power, like we do): it goes down, backup takes over, power restored, old master comes up, takes over, and you lose all connectivity for over 1 minute. I'm sorry, but this is not a 'non-issue' - this is supposed to be a highly available system, so I think you should take another look at this.
Updated by Jim Pingle over 7 years ago
I didn't close the ticket and say it wouldn't be addressed eventually. When this old ticket was opened, maintenance mode did not exist and the most common cases (upgrades and manual reboots) are easily addressed with maintenance mode. That was not noted on this ticket, so anyone searching the problem and finding this ticket may not realize it was possible.
Updated by Black BlackBinary about 6 years ago
Hi there, I see Seb A's point. I am preparing two pfSense boxes with CARP IPs for our data center. We ran some tests: failover after an unexpected hardware failure (simulated by pulling the power plug) works great. The slave pfSense takes over in 1-10 seconds with everything that is needed: IPsec, DNS, HAProxy, OpenVPN, routing and more. But if I boot the master pfSense again, the device never expected the failure and is therefore not in CARP maintenance mode. CARP is one of the first things loaded at boot, and the interface fails back instantly without even being ready. THIS failback causes a service outage of 1-2 minutes.
I would prefer:
- optional: a pfSense box is in CARP maintenance mode by default (and manually reactivated)
- optional: the CARP failback can be delayed: maintenance mode by default, then re-enabled X seconds after boot
Updated by Greg Harris about 5 years ago
I agree with Black BlackBinary. The second option should be the normal operation. A reboot should automatically trigger maintenance mode, and the system should always make sure that pfSense is operational before disabling maintenance mode and taking over the CARP interfaces. Otherwise, sessions and connections are lost. A configurable delay of X seconds after reboot would be a nicety.
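The two ideas above (leave maintenance mode only once the box is operational, bounded by a delay of X seconds) can be sketched as a readiness gate with a timeout. This is only an illustration in Python, not pfSense code: the function names, the `checks` callables, and the `leave_maintenance` callback are all hypothetical, and what happens on timeout is left to the caller.

```python
import time
from typing import Callable, Iterable


def wait_until_ready(
    checks: Iterable[Callable[[], bool]],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll all checks until they pass together or timeout_s elapses.

    Returns True if every check passed, False if the timeout expired first.
    """
    checks = list(checks)
    deadline = clock() + timeout_s
    while True:
        if all(check() for check in checks):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def leave_maintenance_when_ready(
    checks: Iterable[Callable[[], bool]],
    leave_maintenance: Callable[[], None],
    timeout_s: float,
    **kwargs,
) -> bool:
    """Call leave_maintenance() only if the (hypothetical) checks pass in time."""
    if wait_until_ready(checks, timeout_s, **kwargs):
        leave_maintenance()
        return True
    # Timed out: stay in maintenance mode and let the caller decide what to do.
    return False
```

A check could be anything that answers "is this service up yet?", for example a probe that a package's daemon is accepting connections. The clock and sleep functions are injectable so the logic can be exercised without real waiting.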
Updated by Le Cygne over 3 years ago
Jim Pingle wrote:
I didn't close the ticket and say it wouldn't be addressed eventually. When this old ticket was opened, maintenance mode did not exist and the most common cases (upgrades and manual reboots) are easily addressed with maintenance mode. That was not noted on this ticket, so anyone searching the problem and finding this ticket may not realize it was possible.
Hello Jim! So you are still not convinced that this is an issue, and that is why you deleted my post! This issue was reported 6 years ago and there is still no fix for it!
Updated by Jim Pingle over 3 years ago
Read the text you quoted again. Eventually a better solution may come along. It's entirely mitigated by maintenance mode, which is the proper procedure for performing an upgrade, so the actual real-world impact is minimal.
If someone else wants to research and propose a solution and submit a pull request with the code required to implement it, we'd look it over and consider it, but currently it is not a priority for using our internal resources to solve.
Your issue was closed as a duplicate since it was identical to this, and you even noted that it behaved properly when you used maintenance mode.
Updated by Le Cygne over 3 years ago
Jim Pingle wrote:
Read the text you quoted again. Eventually a better solution may come along. It's entirely mitigated by maintenance mode, which is the proper procedure for performing an upgrade, so the actual real-world impact is minimal.
If someone else wants to research and propose a solution and submit a pull request with the code required to implement it, we'd look it over and consider it, but currently it is not a priority for using our internal resources to solve.
Your issue was closed as a duplicate since it was identical to this, and you even noted that it behaved properly when you used maintenance mode.
One question: Why do you keep ignoring the main point? It's NOT NOT NOT about doing an upgrade or a regular reboot. We know what we have to do in those cases to ensure a smooth transition between the master and backup boxes. BUT what about a "sudden disconnection" such as a power failure? Do you think "maintenance mode" does any good in that case?! Think again.
Updated by Steve Wheeler over 1 year ago
- Tracker changed from Feature to Bug
- Subject changed from Ability to delay CARP master status at boot time to CARP VIPs can become master too early at boot time
- Status changed from New to Confirmed
- Priority changed from Normal to High
- Target version set to 2.7.0
- Plus Target Version set to Plus-Next
- Affected Architecture All added
Updated by Andreas Pross over 1 year ago
I already have a working implementation to delay CARP at bootup. I just pushed it to GitHub.
It starts CARP in maintenance mode during bootup. If no master is available, it will become the master as usual. Otherwise it will stay in backup state.
After the boot sequence, CARP maintenance mode ends automatically.
https://github.com/styletronix/pfsense/tree/CARP_DELAY
It may need some testing to confirm it works on different systems, but it should be easy to apply and remove as a patch in the development branch.
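The behaviour described in this comment can be summarised as a small decision function. The Python below is a toy model only, not the code in the linked branch: the function name and parameters are hypothetical, and it assumes the booting node is the preferred (primary) node that would otherwise win the CARP election.

```python
def boot_time_role(in_maintenance: bool, peer_master_present: bool) -> str:
    """Return the role a preferred node ends up in, under the described scheme.

    Assumption: outside maintenance mode this node would take master status.
    """
    if in_maintenance and peer_master_present:
        # Another node is already master, so stay out of the way while booting.
        return "BACKUP"
    # Either nobody else is master, or maintenance mode has ended.
    return "MASTER"


# During boot with a healthy peer: remain backup.
during_boot = boot_time_role(in_maintenance=True, peer_master_present=True)
# During boot with no peer: become master as usual.
alone = boot_time_role(in_maintenance=True, peer_master_present=False)
# After boot, maintenance mode has ended: take over again.
after_boot = boot_time_role(in_maintenance=False, peer_master_present=True)
```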
Updated by Marcos M over 1 year ago
- Status changed from Confirmed to Pull Request Review
- Assignee set to Reid Linnemann
Thanks for the contribution. There's already a merge request being reviewed internally for this issue:
https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1033
Updated by Reid Linnemann over 1 year ago
I thought I had responded to this ticket but I must have gotten distracted before I hit submit.
I have changes that should address this issue. Currently, net.inet.carp.allow is toggled on during interface configuration immediately when configuring a CARP VIP. I'm proposing to change this behavior to leave it toggled off while the system is still booting, specifically toggling it on at the tail end of rc.bootup. We can also do the inverse with the shutdown behavior and disable CARP before shutting down services, ensuring that redundancy failover happens prior to services being shut down. This will be introduced to devel and made available as a System Patches patch after the 23.05 release cycle completes.
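The ordering proposed here can be sketched as follows. This is a minimal Python illustration, not the actual change (pfSense's boot scripts are not written in Python): the function names and the step lists are hypothetical, and the only real identifier is the `net.inet.carp.allow` sysctl named above. The sketch sets the sysctl to 0 explicitly at the start of boot, which is an assumption made to keep the example self-contained.

```python
import subprocess
from typing import Callable, Iterable

SetAllow = Callable[[bool], None]
Step = Callable[[], None]


def sysctl_set_carp_allow(enabled: bool) -> None:
    """Set net.inet.carp.allow with sysctl(8). FreeBSD only, needs root."""
    subprocess.run(["sysctl", f"net.inet.carp.allow={int(enabled)}"], check=True)


def run_boot_sequence(start_steps: Iterable[Step],
                      set_allow: SetAllow = sysctl_set_carp_allow) -> None:
    """Keep CARP disabled while booting and enable it as the very last step."""
    set_allow(False)
    for step in start_steps:   # configure VIPs, start packages and services, ...
        step()
    set_allow(True)            # the "tail end" of boot: now allow CARP


def run_shutdown_sequence(stop_steps: Iterable[Step],
                          set_allow: SetAllow = sysctl_set_carp_allow) -> None:
    """Disable CARP first so the peer takes over before services stop."""
    set_allow(False)
    for step in stop_steps:
        step()
```

Passing `set_allow` as a parameter lets the ordering be checked with a recording stub instead of touching a real system. Note that if a start step raises, this sketch leaves CARP disabled; whether that is desirable is a policy decision the source does not address.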
Updated by Reid Linnemann over 1 year ago
- Status changed from Pull Request Review to Feedback
- % Done changed from 0 to 100
Applied in changeset 62fb07c8163b1cf8731d944fe958071f73f43ef8.
Updated by Reid Linnemann over 1 year ago
I had some stale edits in the commit referenced above; as of 5e92d678f642277642acb7f471cd430ed53aae16 these should be fixed.
Updated by Reid Linnemann over 1 year ago
- Plus Target Version changed from Plus-Next to 23.09
Updated by Vladimir Suhhanov over 1 year ago
Reid Linnemann wrote in #note-21:
I had some stale edits in the commit referenced above; as of 5e92d678f642277642acb7f471cd430ed53aae16 these should be fixed.
Do you have a patch for 23.05 with all recommended patches applied?
Updated by Vladimir Suhhanov over 1 year ago
Never mind, I just applied them in sequence (62fb07c8163b1cf8731d944fe958071f73f43ef8, then 5e92d678f642277642acb7f471cd430ed53aae16) and it looks fine now.
Updated by Marcos M over 1 year ago
- Status changed from Feedback to Resolved
Tested on 23.05 - no issues.
Updated by Reid Linnemann over 1 year ago
- Plus Target Version changed from 23.09 to 23.05.1
Bringing in to 23.05.1