Bug #11290
Package ``<plugins>`` and ``<tabs>`` content missing from configuration in some cases
100%
Description
FRR 1.0.0 does not start or stop properly in response to the configured CARP Status IP.
Files
Updated by Christian McDonald almost 4 years ago
Update: I'm not seeing this in the latest snapshots now. So I'm not entirely sure what's going on. There might be an edge case when upgrading from 2.4.x branch and porting the configuration over, not sure. But this might be closeable unless someone else can replicate this.
Updated by Alhusein Zawi almost 4 years ago
Please provide the steps to reproduce the issue.
Updated by Christian McDonald almost 4 years ago
I experienced this with a very simple OSPF configuration that I had on the 2.4 stable branch. This was an in-place upgrade to the 2.5 development branch (eager to begin testing WireGuard). Initially I was seeing FRR 1) not starting on system boot-up and 2) not following CARP VIP status. So, I uninstalled FRR 1.0.0, neutered my config.xml file by hand (removing any and all references to FRR configuration), reinstalled FRR 1.0.0 and reconfigured the package from scratch. I thought I was still observing the issue, but it seems to be working correctly now, so I'm not entirely sure what happened.
Either way, it would be worth at least keeping an eye out for this potential edge case.
Updated by Jim Pingle almost 4 years ago
- Status changed from New to Closed
- Target version deleted (2.5.0)
I'll close this out for now, but if someone can reproduce it, we can open it back up with more details about exactly what it takes to replicate.
Updated by Jeremy Utley over 3 years ago
I am encountering this exact issue on 2.5.1 now. I have a pair of pfSense CE 2.5.1 installs with IPsec connections to the AWS Site-to-Site VPN service. FRR is installed and speaks BGP to AWS over the IPsec connections. For background, the machines were initially installed with 2.4.5p1, upgraded to 2.5, then upgraded to 2.5.1. FRR was installed after the upgrade to 2.5.1, BGP peering was established successfully with AWS on both machines, and then the "CARP Status IP" option was set to the LAN CARP IP. With this setting, putting the primary into maintenance mode moves the CARP IP, but FRR does not start up on the standby server - it remains running on the primary. Also, upon reboot of the primary, FRR does not automatically start when the CARP IP goes to MASTER state as it should.
Not sure where to look for additional information to troubleshoot - the system logs indicate nothing whatsoever that I can see. But if someone can give me some tips on what to look for to help diagnose this issue, I'll be happy to help as much as I can.
Updated by Viktor Gurov over 3 years ago
Unable to reproduce - all FRR services started/stopped as expected when I put the master node to 'CARP Maintenance Mode' or 'Temporary Disable CARP'.
Please provide more information about your FRR configuration - are you using BGP TCP MD5 authentication? AgentX(SNMP)/RPKI options? Something special?
Updated by Jeremy Utley over 3 years ago
Viktor Gurov wrote:
Unable to reproduce - all FRR services started/stopped as expected when I put the master node to 'CARP Maintenance Mode' or 'Temporary Disable CARP'.
Please provide more information about your FRR configuration - are you using BGP TCP MD5 authentication? AgentX(SNMP)/RPKI options? Something special?
Nothing special at all. Pretty much about as basic as can be while still providing BGP peering - private AS numbers, publishing a single /24 subnet to AWS, and accepting a single /16 subnet from AWS. Over the last few days, both our firewalls were rebooted. CARP IP is back up on primary, but FRR has never started.
It's obvious something is going on, as there are other reports of similar problems in the Forums (See: https://forum.netgate.com/topic/162722/frr-doesn-t-follow-carp-after-2-5-0-upgrade ). And it doesn't seem to always happen, either - but for some it does. What log files should I (or someone else) be looking at to try to diagnose this problem? Status -> System Logs -> Routing for my machine is completely empty since Apr 19, while the machine itself rebooted on April 30 - which means FRR should have started on that day (when the CARP IP went to Master state after that reboot).
Updated by Viktor Gurov over 3 years ago
- Status changed from Closed to New
For some reason my primary node doesn't have a `plugin_carp` config.xml entry for FRR,
secondary is OK:
...
        <tab>
            <text><![CDATA[Status]]></text>
            <url>/status_frr.php</url>
        </tab>
    </tabs>
    <include_file>/usr/local/pkg/frr.inc</include_file>
    <plugins>
        <item>
            <type>plugin_carp</type>
        </item>
    </plugins>
</package>
A customer config shows the same issue.
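To compare two nodes, it can help to parse the configuration instead of grepping it. The following is a minimal Python sketch, not part of pfSense: the function name `packages_with_plugin` and the sample document are made up for illustration, and it assumes each package is stored as a `<package>` element with a `<name>` child, laid out like the fragment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed config used only for illustration.
SAMPLE = """<pfsense>
  <installedpackages>
    <package>
      <name>FRR</name>
      <include_file>/usr/local/pkg/frr.inc</include_file>
      <plugins>
        <item>
          <type>plugin_carp</type>
        </item>
      </plugins>
    </package>
    <package>
      <name>Example</name>
      <include_file>/usr/local/pkg/example.inc</include_file>
    </package>
  </installedpackages>
</pfsense>"""


def packages_with_plugin(root, plugin_type="plugin_carp"):
    """Map each package name to True if it declares the given plugin type."""
    result = {}
    for pkg in root.iter("package"):
        name = pkg.findtext("name", default="(unnamed)")
        # Collect every <plugins><item><type> value for this package.
        types = [t.text for t in pkg.findall("./plugins/item/type")]
        result[name] = plugin_type in types
    return result


root = ET.fromstring(SAMPLE)
status = packages_with_plugin(root)
for name, has_plugin in status.items():
    print(f"{name}: plugin_carp {'present' if has_plugin else 'MISSING'}")
```

Against a real system, `ET.parse("/conf/config.xml").getroot()` would replace the `ET.fromstring(SAMPLE)` call; a package reported as missing the entry would match the symptom described here.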
Updated by Jeremy Utley over 3 years ago
I had been wondering if this problem only popped up on systems that were upgraded from 2.4.x to 2.5.x, and maybe it would work properly on a fresh installation of 2.5.1. So today, I set up a test on some hypervisors we have in house. I did fresh installations of 2.5.1 on 3 VMs (FW1, FW2, FW3), with FRR installed on all 3. FW1 and FW2 were set up as an HA pair, BGP ASN 64512, providing a single CIDR via BGP. FW3 was set up stand-alone, BGP ASN 64513, also providing a single CIDR via BGP. On both FW1 and FW2, the CARP Status IP option was set to the CARP IP on the LAN interface.
When I set CARP maintenance mode on FW1, I would expect the CARP IP to move to FW2, and the FRR daemons to be started there. The CARP IP does move, but the FRR daemons never start on FW2 and never stop on FW1. That shows the problem is not a matter of fresh install vs upgrade, so at this point I'm clueless as to why it works for some and not for others.
I have left this test setup in place, but shut down. I'm happy to help diagnose this problem, just need an expert to tell me what log files or config files to look at. I'll even provide copies of files from my test bed if it will help!
Updated by Marcos M over 3 years ago
Do those have the <type>plugin_carp</type> line in the /conf/config.xml file? If not, does adding it change the results?
Updated by Jeremy Utley over 3 years ago
- File frr-config.xml added
- File Screen Shot 2021-05-27 at 10.13.56 AM.png added
Marcos Mendoza wrote:
Do those have the <type>plugin_carp</type> line in the /conf/config.xml file? If not, does adding it change the results?
Nope - checking the /conf/config.xml file, it does not have any reference to "plugin_carp":
[2.5.1-RELEASE][admin@test-fw1.home.arpa]/conf: cat config.xml | grep plugin
[2.5.1-RELEASE][admin@test-fw1.home.arpa]/conf:
I'm also unsure exactly where to add it in that file, as what I see there does not match what Viktor pasted in comment #8. I did pull a copy of the entire <installedpackages>...</installedpackages> section of my config.xml file, and have attached it to this report (there's nothing really confidential in there, considering this is from a test bed on an internal network). FRR is the only add-on package installed on this test setup.
Also attaching a screenshot of the WebUI showing that the CARP Status IP is indeed selected.
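Going by the fragment in comment #8, the entry sits directly inside the package's `<package>` element, after `<include_file>`. As a rough illustration of that placement only, here is a Python sketch that adds the element to an in-memory copy. The helper name `ensure_plugin` and the trimmed `<package>` sample are hypothetical, and nothing here writes to /conf/config.xml or is known to be how pfSense itself populates the tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed <package> element without any <plugins> section.
PACKAGE_XML = """<package>
  <name>FRR</name>
  <include_file>/usr/local/pkg/frr.inc</include_file>
</package>"""


def ensure_plugin(pkg, plugin_type="plugin_carp"):
    """Add <plugins><item><type>...</type></item></plugins> if it is absent.

    Returns True if the element was added, False if it was already there.
    """
    plugins = pkg.find("plugins")
    if plugins is None:
        plugins = ET.SubElement(pkg, "plugins")
    existing = [t.text for t in plugins.findall("./item/type")]
    if plugin_type in existing:
        return False
    item = ET.SubElement(plugins, "item")
    ET.SubElement(item, "type").text = plugin_type
    return True


pkg = ET.fromstring(PACKAGE_XML)
added_first = ensure_plugin(pkg)   # adds the entry
added_again = ensure_plugin(pkg)   # second call is a no-op
print(ET.tostring(pkg, encoding="unicode"))
```

The function is idempotent, so running it twice leaves a single `<item>`. Any hand edit of a live configuration should be made on a backup copy first.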
Updated by Jim Pingle over 3 years ago
- Project changed from pfSense Packages to pfSense
- Category changed from FRR to Package System
- Status changed from New to In Progress
- Target version set to 2.6.0
- Plus Target Version set to 21.09
- Release Notes set to Default
This is actually a problem in the base system not specific to a package. I have a fix, will commit shortly.
Updated by Jim Pingle over 3 years ago
- Subject changed from FRR Service not following CARP Status IP to Package plugin tags are not being populated in certain cases
Updated by Jeremy Utley over 3 years ago
Jim Pingle wrote:
This is actually a problem in the base system not specific to a package. I have a fix, will commit shortly.
Glad to hear it Jim! I have my test bed setup ready to go, plus my original cluster is not in full production yet because of this issue, so I can easily test if you let me know the commit id to put into the Patch manager.
Updated by Jim Pingle over 3 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 100
Applied in changeset 7dbe76cd5756082cbd67db1b93acb606ad84996e.
Updated by Jeremy Utley over 3 years ago
Jim Pingle wrote:
Applied in changeset 7dbe76cd5756082cbd67db1b93acb606ad84996e.
Can confirm this fixes the problem, at least for my setup! Many thanks!
Updated by Jim Pingle over 3 years ago
- Target version changed from 2.6.0 to 2.5.2
On RELENG_2_5_2 when branched
Updated by Jim Pingle over 3 years ago
- Status changed from Feedback to New
- Target version changed from 2.5.2 to 2.6.0
There is still a bug here somewhere. Installing FRR on a completely fresh installation still doesn't get the proper <plugins> tag contents. Reinstalling the package after it's been installed once works, however.
Since there is a viable easy workaround, moving this ahead so we can work on a more complete fix without rushing.
Also, I noticed the <tabs> tags were not populated on the first install, but were populated on reinstall. Seems like a similar issue. Both cases have nested tags (plugins->item->type, tabs->tab->text/url).
Updated by Jim Pingle over 3 years ago
For whatever reason, PHP was failing to copy certain values into $pkg_data, which was a reference to the pkg configuration data in $config. When I switch the code to use the full $config path directly, it works. It affected the plugins, tabs, and also include_file.
With the fix in place, all values are present now on the first installation from a clean config.
Leaving this at a 2.6.0/21.09 target since it needs some extra time for testing to ensure it doesn't have a negative impact.
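For anyone testing the fix, the three affected values can be checked together after a first install. This is a minimal Python sketch under stated assumptions: the `missing_sections` helper and both sample `<package>` elements are invented for illustration, and the "first install" sample only guesses at what an unpopulated entry looks like (the real one may omit the tags entirely rather than leave them empty).

```python
import xml.etree.ElementTree as ET

# Paths, relative to a <package> element, that were reported as unpopulated.
EXPECTED = ("include_file", "tabs/tab/url", "plugins/item/type")

# Hypothetical samples: one resembling the broken first install,
# one resembling the state after a reinstall.
FIRST_INSTALL = "<package><name>FRR</name><tabs/><plugins/></package>"
REINSTALL = """<package>
  <name>FRR</name>
  <tabs>
    <tab>
      <text>Status</text>
      <url>/status_frr.php</url>
    </tab>
  </tabs>
  <include_file>/usr/local/pkg/frr.inc</include_file>
  <plugins>
    <item>
      <type>plugin_carp</type>
    </item>
  </plugins>
</package>"""


def missing_sections(pkg):
    """Return the expected paths that have no non-empty value in pkg."""
    missing = []
    for path in EXPECTED:
        values = [el.text for el in pkg.findall(path)
                  if el.text and el.text.strip()]
        if not values:
            missing.append(path)
    return missing


for label, xml_text in (("first install", FIRST_INSTALL),
                        ("reinstall", REINSTALL)):
    gaps = missing_sections(ET.fromstring(xml_text))
    print(f"{label}: {'missing ' + ', '.join(gaps) if gaps else 'all present'}")
```

An empty result for a package on a clean configuration would correspond to "all values are present now on the first installation".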
Updated by Jim Pingle over 3 years ago
- Status changed from New to Feedback
Applied in changeset 99b3a5cb0ef4586222a331045df3cee17bb25d31.
Updated by Jim Pingle about 3 years ago
- Subject changed from Package plugin tags are not being populated in certain cases to Package ``<plugins>`` and ``<tabs>`` content missing from configuration in some cases
Updated by Jim Pingle about 3 years ago
- Plus Target Version changed from 21.09 to 22.01
Updated by Jim Pingle almost 3 years ago
- Status changed from Feedback to Resolved
The expected tags are now present after a fresh package install.