CARP Sync Issue - when no internet on standby
I've noted a possible bug in pfSense CARP. We have multiple pfSense instances set up in failover. In some of them, we do not have public IPs available for both the primary and the secondary, so we use internal IPs on the WAN interfaces and a public IP for the CARP virtual IP.
This works well and fails over normally. However, when making changes on the primary, they often fail with:
Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: XML_RPC_Client: RPC server did not send response before timeout. 103
Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: A communications error occurred while attempting XMLRPC sync with username admin https://172.16.18.2:443.
Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: New alert found: A communications error occurred while attempting XMLRPC sync with username admin https://172.16.18.2:443.
If I reboot the secondary (standby) pfSense, it syncs all the changes on first boot and then starts throwing that error again around 10 minutes afterwards.
At first, I blamed the internal IPs on the WAN interfaces. Therefore, I replaced the private IPs on one instance and put public IPs throughout. This immediately resolved the issue. So I continued to explore logs to try and identify the actual root cause. As part of this test, I blocked internet access to the secondary (standby) pfSense unit. As soon as I did this, the unit started throwing the above errors.
When using private IPs, the secondary (standby) unit never has internet access until failover occurs. Therefore, this issue seems to be related to the standby unit not having internet and/or not reaching the gateway.
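For anyone trying to reproduce this, here is a minimal sketch (not part of pfSense) for pulling the sync failures out of the log, so their timestamps can be lined up against the moment the standby lost connectivity. It assumes the log lines have been exported as plain text in the same format as the excerpt above. The function name `sync_failures` and the `SAMPLE` list are hypothetical names used only for illustration.

```python
import re

# Matches only the primary rc.filter_synchronize communications-error line,
# not the timeout line or the "New alert found:" repeat of the same message.
SYNC_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) php-fpm \d+ "
    r"/rc\.filter_synchronize: "
    r"A communications error occurred while attempting XMLRPC sync "
    r"with username (?P<user>\S+) (?P<url>\S+?)\.?$"
)


def sync_failures(lines):
    """Return a (timestamp, peer_url) tuple for each XMLRPC sync failure."""
    failures = []
    for line in lines:
        match = SYNC_ERROR.match(line.strip())
        if match:
            failures.append((match.group("stamp"), match.group("url")))
    return failures


# The three log lines quoted in this report.
SAMPLE = [
    "Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: XML_RPC_Client: "
    "RPC server did not send response before timeout. 103",
    "Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: A communications "
    "error occurred while attempting XMLRPC sync with username admin "
    "https://172.16.18.2:443.",
    "Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: New alert found: "
    "A communications error occurred while attempting XMLRPC sync with "
    "username admin https://172.16.18.2:443.",
]

if __name__ == "__main__":
    for stamp, url in sync_failures(SAMPLE):
        print(stamp, url)
```

Each failed sync is counted once, because the "New alert found:" line repeats the same event. The trailing period that the log appends to the peer URL is stripped.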
#1 Updated by Jim Pingle about 3 years ago
- Assignee deleted (
- Priority changed from High to Very Low
- Target version changed from 2.3.4-p2 to Future
- Affected Version changed from 2.3.4 to All
That scenario is rare and not one we technically support or encourage. It's OK for secondary WANs or LANs but both firewalls need external connectivity. This is especially true when you have features enabled which require it. For example, anything that requires DNS resolution such as hostnames in aliases, dynamic DNS, hostnames in VPN peers, aliases which fetch their contents from URLs, packages like pfBlocker which need to download lists, etc.
There may be a way to optimize that in the future but it is not a high priority.
#2 Updated by Yann Tintignac about 3 years ago
I had the same issue when using a pfSense cluster with CARP and a /32 public IP allocation. I think a lot of customers may be impacted by this limitation, because not everybody can get a /29 or larger allocation from their ISP.
I believe some specific features, such as a WAN gateway (WANGW) with an IP outside the WAN subnet and CARP / virtual IPs outside the WAN subnet, were implemented to handle /30 to /32 public allocations on the WAN side.
So in this context, I agree with Brian: we should sync the pfSense states and config even when the slave device cannot access the Internet. This is clearly a situation where the expected setup is Active/Standby. The standby node can access the Internet only when it becomes active following a failover.
If this makes sense to you, could you please reconsider the priority of this FR?
Thank you in advance.