Bug #10955

open

XMLRPC sync errors when failover peer IP is specified in DHCP server settings

Added by Max Leighton 12 months ago. Updated 2 days ago.

Status:
Pull Request Review
Priority:
Normal
Category:
XMLRPC
Target version:
Start date:
10/04/2020
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
Plus-Next
Release Notes:
Default
Affected Version:
2.5.0
Affected Architecture:

Description

Forum post: https://forum.netgate.com/topic/156974/xmlrpc-sync-error-built-on-sun-sep-20-01-01-05-edt-2020

I'm seeing this behavior in:

2.5.0-DEVELOPMENT (amd64)
built on Sun Oct 04 18:53:52 EDT 2020
FreeBSD 12.2-STABLE

DHCPv6 Relay and DHCPv6 Server & RA are disabled on all interfaces and no IPv6 configuration exists whatsoever. Any configuration change made on the primary node triggers two errors and does not sync to the secondary.

Exception calling XMLRPC method restore_config_section # Impossible to encode value '' from type 'NULL'. No analogous type in XML_RPC. 2020-10-04 22:02:55
Exception calling XMLRPC method restore_config_section # Impossible to encode value '' from type 'NULL'. No analogous type in XML_RPC. 2020-10-04 22:02:56

Removing the Failover Peer IP from the DHCP server settings, or unchecking DHCP server settings in the synced options under System > High Avail. Sync, resolves the issue and allows configuration changes to sync again.
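
To illustrate what the error means: the message says XML_RPC has no type for NULL, so a single NULL anywhere in the section being synced is enough to abort the call. The sketch below is a hypothetical diagnostic, not pfSense code. The function name find_null_paths and the sample array are made up for illustration. It walks a nested array and lists the key paths that hold NULL.

```php
<?php
// Hypothetical helper (not part of pfSense): walk a nested config array
// and collect the key paths whose value is NULL.
function find_null_paths(array $section, string $prefix = ''): array
{
    $paths = [];
    foreach ($section as $key => $value) {
        $path = ($prefix === '') ? (string)$key : $prefix . '/' . $key;
        if ($value === null) {
            // This is the kind of value the XML-RPC encoder rejects.
            $paths[] = $path;
        } elseif (is_array($value)) {
            $paths = array_merge($paths, find_null_paths($value, $path));
        }
    }
    return $paths;
}

// Made-up sample data, for illustration only.
$dhcpd = [
    'lan' => [
        'enable' => '',
        'range' => ['from' => '192.0.2.100', 'to' => '192.0.2.200'],
        'example_unset_value' => null,
    ],
];

$null_paths = find_null_paths($dhcpd);
print_r($null_paths);
```

Running something like this against the array that is about to be synced would point at the offending key, assuming the NULL lives in the array itself and is not produced later during encoding.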

Actions #1

Updated by Jim Pingle 12 months ago

  • Status changed from New to Confirmed
  • Priority changed from Normal to Very High
  • Target version set to 2.5.0

I'm seeing this as well

Actions #3

Updated by Renato Botelho 12 months ago

  • Status changed from Confirmed to Feedback
  • Assignee set to Renato Botelho
  • % Done changed from 0 to 100

PR has been merged. Thanks!

Actions #4

Updated by Raul Ramos 12 months ago

I started the forum post, so I'm putting my feedback here.

built on Fri Oct 09 14:15:42 EDT 2020 is working as expected.

Thanks

Actions #5

Updated by Viktor Gurov 12 months ago

  • Status changed from Feedback to Resolved

works fine on 2.5.0.a.20201009.1850 HA

Actions #6

Updated by Jim Pingle 11 months ago

  • Category changed from CARP to XMLRPC
Actions #7

Updated by Azamat Khakimyanov 6 months ago

  • Status changed from Resolved to Pull Request Review
  • Priority changed from Very High to Normal
  • Target version changed from 2.5.0 to Plus-Next

According to https://github.com/pfsense/pfsense/pull/4479/commits/64431f257bb831a8aa121c356bbef3ab28d0ddc1, the function route_get was changed:

    $result = array();
    foreach ($rtable[$family] as $item) {
    -   if ($item['destination'] == $target) {
    +   if (ip_in_subnet($target, $item['destination'])) {
            $result[] = $item;

But when I open /etc/inc/util.inc now, I see:

    $result = array();
    foreach ($rtable[$family] as $item) {
        if ($item['destination'] == $target ||
            ip_in_subnet($target, $item['destination'])) {
            $result[] = $item;

There is also a ticket, https://go.netgate.com/helpdesk/tickets/81347: on 21.02_p1 a client has exactly the same issue as described in this bug.
It looks like the route_get function was rewritten again, and that reintroduced this bug with HA clusters.

Steve wrote that the route_get function was changed in https://github.com/pfsense/pfsense/commit/b1558574e69965ea68744ad355a60842ca8294ea#diff-e8a8d358a47a0aaf09fb3b160a32f5dd7625b3c1c82210d7f2412e182d0fcd66

Bug 10955 was marked resolved on Oct 10 2020, but the commit mentioned above was added on Nov 2 2020, after this bug was tested and resolved.
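
To make the difference between the three conditions concrete, here is a minimal, self-contained sketch. It does not use pfSense code: sketch_ip_in_subnet is a simplified IPv4-only stand-in whose behavior may differ from the real ip_in_subnet(), match_routes and its $mode parameter are hypothetical, and the routing table is made up. It assumes 64-bit PHP 7.1 or later.

```php
<?php
// Simplified IPv4-only stand-in; NOT the pfSense ip_in_subnet() implementation.
// Returns false when $subnet carries no prefix length (e.g. a host route).
function sketch_ip_in_subnet(string $addr, string $subnet): bool
{
    if (strpos($subnet, '/') === false) {
        return false;
    }
    [$network, $bits] = explode('/', $subnet, 2);
    $addr_long = ip2long($addr);
    $net_long = ip2long($network);
    $bits = (int)$bits;
    if ($addr_long === false || $net_long === false || $bits < 0 || $bits > 32) {
        return false;
    }
    // Build the netmask; assumes 64-bit integers.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($addr_long & $mask) === ($net_long & $mask);
}

// Hypothetical matcher: 'exact' mirrors the old test, 'subnet' the test
// from the diff, and 'both' the combined test seen in util.inc.
function match_routes(array $routes, string $target, string $mode): array
{
    $result = [];
    foreach ($routes as $item) {
        $exact = ($item['destination'] == $target);
        $contained = sketch_ip_in_subnet($target, $item['destination']);
        if (($mode === 'exact' && $exact) ||
            ($mode === 'subnet' && $contained) ||
            ($mode === 'both' && ($exact || $contained))) {
            $result[] = $item;
        }
    }
    return $result;
}

// Made-up routing table entries, for illustration only.
$routes = [
    ['destination' => '192.0.2.0/24', 'gateway' => 'link#1'],
    ['destination' => '198.51.100.7', 'gateway' => '192.0.2.1'],
];

printf("exact:  %d\n", count(match_routes($routes, '192.0.2.10', 'exact')));
printf("subnet: %d\n", count(match_routes($routes, '192.0.2.10', 'subnet')));
printf("both:   %d\n", count(match_routes($routes, '198.51.100.7', 'both')));
```

Under these assumptions, an address inside a subnet route is missed by the exact test, and a host route is missed by the subnet-only test, so a caller can get an empty result from either single condition. Whether that is what triggers the sync error here is not established by this sketch.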

Actions #8

Updated by Manuel Trier 5 months ago

Same for me, the bug is present again.

Actions #9

Updated by Jim Pingle 5 months ago

  • Status changed from Pull Request Review to New
Actions #10

Updated by khaled osama 2 months ago

The bug still exists after upgrading to 2.5.2.

Exception calling XMLRPC method restore_config_section # Impossible to encode value '' from type 'NULL'. No analogous type in XML_RPC. @ 2021-07-19 16:23:54
Actions #11

Updated by Marcos Mendoza about 1 month ago

It seems this can be triggered if entering "None" for gateway.

Actions #12

Updated by Jim Pingle about 1 month ago

Marcos Mendoza wrote in #note-11:

It seems this can be triggered if entering "None" for gateway.

Where/On what page?

Actions #13

Updated by Marcos Mendoza about 1 month ago

Jim Pingle wrote in #note-12:

Where/On what page?

Services / DHCP Server / <Interface> // Other Options / Gateway

I will try confirming this ("None" value) as it's unclear if that's the trigger or not.

Edit: I was able to test this in a lab and this was not an issue.

Actions #14

Updated by Joachim Tingvold 13 days ago

I can confirm that I see this on a freshly installed 2.5.2 HA setup.

I have not yet found a way to actually be able to push the DHCP server configuration to the backup node.

  • Specifying "Failover Peer" does not help (i.e. it still fails to sync with the error mentioned previously in this case)
  • Removing "Failover Peer" does not help (nothing is synced, and it still complains, as long as "DHCP Server" is enabled in the XMLRPC sync settings)
  • Removing "DHCP Server" in the XMLRPC sync settings makes the error message stop, but nothing is synced

Has anyone found a workaround for this (manual or otherwise)?

Actions #15

Updated by Marcos Mendoza 13 days ago

I was able to reliably reproduce this. I believe the issue is within find_interface_ip(). If the interface does not exist or is a bridge, the function returns without a value, which leads to the logged error. Some error handling would likely be beneficial here. My guess is that the root issue lies somewhere in pfSense_interface_listget(), but it's not clear where that's defined.
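
A minimal sketch of the failure mode described above, with hypothetical names throughout: lookup_ip_sketch and lookup_ip_guarded are stand-ins, not pfSense's find_interface_ip(), and the interface list is made up. In PHP a bare return hands NULL to the caller, which is the type the XML-RPC encoder then refuses. The guarded variant shows one possible form of error handling, not necessarily what any actual fix does.

```php
<?php
// Hypothetical stand-in for an interface address lookup.
function lookup_ip_sketch(array $interface_ips, string $ifname)
{
    if (!isset($interface_ips[$ifname])) {
        return; // bare return: the caller receives NULL
    }
    return $interface_ips[$ifname];
}

// One possible guard: always hand back a string, so NULL never reaches
// whatever encodes the result.
function lookup_ip_guarded(array $interface_ips, string $ifname): string
{
    $ip = lookup_ip_sketch($interface_ips, $ifname);
    return is_string($ip) ? $ip : '';
}

// Made-up interface list, for illustration only.
$interface_ips = ['igb0' => '192.0.2.1'];

$raw = lookup_ip_sketch($interface_ips, 'bridge0');
$guarded = lookup_ip_guarded($interface_ips, 'bridge0');

var_dump($raw);
var_dump($guarded);
```

An empty string is only one choice of fallback. Skipping the interface entirely, or logging and aborting the sync for that section, would be equally valid designs depending on what the caller expects.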

Actions #16

Updated by Jim Pingle 12 days ago

  • Target version changed from Plus-Next to CE-Next
  • Plus Target Version set to Plus-Next

Bridges wouldn't be valid with HA, so that isn't a supported configuration. If the interfaces mismatch, that also wouldn't be supported in HA.

Do we have an example of a proper HA case where this should work and doesn't?

We will need to see a sanitized config.xml from both nodes for a pair which can reproduce this error since we can't seem to definitively identify the problem cases.

Actions #17

Updated by Joachim Tingvold 12 days ago

I expect my two 2.5.2 HA nodes to come online within a day or two, and I'll provide sanitized config.xml from them both. I currently face this issue on them.

Actions #18

Updated by Marcos Mendoza 12 days ago

I've submitted the following to fix the reported issue:
https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/389

I have a config.xml from an HA setup which was not using bridges and was experiencing the reported issue. However, I no longer have access to the systems and I cannot reproduce the issue using the config.xml as reference.

During my testing, I was able to manually desync the interface array cache; since the call to find_interface_ip() does not use $flush = true, I suspect there are edge cases where this happens and it leads to the reported error.

Actions #19

Updated by Jim Pingle 12 days ago

  • Status changed from New to Pull Request Review
Actions #20

Updated by Joachim Tingvold 11 days ago

So, while going through the configurations to sanitize them, I noticed the following:

  • node1 and node2 had VLAN interface ("lan / lagg0.1") configured without any IPv4 address set ("IPv4 Configuration Type" set to "None").
  • node1 had DHCPv4 server configuration for interface "lan" (lagg0.1), which should only be possible when importing a configuration (which was the case). This was only visible in the exported configuration, and not in the GUI itself (since it disables DHCPv4 settings for interfaces without an IPv4 address, or similar).

I configured an IPv4 address for interface "lan / lagg0.1" on both nodes, then disabled + re-enabled DHCPv4 server for that interface. Enabling "DHCP Server settings" XMLRPC sync now completes without any issues.

I suspect this information should shed some light on a specific corner case where the issue seems to present itself. Probably not directly relevant to the original description in this bug, but a bug nevertheless. In my scenario it should probably just skip syncing the DHCPv4 settings for an interface without an IP address (or similar)?

It should be fairly easy to reproduce:

  • Configure IPv4 + DHCPv4 on interface on both node1 and node2
  • Export configuration
  • Set IPv4 on interface to "None" in the exported config (but leave DHCPv4 configuration)
  • Import configuration (should have interface without IPv4 address, but DHCPv4 config for said interface)
  • Enable "DHCP Server settings" in XMLRPC, and watch it fail?
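
The mismatch described above (DHCPv4 config present for an interface that has no IPv4 address) can be checked in an exported config before importing it. The following is a rough sketch under stated assumptions: the function name dhcp_without_ipv4 is hypothetical, the XML is a made-up minimal fragment, and it assumes that interfaces sit under interfaces/<name> with an optional ipaddr element and that DHCPv4 settings sit under dhcpd/<name>. It needs PHP's SimpleXML extension.

```php
<?php
// Made-up, minimal config fragment; a real config.xml contains far more.
$xml = <<<'XML'
<pfsense>
  <interfaces>
    <lan><if>lagg0.1</if></lan>
    <opt1><if>igb1</if><ipaddr>192.0.2.1</ipaddr><subnet>24</subnet></opt1>
  </interfaces>
  <dhcpd>
    <lan><enable></enable></lan>
    <opt1><enable></enable></opt1>
  </dhcpd>
</pfsense>
XML;

// Hypothetical check: list interfaces that have a DHCPv4 section but no
// (or an empty) ipaddr element.
function dhcp_without_ipv4(string $xml): array
{
    $config = simplexml_load_string($xml);
    if ($config === false || !isset($config->dhcpd)) {
        return [];
    }
    $orphans = [];
    foreach ($config->dhcpd->children() as $ifname => $dhcp) {
        // A missing element casts to an empty string.
        $ipaddr = trim((string)$config->interfaces->{$ifname}->ipaddr);
        if ($ipaddr === '') {
            $orphans[] = $ifname;
        }
    }
    return $orphans;
}

print_r(dhcp_without_ipv4($xml));
```

Any interface name it reports is a candidate for either getting an IPv4 address or having its leftover DHCPv4 section removed from the backup file by hand.
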
Actions #21

Updated by Marcos Mendoza 11 days ago

Thank you for the info. With the proposed fix, this scenario should not be an issue.

Actions #22

Updated by Manuel Trier 2 days ago

Joachim Tingvold wrote in #note-20:

So, while going through the configuration to sanitize them, I noticed the following;

  • node1 and node2 had VLAN interface ("lan / lagg0.1") configured without any IPv4 address set ("IPv4 Configuration Type" set to "None").
  • node1 had DHCPv4 server configuration for interface "lan" (lagg0.1), which should only be possible when importing a configuration (which was the case). This was only visible in the exported configuration, and not in the GUI itself (since it disables DHCPv4 settings for interfaces without an IPv4 address, or similar).

I configured an IPv4 address for interface "lan / lagg0.1" on both nodes, then disabled + re-enabled DHCPv4 server for that interface. Enabling "DHCP Server settings" XMLRPC sync now completes without any issues.

I suspect this information should shed some light on a specific corner case where the issue seems to present itself. Probably not directly relevant to the original description in this bug, but a bug nevertheless. In my scenario it should probably just skip syncing the DHCPv4 settings for an interface without an IP address (or similar)?

It should be fairly easy to reproduce;

  • Configure IPv4 + DHCPv4 on interface on both node1 and node2
  • Export configuration
  • Set IPv4 on interface to "None" in the exported config (but leave DHCPv4 configuration)
  • Import configuration (should have interface without IPv4 address, but DHCPv4 config for said interface)
  • Enable "DHCP Server settings" in XMLRPC, and watch it fail?

This was the solution in my case! Thank you very much! Manual editing of the XML Backup file and removing the unused DHCP config solved this issue!
