Regression #14139
closed
CARP announcement src MAC should be virtual MAC
Added by Robert Karsai over 1 year ago. Updated over 1 year ago.
Description
Hi All,
I think that at some point in the last couple of 2.7.0 builds, CARP became somewhat broken. CARP announcements should be sent with the CARP virtual src MAC (CARP_good.png), but on 2.7.0.a.20230321.0600 I see that the src MAC is the physical interface MAC (CARP_broken.png), which can cause various interesting problems.
BR
--
Robert Karsai
Files
CARP_good.png (40.4 KB) | Robert Karsai, 03/21/2023 12:44 PM
CARP_broken.png (41.7 KB) | Robert Karsai, 03/21/2023 12:44 PM
clipboard-202304191320-mxq9y.png (21.4 KB) | source MAC 00:00:5e:00:01:03 | Danilo Zrenjanin, 04/19/2023 06:20 AM
Updated by Jim Pingle over 1 year ago
- Tracker changed from Bug to Regression
- Assignee set to Kristof Provost
- Target version set to 2.7.0
- Plus Target Version set to 23.05
- Release Notes changed from Default to Force Exclusion
On 2.7.0.a.20230314.0600 the CARP advertisement source MAC was still the CARP MAC, but on current snaps it is the interface MAC. Looking at the FreeBSD source, Kristof merged a few CARP changes on the 20th to add support for unicast CARP.
I'm not sure if this change was intentional/unintentional or if there is a new sysctl or option to change it and we need to adjust the defaults, etc. Assigning to Kristof since he touched it last.
At least on my test cluster, the nodes still track master/backup status properly after upgrading to 2.7.0.a.20230327.0600, and hosts contacting the CARP VIP see the CARP MAC in the ARP table, so I'm not sure what "various interesting problems" are being referenced in the description here.
Updated by Kristof Provost over 1 year ago
Hmm, yeah, that could be fallout from the unicast CARP work. In unicast mode we use the interface MAC as the source (mostly so AWS doesn't drop the packet). That shouldn't be the case in multicast mode though, so I'm not quite sure what's going on here.
The check in carp_output() looks correct. I'll debug when I'm back.
Updated by Robert Karsai over 1 year ago
Hi Jim, yes, the master and backup states are OK, and even the switchover is OK. However, without announcements coming from the master with the CARP virtual src MAC, switches can't learn which port to forward the traffic to, so both units receive it. There have also been mgmt WebGUI timeouts since this change (always on the backup unit), and of course there are the kernel messages also mentioned in https://redmine.pfsense.org/issues/14163
Mar 27 15:28:09 kernel [nl_generic] PID 15866 genl_handle_message: received family carp cmd SIOCGVH len 28
Mar 27 15:28:09 kernel [nl_generic] PID 15866 genl_handle_message: received family nlctrl cmd GETFAMILY len 32
Mar 27 15:28:09 kernel [nl_generic] PID 15380 genl_handle_message: received family carp cmd SIOCGVH len 28
Mar 27 15:28:09 kernel [nl_generic] PID 15380 genl_handle_message: received family nlctrl cmd GETFAMILY len 32
BR, Robert
Updated by Kristof Provost over 1 year ago
Switches do not learn which port to use from the CARP announcements, so that's not actually something to worry about.
Using the interface MAC is still not quite correct according to the CARP protocol (insofar as that protocol is specified), so we will fix it.
The netlink messages are there because carp configuration now uses netlink. The log messages are overly verbose, but harmless and can safely be ignored.
Updated by Kristof Provost over 1 year ago
The bug is fairly obvious now. The check for multicast in carp_output() expects the IP address to be in host byte order, but it is passed in network byte order. The fix is trivial and has gone upstream in https://cgit.freebsd.org/src/commit/?id=ccff2078af42dc066e1c38d25fcb83d960c3c22b
We'll pick that up in future merges, probably in a week or two.
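For readers unfamiliar with this class of bug, here is a minimal Python sketch of how a multicast test can fail when it is given an address in the wrong byte order. It is an illustration only, not the kernel code: `in_multicast` is a hypothetical helper modelled on the usual "top four bits are 1110" test, and the `"<I"` unpack simply simulates a little-endian CPU reading the on-wire bytes without conversion.

```python
import socket
import struct


def in_multicast(host_order_addr: int) -> bool:
    """Hypothetical helper: true if the host-order IPv4 address is in 224.0.0.0/4."""
    return (host_order_addr & 0xF0000000) == 0xE0000000


# 224.0.0.18 is the group that tcpdump prints as vrrp.mcast.net.
wire_bytes = socket.inet_aton("224.0.0.18")  # 4 bytes in network byte order

# Converted to host order first (the equivalent of ntohl() in C).
converted = struct.unpack("!I", wire_bytes)[0]

# Read as-is on a little-endian machine, i.e. with no conversion at all.
unconverted = struct.unpack("<I", wire_bytes)[0]

print(hex(converted), in_multicast(converted))      # 0xe0000012 True
print(hex(unconverted), in_multicast(unconverted))  # 0x120000e0 False
```

With the unconverted value the test reports "not multicast", which would send the code down the unicast path and so pick the interface MAC as the source, consistent with the symptom reported here.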
Updated by Jim Pingle over 1 year ago
- Status changed from New to Waiting on Merge
Updated by Kristof Provost over 1 year ago
- Status changed from Waiting on Merge to Ready To Test
The fix has been merged and will be present in future snapshot builds.
Updated by Robert Karsai over 1 year ago
Looking good on 2.7.0.a.20230331.1347, these are virtual src MACs coming from the MASTER:
tcpdump -e -i
00:13:34.850333 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, intvl 1s, length 36
00:13:35.851443 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, intvl 1s, length 36
00:13:36.874236 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, intvl 1s, length 36
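For anyone repeating this check: for IPv4 the CARP virtual MAC is 00:00:5e:00:01 followed by the VHID as one hex byte, and tcpdump shows the VHID as "vrid" because it decodes CARP as VRRP. The Python sketch below derives the expected MAC and compares it with the source MAC field of a `tcpdump -e` line. The helper names are hypothetical, and the parsing assumes the output layout shown above (timestamp first, source MAC second).

```python
CARP_MAC_PREFIX = "00:00:5e:00:01"  # IPv4 virtual router MAC prefix


def carp_virtual_mac(vhid: int) -> str:
    """Hypothetical helper: expected IPv4 CARP virtual MAC for a VHID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"{CARP_MAC_PREFIX}:{vhid:02x}"


def source_mac(tcpdump_line: str) -> str:
    """Hypothetical helper: second whitespace-separated field of a `tcpdump -e` line."""
    return tcpdump_line.split()[1].lower()


# First line of the capture above, shortened after the destination MAC.
line = "00:13:34.850333 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown)"

print(carp_virtual_mac(22))                      # 00:00:5e:00:01:16
print(source_mac(line) == carp_virtual_mac(22))  # True
```

Here vrid 22 is 0x16, which matches the final byte of the source MAC in the capture.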
Updated by Danilo Zrenjanin over 1 year ago
- File clipboard-202304191320-mxq9y.png added
- Status changed from Ready To Test to Resolved
Tested against:
2.7.0.a.20230419.0600
CARP announcements use the CARP VIP MAC address. In my case, it was 00:00:5e:00:01:03.
I am marking this ticket resolved.