Regression #14139

closed

CARP announcement src MAC should be virtual MAC

Added by Robert Karsai almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Category: CARP
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 23.05
Release Notes: Force Exclusion
Affected Version: 2.7.0
Affected Architecture: All

Description

Hi All,

I think CARP became somewhat broken at some point in the last couple of 2.7.0 builds. CARP announcements should be sent with the CARP virtual MAC as the source MAC (CARP_good.png), but on 2.7.0.a.20230321.0600 I see that the source MAC is the physical interface MAC (CARP_broken.png), which can cause various interesting problems.

BR
--
Robert Karsai


Files

CARP_good.png (40.4 KB) Robert Karsai, 03/21/2023 12:44 PM
CARP_broken.png (41.7 KB) Robert Karsai, 03/21/2023 12:44 PM
clipboard-202304191320-mxq9y.png (21.4 KB) source MAC 00:00:5e:00:01:03 Danilo Zrenjanin, 04/19/2023 06:20 AM

Actions #1

Updated by Jim Pingle over 1 year ago

  • Tracker changed from Bug to Regression
  • Assignee set to Kristof Provost
  • Target version set to 2.7.0
  • Plus Target Version set to 23.05
  • Release Notes changed from Default to Force Exclusion

On 2.7.0.a.20230314.0600 the CARP advertisement source MAC was still the CARP MAC, but on current snaps it is the interface MAC. Looking at the FreeBSD source, Kristof merged a few CARP changes on the 20th to add support for unicast CARP.

I'm not sure if this change was intentional/unintentional or if there is a new sysctl or option to change it and we need to adjust the defaults, etc. Assigning to Kristof since he touched it last.

At least on my test cluster the nodes still track master/backup status properly after upgrading to 2.7.0.a.20230327.0600 and hosts contacting the CARP VIP see the CARP MAC in the ARP table so I'm not sure what "various interesting problems" are being referenced in the description here.

Actions #2

Updated by Kristof Provost over 1 year ago

Hmm, yeah, that could be fallout from the unicast CARP work. In unicast mode we use the interface MAC as the source (mostly so AWS doesn't drop the packet). That shouldn't be the case in multicast mode though, so I'm not quite sure what's going on here.
The check in carp_output() looks correct. I'll debug when I'm back.

Actions #3

Updated by Robert Karsai over 1 year ago

Hi Jim, yes, master and backup states are OK, and even the switchover is OK. However, without announcements from the master carrying the CARP virtual src MAC, switches can't learn which port to forward the traffic to, so both units receive it. There have also been mgmt webgui timeouts since this change (always on the backup unit), and of course there are the kernel messages also mentioned in https://redmine.pfsense.org/issues/14163:

Mar 27 15:28:09 kernel [nl_generic] PID 15866 genl_handle_message: received family carp cmd SIOCGVH len 28
Mar 27 15:28:09 kernel [nl_generic] PID 15866 genl_handle_message: received family nlctrl cmd GETFAMILY len 32
Mar 27 15:28:09 kernel [nl_generic] PID 15380 genl_handle_message: received family carp cmd SIOCGVH len 28
Mar 27 15:28:09 kernel [nl_generic] PID 15380 genl_handle_message: received family nlctrl cmd GETFAMILY len 32

BR, Robert

Actions #4

Updated by Kristof Provost over 1 year ago

Switches do not learn what port to use based on the CARP announcements, so that's not actually something to worry about.
The current behavior is not quite correct according to the CARP protocol (insofar as that protocol is specified), so we will fix it.

The netlink messages are there because carp configuration now uses netlink. The log messages are overly verbose, but harmless and can safely be ignored.

Actions #5

Updated by Chris Linstruth over 1 year ago

Actually, they do.
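
For readers following the disagreement, here is a minimal Python sketch of source-MAC learning in a transparent bridge. It is a simplified model under stated assumptions, not a description of any particular switch: `LearningSwitch`, `ports_reached`, the port numbering, and the two `02:...` MAC addresses are hypothetical, and aging, VLANs, and any other frames sourced from the virtual MAC are ignored.

```python
class LearningSwitch:
    """Toy transparent bridge: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def forward(self, in_port, src_mac, dst_mac):
        # Learning step: remember which port the source MAC lives on.
        self.table[src_mac] = in_port
        out = self.table.get(dst_mac)
        if out is not None:
            # Known destination: send to that port only (never back out in_port).
            return [] if out == in_port else [out]
        # Unknown destination: flood to every other port.
        return [p for p in self.ports if p != in_port]


VMAC = "00:00:5e:00:01:16"          # CARP virtual MAC seen later in this thread
ADVERT_DST = "01:00:5e:00:00:12"    # multicast destination of the advertisements
MASTER_IF_MAC = "02:00:00:00:00:01"  # hypothetical physical MAC of the master
CLIENT_MAC = "02:00:00:00:00:03"     # hypothetical client MAC


def ports_reached(advert_src):
    """Ports that receive a client frame addressed to the virtual MAC."""
    sw = LearningSwitch(ports=[1, 2, 3])  # 1 = master, 2 = backup, 3 = client
    sw.forward(1, advert_src, ADVERT_DST)  # master sends one advertisement
    return sw.forward(3, CLIENT_MAC, VMAC)


print(ports_reached(VMAC))           # [1]
print(ports_reached(MASTER_IF_MAC))  # [1, 2]
```

In this model, when the advertisement carries the interface MAC, the switch never learns the virtual MAC and floods frames addressed to it, so the backup's port receives them too.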

Actions #6

Updated by Kristof Provost over 1 year ago

The bug is fairly obvious now. The check for multicast in carp_output() expects the IP address to be in host endianness, and it's presented in network endianness. The fix is trivial and has gone upstream in https://cgit.freebsd.org/src/commit/?id=ccff2078af42dc066e1c38d25fcb83d960c3c22b
We'll pick that up in future merges, probably in a week or two.
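
The byte-order mismatch described above can be illustrated with a short Python sketch. This is not the FreeBSD code: `in_multicast` is a hypothetical stand-in for a class D test that expects host byte order, and unpacking the bytes as little-endian simulates what a little-endian host sees when the conversion from network order is skipped. 224.0.0.18 is the multicast group the advertisements are sent to (shown as vrrp.mcast.net in tcpdump output).

```python
import socket
import struct


def in_multicast(host_order_addr: int) -> bool:
    """Return True if the top four bits are 1110, i.e. the address is in 224.0.0.0/4."""
    return (host_order_addr & 0xF0000000) == 0xE0000000


# Four bytes in network (big-endian) order, as the address is stored on the wire.
packed = socket.inet_aton("224.0.0.18")

# Correct handling: convert from network order before testing.
converted = struct.unpack("!I", packed)[0]

# Simulated bug: the same bytes read as a little-endian integer, no conversion.
unconverted = struct.unpack("<I", packed)[0]

print(hex(converted), in_multicast(converted))      # 0xe0000012 True
print(hex(unconverted), in_multicast(unconverted))  # 0x120000e0 False
```

Without the conversion the multicast destination is not recognized as multicast, which is consistent with the unicast path (interface MAC as source) being taken.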

Actions #7

Updated by Jim Pingle over 1 year ago

  • Status changed from New to Waiting on Merge

Actions #8

Updated by Kristof Provost over 1 year ago

  • Status changed from Waiting on Merge to Ready To Test

The fix has been merged and will be present in future snapshot builds.

Actions #9

Updated by Robert Karsai over 1 year ago

Looking good on 2.7.0.a.20230331.1347, these are virtual src MACs coming from the MASTER:

tcpdump -e -i

00:13:34.850333 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, intvl 1s, length 36
00:13:35.851443 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, intvl 1s, length 36
00:13:36.874236 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, intvl 1s, length 36
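
A capture like the one above can be checked mechanically. The sketch below is a rough helper, not part of pfSense: the function names and the regular expression are assumptions written against the output format shown here. It relies on the virtual MAC having the form 00:00:5e:00:01:XX, where XX is the VHID in hex (vrid 22 is 0x16, matching the capture).

```python
import re


def carp_virtual_mac(vhid: int) -> str:
    """Virtual MAC for an IPv4 CARP/VRRP group: 00:00:5e:00:01 plus the VHID byte."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be in 1..255")
    return f"00:00:5e:00:01:{vhid:02x}"


# Timestamp, then the Ethernet source MAC, then later in the line "vrid N".
LINE_RE = re.compile(
    r"^\S+\s+(?P<src>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s.*\bvrid (?P<vrid>\d+)"
)


def source_is_virtual_mac(line: str) -> bool:
    """True if the advertisement's source MAC matches the MAC derived from its vrid."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a recognized advertisement line")
    return m.group("src") == carp_virtual_mac(int(m.group("vrid")))


sample = (
    "00:13:34.850333 00:00:5e:00:01:16 (oui IANA) > 01:00:5e:00:00:12 "
    "(oui Unknown), ethertype IPv4 (0x0800), length 70: 10.10.10.2 > "
    "vrrp.mcast.net: VRRPv2, Advertisement, vrid 22, prio 0, authtype none, "
    "intvl 1s, length 36"
)
print(source_is_virtual_mac(sample))  # True
```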

Actions #10

Updated by Danilo Zrenjanin over 1 year ago

Tested against:

2.7.0.a.20230419.0600

CARP announcements use the CARP VIP MAC address. In my case, it was 00:00:5e:00:01:03.

source MAC 00:00:5e:00:01:03

I am marking this ticket resolved.
