Bug #1432

Carp Vips are promoted to master before firewall filter load

Added by Michele Di Maria about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: CARP
Target version:
Start date: 04/11/2011
Due date:
% Done: 100%
Estimated time:
Affected Version: 2.0
Affected Architecture: i386

Description

When the "master" machine boots, the CARP IPs are promoted to master immediately, even before the firewall filters are loaded and processed. With a complex configuration (lots of CARP IPs, lots of firewall/NAT rules, etc.) this causes several seconds of total inactivity on the network (about 10-15 in my case), making the state sync almost useless.
Maybe there is a way to make the master machine hold off on CARP promotion until the firewall is ready to handle the states currently running on the backup machine.

carp_master_boot.png (19 KB) Michele Di Maria, 04/19/2011 04:13 AM
reboot.png (19.2 KB) Michele Di Maria, 04/23/2011 04:09 AM

Associated revisions

Revision 359f6307 (diff)
Added by Ermal Luçi about 8 years ago

Block instead of allowing proto carp/pfsync during bootup since this may cause issues. Ticket #1432

Revision bce14123 (diff)
Added by Ermal Luçi about 8 years ago

Actually call interfaces_carp_setup after the carp interfaces are created so carp traffic can only flow after we have all vips up and running. This prevents preemption earlier than necessary. Ticket #1432.

History

#1 Updated by Chris Buechler about 8 years ago

  • Project changed from pfSense Packages to pfSense

#2 Updated by Chris Buechler about 8 years ago

  • Category set to CARP
  • Target version set to 2.0

#3 Updated by Ermal Luçi about 8 years ago

  • Status changed from New to Feedback

#4 Updated by Michele Di Maria about 8 years ago

The situation has improved, but it is not resolved yet (tested with build 18 23:29:41 EDT 2011 i386).

In the attached image you can see the "red line" (at 10:02:07) when the primary machine booted, then total silence for 30 seconds, and only then did the primary machine start to manage the traffic.

The log reads as follows:
Apr 19 10:02:01 pfsense1 syslogd: kernel boot file is /boot/kernel/kernel
...
...
Apr 19 10:02:05 pfsense1 kernel: vip254: MASTER -> BACKUP (more frequent advertisement received)
Apr 19 10:02:05 pfsense1 kernel: vip254: link state changed to DOWN
...
(in my configuration there are about 80 carp vips)
...
Apr 19 10:02:06 pfsense1 kernel: vip254: BACKUP -> MASTER (preempting a slower master)
Apr 19 10:02:06 pfsense1 kernel: vip254: link state changed to UP
...
(again for about 80 carp vips)
...
Apr 19 10:02:11 pfsense1 check_reload_status: syncing firewall
Apr 19 10:02:14 pfsense1 check_reload_status: syncing firewall
Apr 19 10:02:16 pfsense1 dnsmasq52608: started, version 2.55 cachesize 10000
Apr 19 10:02:16 pfsense1 dnsmasq52608: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
Apr 19 10:02:16 pfsense1 dnsmasq52608: reading /etc/resolv.conf
...
etc.

At this point the router started to manage the traffic (about 10:02:16-10:02:20).
If you need the full log file, I can send it via email.

Thanks,
Michele

#5 Updated by Michele Di Maria about 8 years ago

I caught it: the master firewall started to work after this log line:

Apr 19 10:03:27 pfsense1 check_reload_status: reloading filter

This is consistent with the previous screenshot.
The CARP VIPs should change state to UP only after this event, not before; otherwise the VIPs are up but traffic cannot pass through pfSense.

Thanks,
Michele
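To put a number on the window described in the last two comments, the gap between the first CARP promotion and the filter reload can be read straight from the syslog timestamps. The following is only a minimal sketch: the function names are hypothetical, the two sample lines are copied from the excerpts above, and the year is an assumption because syslog omits it.

```python
from datetime import datetime

# Two lines copied from the log excerpts quoted in this ticket.
LOG_LINES = [
    "Apr 19 10:02:06 pfsense1 kernel: vip254: BACKUP -> MASTER (preempting a slower master)",
    "Apr 19 10:03:27 pfsense1 check_reload_status: reloading filter",
]

# Assumption: syslog timestamps carry no year; the ticket dates from 2011.
ASSUMED_YEAR = 2011


def parse_timestamp(line, year=ASSUMED_YEAR):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def first_match(lines, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            return parse_timestamp(line)
    return None


def promotion_to_filter_gap(lines):
    """Seconds between the first CARP promotion and the first filter reload."""
    promoted = first_match(lines, "BACKUP -> MASTER")
    reloaded = first_match(lines, "reloading filter")
    if promoted is None or reloaded is None:
        return None
    return (reloaded - promoted).total_seconds()


if __name__ == "__main__":
    print(promotion_to_filter_gap(LOG_LINES))
```

For the two lines quoted here the function returns 81.0 seconds. Run against a full log, it would report the window during which the VIPs were already MASTER but the filter had not yet been reloaded.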

#6 Updated by Ermal Luçi about 8 years ago

I pushed another change, so try with that.
Though I think CARP needs to be taught a 'start' sysctl, since it only has 'allow' for incoming packets.
Or the meaning of the allow sysctl should be changed to control both sending and receiving.
The latter makes more sense.
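
The ordering discussed in this ticket can be sketched as a simple gate: keep CARP disabled, load the filter, then enable CARP. This is an illustration only, not the pfSense implementation. The function names are hypothetical, and it assumes the `net.inet.carp.allow` sysctl gates both sending and receiving, which is the change proposed above rather than the behavior described for the build under test.

```python
def sysctl_command(value):
    """Build (but do not run) the sysctl command that toggles CARP."""
    return ["sysctl", f"net.inet.carp.allow={value}"]


def boot_with_carp_gate(set_carp_allow, load_filter):
    """Enable CARP only after the filter rules have loaded.

    set_carp_allow: callable taking 0 or 1 (hypothetical hook).
    load_filter: callable that loads the firewall rules (hypothetical hook).
    If load_filter raises, CARP stays disabled and the error propagates,
    so the backup machine keeps handling the traffic.
    """
    set_carp_allow(0)
    load_filter()
    set_carp_allow(1)


if __name__ == "__main__":
    events = []
    boot_with_carp_gate(
        lambda value: events.append(sysctl_command(value)),
        lambda: events.append(["load filter rules"]),
    )
    for event in events:
        print(" ".join(event))
```

Passing the two steps in as callables keeps the sketch testable without touching a real system; on an actual machine the hooks would have to run the command and the rule load themselves.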

#7 Updated by Ermal Luçi about 8 years ago

  • % Done changed from 0 to 100

Applied in changeset commit:"9411fbf73e52f01730da3fc8ba663bc901087144".

#8 Updated by Michele Di Maria about 8 years ago

Tested and working! (See image.) The red lines at 10:03:50 and 10:05:05 mark the moments when the secondary machine was promoted to master and demoted to slave.

In the first tests it still did not work (see the image at 9:59:55). This was strange because the delay affected only the DMZ traffic, and I was trying to find an explanation. The cause was the DMZ switch setup: once I configured each port of the DMZ switch with "spanning-tree portfast", it was perfect.

Thanks a lot!
Michele

#9 Updated by Chris Buechler about 8 years ago

  • Status changed from Feedback to Resolved

thanks
