Bug #1432

closed

CARP VIPs are promoted to master before firewall filter load

Added by Michele Di Maria about 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: CARP
Target version:
Start date: 04/11/2011
Due date:
% Done: 100%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.0
Affected Architecture: i386

Description

When the "master" machine boots, the CARP IPs are promoted to master immediately, even before the firewall filters are loaded and processed. With a complex configuration (lots of CARP IPs, lots of firewall/NAT rules, etc.) this causes several seconds of total inactivity on the network (about 10-15 in my case), which makes the state sync almost useless.
Maybe there is a way to make the master machine hold off CARP promotion until the firewall is ready to handle the states currently running on the backup machine.


Files

carp_master_boot.png (19 KB), Michele Di Maria, 04/19/2011 04:13 AM
reboot.png (19.2 KB), Michele Di Maria, 04/23/2011 04:09 AM
Actions #1

Updated by Chris Buechler about 13 years ago

  • Project changed from pfSense Packages to pfSense
Actions #2

Updated by Chris Buechler about 13 years ago

  • Category set to CARP
  • Target version set to 2.0
Actions #3

Updated by Ermal Luçi about 13 years ago

  • Status changed from New to Feedback
Actions #4

Updated by Michele Di Maria about 13 years ago

The situation has improved but is not resolved yet (tested with build 18 23:29:41 EDT 2011 i386).

In the attached image you can see the "red line" (at 10:02:07) where the primary machine booted, followed by 30 seconds of total silence, after which the primary machine started handling the traffic.

The log reads as follows:
Apr 19 10:02:01 pfsense1 syslogd: kernel boot file is /boot/kernel/kernel
...
...
Apr 19 10:02:05 pfsense1 kernel: vip254: MASTER -> BACKUP (more frequent advertisement received)
Apr 19 10:02:05 pfsense1 kernel: vip254: link state changed to DOWN
...
(in my configuration there are about 80 carp vips)
...
Apr 19 10:02:06 pfsense1 kernel: vip254: BACKUP -> MASTER (preempting a slower master)
Apr 19 10:02:06 pfsense1 kernel: vip254: link state changed to UP
...
(again for about 80 carp vips)
...
Apr 19 10:02:11 pfsense1 check_reload_status: syncing firewall
Apr 19 10:02:14 pfsense1 check_reload_status: syncing firewall
Apr 19 10:02:16 pfsense1 dnsmasq52608: started, version 2.55 cachesize 10000
Apr 19 10:02:16 pfsense1 dnsmasq52608: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
Apr 19 10:02:16 pfsense1 dnsmasq52608: reading /etc/resolv.conf
...
etc.

At this point the router started handling the traffic (at about 10:02:16-10:02:20).
If you need the full log file, I can send it via email.

Thanks,
Michele

Actions #5

Updated by Michele Di Maria about 13 years ago

I caught it: the master firewall started passing traffic after this log line:

Apr 19 10:03:27 pfsense1 check_reload_status: reloading filter

This is consistent with the previous screenshot.
The CARP VIPs should change state to UP only after this event, not before; otherwise the VIPs are up but traffic cannot pass through pfSense.
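
To put a number on the window described above, here is a minimal sketch that measures the time between the first CARP promotion and the first filter reload that follows it. The sample lines are copied from the log excerpts in this ticket. Whether those excerpts come from the same boot is an assumption, and the function name carp_filter_gap and the year 2011 (syslog timestamps carry no year) are chosen here for illustration only.

```python
from datetime import datetime

# Sample lines copied from the log excerpts quoted in this ticket.
LOG = """\
Apr 19 10:02:05 pfsense1 kernel: vip254: MASTER -> BACKUP (more frequent advertisement received)
Apr 19 10:02:06 pfsense1 kernel: vip254: BACKUP -> MASTER (preempting a slower master)
Apr 19 10:02:11 pfsense1 check_reload_status: syncing firewall
Apr 19 10:03:27 pfsense1 check_reload_status: reloading filter
"""


def parse_ts(line, year=2011):
    """Parse the 15-character syslog timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def carp_filter_gap(log_text):
    """Seconds between the first BACKUP -> MASTER transition and the
    first 'reloading filter' line after it, or None if either is missing."""
    master_at = None
    for line in log_text.splitlines():
        if master_at is None and "BACKUP -> MASTER" in line:
            master_at = parse_ts(line)
        elif master_at is not None and "reloading filter" in line:
            return (parse_ts(line) - master_at).total_seconds()
    return None


if __name__ == "__main__":
    print(carp_filter_gap(LOG))
```

Run against a full /var/log/system.log, the same function gives the length of the interval during which the VIPs are already master but the ruleset has not been reloaded yet.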

Thanks,
Michele

Actions #6

Updated by Ermal Luçi almost 13 years ago

I pushed another change, so try with that.
Though I think CARP needs to be taught about a 'start' sysctl, since it already has 'allow' for incoming packets.
Or the meaning of the allow sysctl should be changed to control both sending and receiving.
The latter makes more sense.

Actions #7

Updated by Ermal Luçi almost 13 years ago

  • % Done changed from 0 to 100

Applied in changeset commit:"9411fbf73e52f01730da3fc8ba663bc901087144".

Actions #8

Updated by Michele Di Maria almost 13 years ago

Tested and working! (See image.) The red lines at 10:03:50 and 10:05:05 mark the moments when the secondary machine was promoted to master and demoted back to backup.

In the first tests it did not work yet (see the image at 9:59:55), and it was strange because this time the delay affected only the DMZ traffic. I was wondering why and trying to find an explanation: the cause was the DMZ switch setup. Once I configured each port of the DMZ switch with "spanning-tree portfast", it was perfect.

Thanks a lot!
Michele

Actions #9

Updated by Chris Buechler almost 13 years ago

  • Status changed from Feedback to Resolved

thanks
