Bug #975

CARP / vip interface disappears on slave after interface change

Added by Rob Lister over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Virtual IPs
Target version:
Start date: 10/27/2010
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.0
Affected Architecture:

Description

While testing 2.0 (build Mon Oct 25 02:28:25 EDT 2010), I think I have found an issue that occurs when
multiple CARP virtual interfaces are configured.

After changing the interface configuration (for example, the subnet mask),
the system seems to forget about the (first?) vip on the slave box:

ifconfig -a before:

master:

vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: MASTER vhid 200 advbase 1 advskew 0
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 94.143.111.70 netmask 0xffffffff
carp: MASTER vhid 201 advbase 1 advskew 0

backup:

vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: BACKUP vhid 200 advbase 1 advskew 100
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100

(Now, on the MASTER, change the interface subnet mask and click Apply changes.)

Afterwards, on the slave, vip200 is gone:

vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100

If, on the slave, I go to Interfaces -> LAN -> Save without making any
changes, it restores the vips.
(At this point, the slave sometimes jumps to a wrong, old-style stylesheet and thinks
the interface is not enabled. Clicking through to the home page gets it back again.)

vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100
vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: BACKUP vhid 200 advbase 1 advskew 100
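The before/after comparison above can also be done mechanically. What follows is only a minimal sketch, assuming the ifconfig output has been captured to text in the layout pasted in this report; the names parse_carp and missing_vips are hypothetical helpers and not part of pfSense.

```python
import re

# Interface header line, e.g. "vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500"
IFACE_RE = re.compile(r"^(\S+): flags=")
# CARP status line, e.g. "carp: BACKUP vhid 200 advbase 1 advskew 100"
CARP_RE = re.compile(r"^\s*carp: (\w+) vhid (\d+) advbase (\d+) advskew (\d+)")


def parse_carp(ifconfig_text):
    """Return {interface: CARP details} for every interface that has a 'carp:' line."""
    carp = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = CARP_RE.match(line)
        if status and current is not None:
            state, vhid, advbase, advskew = status.groups()
            carp[current] = {
                "state": state,
                "vhid": int(vhid),
                "advbase": int(advbase),
                "advskew": int(advskew),
            }
    return carp


def missing_vips(before_text, after_text):
    """Names of CARP interfaces present in the first snapshot but not in the second."""
    return sorted(set(parse_carp(before_text)) - set(parse_carp(after_text)))


# Slave output before and after the change on the master, as quoted in this report.
BEFORE = """\
vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: BACKUP vhid 200 advbase 1 advskew 100
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100
"""

AFTER = """\
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100
"""

print(missing_vips(BEFORE, AFTER))  # prints ['vip200']
```

Running the same comparison after each "apply changes" on the master would show whether a vip has dropped off the slave without having to read the ifconfig output by eye.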

Also, I have observed a problem when doing ifconfig xxx down on the primary:
the slave takes over as it should, but after doing ifconfig xxx up on the primary,
the primary gets stuck in the INIT state and does not take over. (Or is there some sort
of preempt delay before it comes back? I tried waiting, but it seemed to stay like that.)

The web interface on the primary shows both vips in INIT, but ifconfig shows that
only vip200 is actually stuck in INIT, while vip201 claims to be MASTER:

vip200: flags=8<LOOPBACK> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: INIT vhid 200 advbase 1 advskew 0
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: MASTER vhid 201 advbase 1 advskew 0

Meanwhile, the secondary claims that it is also a master (for vip200):

vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100
vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: MASTER vhid 200 advbase 1 advskew 100

On the master, clicking on the interface config page and then SAVE restores
it to being the master, but then deletes the first VIP from the slave as described
above.
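The mixed state described above (one vhid in INIT on the primary while the secondary holds MASTER for it) can be spotted by pairing up the carp: lines from both boxes. This is a rough sketch only: it assumes the node with advskew 0 is expected to be MASTER for every vhid and the other node BACKUP, and the function names are hypothetical.

```python
import re

# Picks the state and vhid out of lines such as "carp: INIT vhid 200 advbase 1 advskew 0"
CARP_RE = re.compile(r"carp: (\w+) vhid (\d+)")


def carp_states(ifconfig_text):
    """Map vhid -> CARP state for every 'carp:' line in the given ifconfig output."""
    return {int(vhid): state for state, vhid in CARP_RE.findall(ifconfig_text)}


def inconsistent_vhids(primary_text, secondary_text):
    """Return {vhid: (primary_state, secondary_state)} wherever the pair is not
    the expected (MASTER, BACKUP). A vhid missing on one node is reported as ABSENT."""
    primary = carp_states(primary_text)
    secondary = carp_states(secondary_text)
    bad = {}
    for vhid in sorted(set(primary) | set(secondary)):
        pair = (primary.get(vhid, "ABSENT"), secondary.get(vhid, "ABSENT"))
        if pair != ("MASTER", "BACKUP"):
            bad[vhid] = pair
    return bad


# Output from both boxes after "ifconfig xxx down" / "ifconfig xxx up" on the primary,
# as quoted in this report.
PRIMARY = """\
vip200: flags=8<LOOPBACK> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: INIT vhid 200 advbase 1 advskew 0
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: MASTER vhid 201 advbase 1 advskew 0
"""

SECONDARY = """\
vip201: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet xx.xx.xx.70 netmask 0xffffffff
carp: BACKUP vhid 201 advbase 1 advskew 100
vip200: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 172.20.0.15 netmask 0xffffffff
carp: MASTER vhid 200 advbase 1 advskew 100
"""

print(inconsistent_vhids(PRIMARY, SECONDARY))  # prints {200: ('INIT', 'MASTER')}
```

The same check would also flag the missing-vip case from the first part of this report, since a vhid that exists on only one box shows up as ABSENT on the other.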

(see attached screenshots)

Regards,

Rob

pf-carp-before.png (153 KB) pf-carp-before.png Rob Lister, 10/27/2010 03:17 PM
pf-style-glitch.png (122 KB) pf-style-glitch.png Rob Lister, 10/27/2010 03:17 PM

Associated revisions

Revision a8200dbf (diff)
Added by Ermal Luçi over 8 years ago

Ticket #975. Rearrange code a little.

Revision f48b6205 (diff)
Added by Ermal Luçi over 8 years ago

Ticket #975. Properly initialize variables to avoid caching issues. Also check an array exists before trying to foreach to avoid errors.

History

#1 Updated by Chris Buechler over 8 years ago

  • Project changed from pfSense Packages to pfSense

#2 Updated by Chris Buechler over 8 years ago

  • Category set to Virtual IPs
  • Target version set to 2.0

#3 Updated by Rob Lister over 8 years ago

I think this is possibly related to Bug #959

Will wait and see if that is corrected first and test again on a newer version.

Rob

#4 Updated by Ermal Luçi over 8 years ago

  • Status changed from New to Feedback

Please try latest snapshot.

#5 Updated by Rob Lister over 8 years ago

Have updated both boxes to snapshot built on Wed Oct 27 18:59:53 EDT 2010 and the problem
still seems to be there.

Rob

#6 Updated by Rob Lister over 8 years ago

Maybe that snapshot doesn't include the patch, as the patch is dated
20:56 on the 27th and the latest snapshot is only Oct 27 18:59:53.

#7 Updated by Chris Buechler over 8 years ago

Rob, is this fixed on the latest snapshot?

#8 Updated by Rob Lister over 8 years ago

Yes. I had been unable to update because of problems with the amd64 build, and I met with a disaster that
meant I had to rebuild it all from the last known bootable version.
(I hit the problem described in http://redmine.pfsense.org/issues/995)

We have to use this version in production, as the Broadcom NIC is not properly supported in the stable version.
(It only comes up at 10/full and we need 1000/full!)

But I got brave enough to try again, and as of build: Wed Nov 17 05:45:58 UTC 2010, it has been running
for about 9 days and this problem has gone.

Thanks,

Rob

#9 Updated by Chris Buechler over 8 years ago

  • Status changed from Feedback to Resolved
