Bug #1433

Config sync causes CARP state change

Added by Chris Buechler over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: Interfaces
Target version:
Start date: 04/12/2011
Due date:
% Done: 100%
Estimated time:
Affected Version: 2.0
Affected Architecture:

Description

Any config change causes the CARP IPs on the secondary to come up as master and then drop back to backup. This is unnecessary and can cause any number of related issues, such as this one:
http://forum.pfsense.org/index.php/topic,35509.0.html

problem_1.png (165 KB) First problem: both machines appear to be working as master. Michele Di Maria, 04/25/2011 12:55 PM
problem_2.jpg (370 KB) Second problem: the secondary machine hangs if the primary changes its config too often. Michele Di Maria, 04/25/2011 12:55 PM
DSC_0174.jpg (539 KB) First bt. Michele Di Maria, 04/26/2011 02:00 PM
DSC_0175.jpg (515 KB) Second bt. Michele Di Maria, 04/26/2011 02:00 PM

Associated revisions

Revision 4888535b (diff)
Added by Scott Ullrich almost 12 years ago

Disable firmware upgrade for embedded and cdrom and suggest using the console option to upgrade.

Ticket #1433

Revision 51611440 (diff)
Added by Ermal Luçi about 8 years ago

Ticket #1534, #1433. Properly merge carp interfaces and do not reload carp interfaces that have not changed any configuration parameter. Also make merge_config_section_xmlrpc() an alias for restore_config_section_xmlrpc(), since that is what it is.

Revision f51d4f98 (diff)
Added by Ermal Luçi about 8 years ago

Ticket #1534, #1433. Remove the custom sync code for vips, since array_merge() replaces the data under identical keys when merging. Also improve the code that reloads only the changed vips after the merge, and add some more checks.
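
The array_merge() behaviour this commit message relies on can be illustrated with a small sketch. The keys and values below are hypothetical and are not the real pfSense config layout; the point is only that entries under the same string key are overwritten by the later array, while numerically indexed entries are appended and renumbered instead.

```php
<?php
// Hypothetical VIP sections keyed by a string identifier (not the real pfSense layout).
$backup = array(
    'vip_wan' => array('mode' => 'carp', 'subnet' => '192.0.2.10', 'vhid' => '1'),
    'vip_lan' => array('mode' => 'carp', 'subnet' => '198.51.100.10', 'vhid' => '2'),
);
$from_master = array(
    'vip_wan' => array('mode' => 'carp', 'subnet' => '192.0.2.20', 'vhid' => '1'),
);

// String keys: the later array wins, keys it does not mention are kept.
$merged = array_merge($backup, $from_master);

// Numeric keys: nothing is replaced, entries are appended and renumbered.
$list = array_merge(array('a', 'b'), array('c'));
```

After the merge, `$merged['vip_wan']` holds the master's version and `$merged['vip_lan']` is untouched, which is why a merge alone cannot tell which vips actually changed.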

History

#1 Updated by Ermal Luçi over 8 years ago

Well, this is normal, considering that the slave simply destroys and recreates its vips, and an election then occurs in the carp code.
There is no way around this other than not reconfiguring the vips on the backup.
Though it's strange to hit this in such a short period, because both nodes are master at that moment!

#2 Updated by Michele Di Maria about 8 years ago

It makes sense that the VIPs are destroyed and recreated after reconfiguring on the backup machine. Unfortunately, when this occurs and I make several changes to the configuration, I get "kernel panic, error 12" on my backup machine. I hope it's related to this issue.

Maybe there's a way to compare the vips configuration (the running one and the new one received from the master): if the new configuration affects the vips, they should be destroyed and recreated; if not, there's no reason to do that.

This will not solve the problem, but it will reduce it a lot by minimizing the times both nodes are master.

#3 Updated by Michele Di Maria about 8 years ago

In etc/inc/interfaces.inc, before line 1827, there could be something like:

// Draft only: return early when an existing interface already matches the new vip.
$ints = get_interface_arr(true);
foreach ($ints as $existing) {
    if ($vip['mode'] == "carp" && $vip['subnet'] == $existing['subnet'] && $vip['realif'] == $existing['realif'] /* etc. for all the other parameters */)
        return $vip;
}

I don't know if it makes sense. I also have no development environment for pfSense (and I know it very little from the developer's point of view) and very little experience in PHP, so the code above is only a draft to explain what I mean...
If all the parameters of an existing interface match the new interface, then just exit the function...
What do you think about it?

#4 Updated by Chris Buechler about 8 years ago

The secondary has no need to blow away its CARP IPs and recreate them unless there has been a CARP change, and it never did so in previous versions. Doing so has the potential IPsec issue noted here, as well as the issue in a recent thread on discussion@. Once the system is set up, it's very rare to change CARP, but config syncs can happen quite frequently depending on the system. Given the problems this has already caused, it really needs to recreate CARP only if CARP has changed.

#5 Updated by Ermal Luçi about 8 years ago

pfSense has had this code for a long time.
It was done this way because otherwise a lot of code would need to be added just to test for carp, and there would also be issues when a carp is deleted on the master: it would be a lot of work to detect this on the backup!
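
As a rough illustration of the detection problem described here, the sketch below classifies vips as added, changed or removed by comparing two keyed arrays. It is not pfSense code: classify_vips() is a hypothetical helper, and it assumes both sets are keyed by a stable identifier (for example interface plus vhid), which the real config may not provide.

```php
<?php
// Hypothetical helper, not pfSense code: compare the running vips on the
// backup with the set received from the master. Both arrays are assumed
// to be keyed by a stable identifier such as "interface_vhid".
function classify_vips(array $running, array $received) {
    $result = array('added' => array(), 'changed' => array(), 'removed' => array());
    foreach ($received as $key => $vip) {
        if (!array_key_exists($key, $running)) {
            $result['added'][] = $key;
        } elseif ($running[$key] != $vip) {
            $result['changed'][] = $key;
        }
    }
    // Keys present only on the backup were deleted on the master.
    $result['removed'] = array_keys(array_diff_key($running, $received));
    return $result;
}

$running = array(
    'wan_1' => array('subnet' => '192.0.2.10', 'advskew' => '100'),
    'lan_2' => array('subnet' => '198.51.100.10', 'advskew' => '100'),
    'opt_3' => array('subnet' => '203.0.113.10', 'advskew' => '100'),
);
$received = array(
    'wan_1' => array('subnet' => '192.0.2.10', 'advskew' => '100'),    // unchanged
    'lan_2' => array('subnet' => '198.51.100.11', 'advskew' => '100'), // changed
    'dmz_4' => array('subnet' => '203.0.113.20', 'advskew' => '100'),  // new
);
$diff = classify_vips($running, $received);
```

With such a classification, only the vips listed under added, changed or removed would need to be touched; an unchanged vip like wan_1 would be left alone and no election would be triggered for it.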

#6 Updated by Adam Thompson about 8 years ago

Do we have the ability to diff chunks of the config xml? If diff(old-carp-config,new-carp-config)==zero-changes, don't bother ifconfig'ing? I've noticed this affecting my systems in the last week as I've been bringing up a new CARP cluster. On more than one occasion, the secondary and primary both got stuck in Master (yeah, I know that should never happen - and I'll open a new bug if I can figure out when/why it happens) which wouldn't have happened in the first place except for this issue.
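
A minimal sketch of that idea, using PHP's SimpleXML extension: extract only the CARP vips from the old and new config, and compare them. The `<virtualip>`/`<vip>` element names are an assumption about the config layout, and carp_section_fingerprint() and carp_section_changed() are hypothetical helpers, not existing pfSense functions.

```php
<?php
// Sketch only: element names are assumed, helper names are hypothetical.
function carp_section_fingerprint($config_xml) {
    $xml = simplexml_load_string($config_xml);
    if ($xml === false) {
        return null; // unparsable config
    }
    $parts = array();
    foreach ($xml->xpath('//virtualip/vip[mode="carp"]') as $vip) {
        $parts[] = $vip->asXML();
    }
    sort($parts); // ignore ordering differences between the two configs
    return md5(implode("\n", $parts));
}

// Treat an unparsable config as "changed" so the caller falls back to a full reload.
function carp_section_changed($old_xml, $new_xml) {
    $old = carp_section_fingerprint($old_xml);
    $new = carp_section_fingerprint($new_xml);
    return $old === null || $new === null || $old !== $new;
}

$old = '<pfsense><filter><rule>a</rule></filter>'
     . '<virtualip><vip><mode>carp</mode><vhid>1</vhid><subnet>192.0.2.10</subnet></vip></virtualip></pfsense>';
// Same CARP section, different firewall rule.
$new_rule = str_replace('<rule>a</rule>', '<rule>b</rule>', $old);
// Same firewall rule, different CARP address.
$new_vip = str_replace('192.0.2.10', '192.0.2.11', $old);

$rule_edit_touches_carp = carp_section_changed($old, $new_rule);
$vip_edit_touches_carp  = carp_section_changed($old, $new_vip);
```

Here a firewall rule edit leaves the fingerprint unchanged, so the ifconfig step could be skipped, while an address change is detected. The comparison is textual, so whitespace-only differences inside a vip element would also count as a change.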

#7 Updated by Ermal Luçi about 8 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

Applied in changeset commit:"9411fbf73e52f01730da3fc8ba663bc901087144".

#8 Updated by Michele Di Maria about 8 years ago

Hmm... with build "Fri Apr 22 18:24:14 EDT 2011" i386 on both machines, when I modify something on the master machine (for example a port forward rule in the firewall), I still see the vip interfaces going up and down in the system logs on the secondary machine... I can attach the log if needed...

#9 Updated by Ermal Luçi about 8 years ago

I am sorry, but it will still go up and down on the slave. It won't cause any issues, though!

If you can reproduce the issue, then that's another matter.

#10 Updated by Michele Di Maria about 8 years ago

Ok, I understand... I'll try to explain the problems I encounter, and you can give me your opinion on whether they're related to this issue or not, ok?

The first problem is that when I change something on the master machine, I see a "peak of traffic" on the WAN of the secondary (see problem_1.png). It looks like a lot of traffic (almost 800kbps), not just a switch up and down, so it seems realistic that for a short time both machines act as master (as Adam described). I don't know if this affects the real traffic passing through pfSense; as far as I can tell, this peak did not break any traffic.

The second problem is that if I make changes to the primary machine's config in quick succession (editing firewall rules, for example, making the next change without waiting 5-10 seconds), the secondary machine hangs very often (see problem_2.jpg): not on every single change, but often. Maybe it is related to this issue (perhaps it happens if I make changes on the master machine before the vips finish the "up and down" cycle), or maybe it's some other error; I really don't know.

I don't have enough experience with pfSense to confirm that these two problems are caused by this issue; it just looks like a good explanation... If it's not, I will open another bug (or ticket, although I am not blocked, so a fix is not urgent for me).

Thanks a lot,
Michele

#11 Updated by Ermal Luçi about 8 years ago

The second picture seems to be an issue from the shaper.
I cannot tell anything without seeing the trace from it; type bt at the command prompt.

The first one is possibly nothing to worry about and just xmlrpc traffic, though only a packet trace will show.

#12 Updated by Michele Di Maria about 8 years ago

Ok for the first one...
For the second one, I'm attaching two different crashes I just caused while making changes on the primary machine (sorry, I don't know a more intelligent way to send you a bt than taking a picture! :D)

Thanks a lot!
Michele

#13 Updated by Eric Machabert about 8 years ago

I tried the latest snapshot and I'm still having the CARP switch issue.
Each time I apply a change, using the LAN CARP address as the admin URL, I get a certificate error or a blank page because of the switch between the two servers.
This never happened with 1.2.3.

Please ask if you need any other information.

#14 Updated by Ermal Luçi about 8 years ago

I have made changes on the system which should fix this.
Please test latest snaps.

#15 Updated by Michele Di Maria about 8 years ago

Ermal Luçi wrote:

I have made changes on the system which should fix this.
Please test latest snaps.

Yesterday evening I ran a test, making many changes on the primary machine. As a result, the secondary machine didn't hang and the changes didn't affect the running ipsec vpn...

From my point of view, the problem seems to have disappeared...

Thanks a lot,
Michele

#16 Updated by Ermal Luçi about 8 years ago

  • Status changed from Feedback to Resolved
