Bug #1433
Config sync causes CARP state change (Closed, 100% done)
Description
Any config change causes the CARP IPs on the secondary to come up as master and then drop back to backup. This is unnecessary and can cause a number of related issues, such as this one:
http://forum.pfsense.org/index.php/topic,35509.0.html
Updated by Ermal Luçi over 13 years ago
Well, this is expected: the slave destroys and recreates its VIPs, and an election then takes place in the CARP code.
The only way around this would be to not reconfigure the VIPs on the backup.
It's strange to hit this in such a short window, though, because both nodes are master at that moment!
Updated by Michele Di Maria over 13 years ago
It makes sense that the VIPs are destroyed and recreated when the backup machine is reconfigured. Unfortunately, when this happens and I make several configuration changes, my backup machine hits a "kernel panic, error 12". I hope it's related to this issue.
Maybe there is a way to compare the VIP configuration (the running one and the new one received from the master): if the new configuration affects the VIPs, they should be destroyed and recreated; if not, there's no reason to do that.
This will not solve the problem, but it would greatly reduce it by minimizing the number of times both nodes are master.
Updated by Michele Di Maria over 13 years ago
In etc/inc/interfaces.inc, before line 1827, there could be something like:
$ints = get_interface_arr(true);
foreach ($existingif as $existing) {
	// If an identical CARP VIP is already configured, leave it alone
	if ($vip['mode'] == "carp" && $vip['subnet'] == $existing['subnet'] && $vip['realif'] == $existing['realif'] /* && ... all the other parameters */)
		return $vip;
}
I don't know if it makes sense. I have no development environment for pfSense, I know very little about it from the developer's point of view, and I have very little PHP experience (the above code is probably full of syntax errors); it's just a draft to explain what I mean...
If all the parameters of an existing interface match the new interface, then just exit the function...
What do you think about it?
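A minimal sketch of the comparison described above, written as a standalone PHP function rather than a patch to interfaces.inc. The helper name carp_vip_changed() and the list of compared fields (mode, interface, vhid, advskew, advbase, password, subnet, subnet_bits) are assumptions modeled on the VIP entries in config.xml, not pfSense's actual code; it also assumes PHP 7 or later. Fields such as the description are deliberately left out, so relabeling a VIP would not force a recreate.

```php
<?php
// Hypothetical field list: an assumption based on the <vip> entries in
// config.xml. Adjust it to the real schema before relying on it.
const CARP_VIP_FIELDS = ['mode', 'interface', 'vhid', 'advskew', 'advbase',
                         'password', 'subnet', 'subnet_bits'];

/**
 * Hypothetical helper: returns true when $new differs from $running in any
 * CARP-relevant field, i.e. when the VIP really has to be recreated.
 */
function carp_vip_changed(array $running, array $new): bool
{
    foreach (CARP_VIP_FIELDS as $field) {
        // Compare as strings so "1" and 1 match, and treat a missing key
        // the same as an empty value.
        $old = (string)($running[$field] ?? '');
        $cur = (string)($new[$field] ?? '');
        if ($old !== $cur) {
            return true;
        }
    }
    return false;
}

// Usage: a sync that only changed a firewall rule leaves the VIP identical.
$running = ['mode' => 'carp', 'interface' => 'wan', 'vhid' => '1',
            'advskew' => '100', 'advbase' => '1', 'password' => 'secret',
            'subnet' => '192.0.2.10', 'subnet_bits' => '24'];
$synced = $running;
var_dump(carp_vip_changed($running, $synced)); // bool(false)
$synced['advskew'] = '0';
var_dump(carp_vip_changed($running, $synced)); // bool(true)
```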
Updated by Chris Buechler over 13 years ago
The secondary has no need to blow away its CARP IPs and recreate them unless CARP has changed, and it never did so in previous versions. Doing so can cause the IPsec issue noted here, as well as the issue from a recent thread on discussion@. Once a system is set up, CARP changes are very rare, but config syncs can happen quite frequently depending on the system. Given the problems this has already caused, it really needs to recreate CARP only when CARP has changed.
Updated by Ermal Luçi over 13 years ago
pfSense has had this code for a long time.
It was done this way because otherwise a lot of code would need to be added just to check CARP, and if a CARP VIP is deleted on the master, detecting that on the backup would be a lot of work!
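Deletions can be detected by diffing the whole VIP list instead of single entries. The sketch below is a hypothetical set of helpers, assuming PHP 7 or later and VIP entries shaped like the config.xml <vip> elements; it is not the code pfSense actually uses. It keys each CARP VIP by interface and VHID (an assumption: a VHID is unique per interface) and reports which VIPs were added, removed, or changed. A VIP that exists only in the running config shows up under "removed".

```php
<?php
// Hypothetical key: interface + VHID identifies one CARP VIP.
function carp_vip_key(array $vip): string
{
    return ($vip['interface'] ?? '') . '|' . ($vip['vhid'] ?? '');
}

/** Keep only CARP VIPs, indexed by their key. */
function carp_vips_by_key(array $vips): array
{
    $out = [];
    foreach ($vips as $vip) {
        if (($vip['mode'] ?? '') === 'carp') {
            $out[carp_vip_key($vip)] = $vip;
        }
    }
    return $out;
}

/**
 * Returns ['added' => [...], 'removed' => [...], 'changed' => [...]],
 * each a list of keys. Only these VIPs would need to be touched.
 */
function carp_vip_diff(array $runningVips, array $newVips): array
{
    $old = carp_vips_by_key($runningVips);
    $new = carp_vips_by_key($newVips);
    $changed = [];
    foreach (array_intersect_key($new, $old) as $key => $vip) {
        // Loose array comparison: same key/value pairs, regardless of key order.
        if ($vip != $old[$key]) {
            $changed[] = $key;
        }
    }
    return [
        'added'   => array_keys(array_diff_key($new, $old)),
        'removed' => array_keys(array_diff_key($old, $new)),
        'changed' => $changed,
    ];
}

// Usage: the master no longer has the LAN VIP.
$running = [
    ['mode' => 'carp', 'interface' => 'wan', 'vhid' => '1', 'subnet' => '192.0.2.10'],
    ['mode' => 'carp', 'interface' => 'lan', 'vhid' => '2', 'subnet' => '10.0.0.1'],
];
$fromMaster = [$running[0]];
// 'removed' contains 'lan|2'; 'added' and 'changed' are empty.
print_r(carp_vip_diff($running, $fromMaster));
```

Because the comparison covers every field, editing only a VIP's description would also count as "changed"; comparing just the CARP-relevant fields would avoid that.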
Updated by Adam Thompson over 13 years ago
Do we have the ability to diff chunks of the config XML? If diff(old-carp-config, new-carp-config) shows zero changes, don't bother ifconfig'ing. I've noticed this affecting my systems over the last week while bringing up a new CARP cluster. On more than one occasion, the secondary and primary both got stuck in Master (yeah, I know that should never happen, and I'll open a new bug if I can figure out when and why it happens), which wouldn't have happened in the first place without this issue.
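The "diff chunks of the config XML" idea can be reduced to a cheap yes/no check: hash a canonical form of the CARP VIP section on both sides and skip the ifconfig work when the hashes match. This is a hypothetical sketch assuming PHP 7.4 or later and a $config array with the VIPs under $config['virtualip']['vip']; the function names are made up for illustration.

```php
<?php
/** Recursively sort keys and stringify scalars so key order cannot change the hash. */
function canonicalize($value)
{
    if (!is_array($value)) {
        return (string)$value;
    }
    ksort($value);
    foreach ($value as $k => $v) {
        $value[$k] = canonicalize($v);
    }
    return $value;
}

/** Hypothetical helper: fingerprint of the CARP VIPs only. */
function carp_section_hash(array $config): string
{
    $vips = $config['virtualip']['vip'] ?? [];
    $carp = array_values(array_filter($vips, fn($v) => ($v['mode'] ?? '') === 'carp'));
    return sha1(serialize(canonicalize($carp)));
}

// Usage: a sync that only touches firewall rules leaves the CARP hash unchanged.
$running = [
    'virtualip' => ['vip' => [
        ['mode' => 'carp', 'interface' => 'wan', 'vhid' => '1', 'subnet' => '192.0.2.10'],
    ]],
    'filter' => ['rule' => []],
];
$synced = $running;
$synced['filter']['rule'][] = ['descr' => 'new port forward'];
if (carp_section_hash($running) === carp_section_hash($synced)) {
    echo "CARP unchanged, skipping VIP reconfiguration\n";
}
```

Sorting keys makes the hash independent of field order inside each VIP, but reordering the VIPs themselves still counts as a change. That errs on the safe side: a false positive only falls back to recreating the VIPs, as happens today.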
Updated by Ermal Luçi over 13 years ago
- Status changed from New to Feedback
- % Done changed from 0 to 100
Applied in changeset 9411fbf73e52f01730da3fc8ba663bc901087144.
Updated by Michele Di Maria over 13 years ago
Hmm... with build "Fri Apr 22 18:24:14 EDT 2011" i386 on both machines, when I modify something on the master machine (for example a port forward rule in the firewall), I still see the VIP interfaces going up and down in the system logs on the secondary machine... I can attach the log if needed...
Updated by Ermal Luçi over 13 years ago
I am sorry, but it will still go up and down on the slave; it won't cause any issues, though!
If you can reproduce the issue, then that's another thing.
Updated by Michele Di Maria over 13 years ago
- File problem_1.png added
- File problem_2.jpg added
OK, I understand... Let me try to explain the problems I'm seeing, and you can tell me whether they're related to this issue or not, OK?
The first problem is that when I change something on the master machine, I notice a "traffic peak" on the secondary's WAN interface (see problem_1.png). It looks like "a lot of traffic" (almost 800 kbps), not just a "switch up and down", so it seems plausible that for a short time both machines act as master (as Adam described). I don't know whether this affects the real traffic passing through pfSense; as far as I can tell, this peak did not break any traffic.
The second problem is that if I make several changes in a row to the primary machine's config (editing firewall rules, for example, without waiting 5-10 seconds between changes), the secondary machine hangs very often (see problem_2.jpg): not on every single change, but often. Maybe it's related to this issue (perhaps I'm making changes on the master before the VIPs finish their "up and down" cycle), or maybe it's some other error; I really don't know.
I don't have enough experience with pfSense to confirm that these two problems are caused by this issue; it just looks like "a good explanation"... If it's not, I will open another bug (or ticket; I'm not "blocked", so it's not urgent for me to get it fixed).
Thanks a lot,
Michele
Updated by Ermal Luçi over 13 years ago
The second picture looks like an issue in the shaper.
I can't tell anything without seeing the backtrace from it (type bt at the command prompt).
The first one is probably nothing to worry about, just XMLRPC traffic, though only a packet trace will tell.
Updated by Michele Di Maria over 13 years ago
- File DSC_0174.jpg added
- File DSC_0175.jpg added
OK for the first one...
For the second one, I'm attaching two different crashes I just caused while making changes on the primary machine (sorry, I don't know a smarter way to send you a bt than taking a picture! :D)
Thanks a lot!
Michele
Updated by Eric Machabert over 13 years ago
I tried the latest snapshot and I still have the CARP switch issue.
Each time I apply a change while using the LAN CARP address as the admin URL, I get a certificate error or a blank page because of the switch between the two servers.
This was never the case with 1.2.3.
Please ask if you need any other information.
Updated by Ermal Luçi over 13 years ago
I have made changes on the system which should fix this.
Please test latest snaps.
Updated by Michele Di Maria over 13 years ago
Ermal Luçi wrote:
I have made changes on the system which should fix this.
Please test latest snaps.
Yesterday evening I tested by making many changes on the primary machine. As a result, the secondary machine didn't hang, and the changes didn't affect the running IPsec VPN...
From my point of view, the problem seems to have disappeared...
Thanks a lot,
Michele
Updated by Ermal Luçi over 13 years ago
- Status changed from Feedback to Resolved