Bug #1226

Possible DOS in CARP synchronization

Added by Alexander Kalashnikov over 13 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: CARP
Target version: -
Start date: 01/23/2011
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.0
Affected Architecture: All

Description

If you press "Force config sync" several times in quick succession (4-5 times per second), the slave machine gets stuck.
However, networking and routing keep working, since you can still ping the slave.
No new process can start (for example, a second sshd or shell when you try to log in via ssh).
Even on the local console you cannot run 'top' to see what is happening.
The php processes also consume around 25% of CPU time.
Eventually the master reports "A communications error occured while attempting XMLRPC sync with username admin http://10.10.0.2:88." as a result.
Only a hard reset of the slave helps in this situation.

Maybe there is something wrong with locks?
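
To illustrate the locking question, here is a minimal sketch of how overlapping sync requests could be rejected instead of piling up. It is not the pfSense code (which is PHP). The lock path and the `try_sync` and `do_sync` names are hypothetical, and it assumes a Unix-like system where `flock` is available.

```python
import fcntl
import os

LOCK_PATH = "/tmp/config_sync.lock"  # hypothetical lock file location


def try_sync(do_sync, path=LOCK_PATH):
    """Run do_sync() only if no other sync holds the lock.

    Returns True if the sync ran, False if one was already in progress.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of queueing,
            # so repeated clicks are dropped rather than stacked up.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        do_sync()
        return True
    finally:
        # Closing the descriptor also releases the lock if it was taken.
        os.close(fd)
```

With this approach a second request that arrives while a sync is still running returns immediately, so rapid repeated clicks cannot start several syncs at once.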

Actions #1

Updated by Chris Buechler over 13 years ago

  • Target version deleted (2.0)

You're hanging PHP by doing that; the answer is don't do that. Killing all php processes at the console or from an existing SSH session will fix it. The console menu over SSH can't load when PHP is hung. There are plenty of ways to DoS yourself, especially when you're authenticated. This should be fixed at some point, assuming it's replicable, but it's not targeted at 2.0 as it would never be hit in a real scenario.

Actions #2

Updated by Alexander Kalashnikov over 13 years ago

I'm sure that this is a pretty realistic scenario, since two or more admins can make changes simultaneously.

I cannot start any commands from the local console and I cannot log in via ssh. Somehow I got the reboot command to execute via ssh (ssh reboot), but the server got stuck and now responds only to ping.
Also, while syslogd was still online I got some messages regarding the reboot, but in fact the server did not reboot. It seems that the php init scripts cause the server to get stuck at some point.

I think that all system scripts must be run only by the cli version of php.

Actions #3

Updated by Alexander Kalashnikov over 13 years ago

UPD:

The system can only be rebooted by issuing ssh [ip] reboot -q

Actions #4

Updated by Chris Buechler about 13 years ago

I can't replicate this even clicking the force sync button as fast and as many times as I possibly can, it just works. It may be something that's easier to trigger on slow hardware. Triggering 4-5 config syncs per second isn't going to happen even if you have a whole team of admins logged in at once. 'ssh [ip] killall php' will fix it too.

Actions #5

Updated by Alexander Kalashnikov about 13 years ago

I can reproduce it only with a "big" configuration file (~120 firewall rules + 10 interfaces) and with a moderate hardware performance difference between the nodes (2.8GHz CoreDuo and 2.1GHz Celeron). The master is the more powerful one. In my virtual lab I could not reproduce the issue with the same config. And you're right: the only difference I saw is that both VMs have the same performance, unlike my real firewalls. To reproduce the issue, try raising the priority of the php fcgi processes on the slave.

kill(all) does not work with any signal; only a reboot helps.

It's obvious that you need to check the current status of the slave before the synchronization.
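
As a rough sketch of such a pre-sync check (not how pfSense does it), the master could probe the slave's sync port with a short timeout and skip the sync if the probe fails. The function names and the `do_sync` callback are hypothetical. Note that a TCP connect only shows that the port accepts connections; a hung PHP backend behind a listening web server could still pass this check, so a real check would likely need an application-level request with a timeout.

```python
import socket


def peer_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def sync_if_ready(host, port, do_sync):
    """Call do_sync() only when the peer answers the probe.

    Returns True if the sync was attempted, False if it was skipped.
    """
    if not peer_is_reachable(host, port):
        return False
    do_sync()
    return True
```

For the setup in this report the call would look like `sync_if_ready("10.10.0.2", 88, do_sync)`, where `do_sync` stands in for whatever performs the XMLRPC sync.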

Actions #6

Updated by Jim Pingle about 11 years ago

  • Status changed from New to Closed

This has, in all likelihood, been fixed since then. The behavior would at least have changed on 2.1 after the recent php+lighty changes.
