Bug #1226
closed
Possible DOS in CARP synchronization
Added by Alexander Kalashnikov almost 14 years ago.
Updated almost 12 years ago.
Affected Architecture:
All
Description
When you press "Force config sync" a couple of times in a very short period (4-5 times per second), the slave machine gets stuck.
However, networking and routing keep working normally; the slave still responds to ping.
New processes (such as a second sshd or shell when you try to log in via SSH) never start.
Even on the local console you cannot run 'top' to see what's happening.
PHP processes also consume around 25% of CPU time.
Eventually the master reports "A communications error occured while attempting XMLRPC sync with username admin http://10.10.0.2:88." because of this.
Only a hard reset of the slave recovers from this state.
Maybe something is wrong with the locking?
- Target version deleted (2.0)
You're hanging PHP by doing that; "don't do that" is the answer. Killing all PHP processes from the console or an existing SSH session will fix it. The console menu over SSH can't load while PHP is hung. There are plenty of ways to DoS yourself, especially when you're authenticated. This should be fixed at some point, assuming it's replicable, but it is not targeted for 2.0 since it would never be hit in a real scenario.
I'm sure this is a realistic scenario, since two or more admins can make changes simultaneously.
I cannot start any commands from the local console and I cannot log in via SSH. I did manage to get the reboot command executed via SSH (ssh root@server.localdomain.local reboot), but the server got stuck and now responds only to ping.
While syslogd was still running, I got some messages about the reboot, but the server did not actually reboot. It seems the PHP init scripts cause the server to get stuck at some point.
I think all system scripts should be run only by the CLI version of PHP.
UPD:
The system can only be rebooted by issuing ssh [ip] reboot -q
I can't replicate this even when clicking the force sync button as fast and as many times as I possibly can; it just works. Maybe it's something that's easier to trigger on slow hardware. Triggering 4-5 config syncs per second isn't going to happen even if you have a whole team of admins logged in at once. 'ssh [ip] killall php' will fix it too.
I can reproduce it only with a "big" configuration file (~120 firewall rules + 10 interfaces) and a moderate hardware performance difference between the nodes (2.8 GHz Core Duo and 2.1 GHz Celeron). The master is more powerful. In my virtual lab I could not reproduce the issue with the same config. And you're right: the only difference I saw is that both VMs have the same performance, unlike my real firewalls. To reproduce the issue, try raising the priority of the PHP FastCGI processes on the slave.
kill/killall does not work with any signal; only a reboot helps.
It's obvious that you need to check the current status of the slave before synchronizing.
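A minimal sketch of that idea, assuming the sync entry point should refuse to start a new sync while one is already in flight rather than queueing it. It is written in Python purely for illustration; SyncGuard and try_sync are hypothetical names and do not reflect how pfSense's XMLRPC sync is actually implemented.

```python
import threading


class SyncGuard:
    """Hypothetical guard allowing at most one config sync in flight."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_sync(self, do_sync):
        # Non-blocking acquire: if another sync already holds the lock,
        # drop this request immediately instead of piling up behind it.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            do_sync()  # the actual (hypothetical) push to the slave
            return True
        finally:
            # Always release, even if do_sync raises.
            self._lock.release()
```

With this design, a burst of 4-5 "Force config sync" requests per second results in at most one running sync; the extra requests return False and can be reported to the admin as "sync already in progress".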
- Status changed from New to Closed
This has, in all likelihood, been fixed since then. The behavior would at least have changed on 2.1 after the recent php+lighty changes.