Bug #7256
closedsyslogd is not running after installing or uninstalling a package with logging (e.g. tinc, haproxy)
Added by Kill Bill over 7 years ago. Updated over 7 years ago.
100%
Description
Suspect it's related to Feature #4898. Last log entry:
Feb 12 14:17:54 gw php: /etc/rc.packages: Successfully installed package: haproxy. Feb 12 14:17:54 gw kernel: done.
Reproduced on multiple boxes.
Updated by Kill Bill over 7 years ago
Here's some more context:
Feb 12 14:16:27 gw php: /etc/rc.packages: The command '/usr/local/etc/rc.d/haproxy.sh stop' returned exit code '1', the output was 'Stopping haproxy. Stopping haproxy. No matching processes were found' Feb 12 14:16:27 gw kernel: done. Feb 12 14:16:36 gw check_reload_status: Syncing firewall Feb 12 14:17:08 gw syslogd: exiting on signal 15 Feb 12 14:17:50 gw syslogd: kernel boot file is /boot/kernel/kernel Feb 12 14:17:54 gw check_reload_status: Syncing firewall Feb 12 14:17:54 gw php: /etc/rc.packages: Successfully installed package: haproxy. Feb 12 14:17:54 gw kernel: done.
So, the first line would be the old haproxy version uninstall. Then a firewall is resynced, then syslogd gets a SIGTERM, firewall is resynced again on package new version install, and then the package install completes and syslogd is gone. Hmmmm... not exactly good.
EDIT: Again the same thing with upgrade to 0.52_6:
$ ps ax | grep 'syslogd' 72518 - Ss 0:37.46 /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -l /tmp/haproxy_chroot/var/run/log -P /var/run/syslog.pid -f /var/etc/syslog.conf -b 10.0.0.254
After upgrade:
$ ps ax | grep 'syslogd'
System log:
Feb 14 14:58:58 gw php: /etc/rc.packages: The command '/usr/local/etc/rc.d/haproxy.sh stop' returned exit code '1', the output was 'Stopping haproxy. Stopping haproxy. No matching processes were found' Feb 14 14:59:05 gw check_reload_status: Syncing firewall Feb 14 14:59:34 gw syslogd: exiting on signal 15 Feb 14 14:59:34 gw syslogd: kernel boot file is /boot/kernel/kernel Feb 14 15:00:07 gw syslogd: kernel boot file is /boot/kernel/kernel Feb 14 15:00:12 gw check_reload_status: Syncing firewall Feb 14 15:00:12 gw php: /etc/rc.packages: Successfully installed package: haproxy.
Updated by Jim Thompson over 7 years ago
- Assignee set to Jim Pingle
- Priority changed from High to Normal
Updated by Jim Pingle over 7 years ago
- Subject changed from (Re)installing haproxy kills syslogd to syslogd is not running after installing or uninstalling a package with logging (e.g. tinc, haproxy)
- Target version set to 2.4.0
Was this on 2.3.2 or 2.3.3?
It also affects tinc, happens every time it is added or removed on 2.4
Updated by Kill Bill over 7 years ago
2.3.3/2.3.4, really anything after adding the chrooted socket commit to syslogd.
https://redmine.pfsense.org/projects/pfsense/repository/revisions/ab31acb937792bdedef11fcdbd1d98ad126ebc0b
https://redmine.pfsense.org/issues/4898
Updated by Jim Pingle over 7 years ago
- Assignee changed from Jim Pingle to Renato Botelho
I thought it might be due to it running twice in some cases but even only being run once, for some reason it's not starting it back up. I changed the code to write out the command and it works when run by hand but not when installing or uninstalling these packages.
tinc does have a command in its install script to create/clear the log file which is unnecessary but it doesn't appear to be affecting this problem.
syslogd is being restarted in both the deinstall and post-deinstall stage of package removal but it should only be run during post-deinstall as at least with haproxy the settings are not fully removed during the deinstall step so for example it still tries to use the haproxy log socket at that point.
I can take another stab at it tomorrow but the more eyes on it the better.
It happens from the GUI or the shell so it's easy to see the problem by running commands in the shell:
ps uxaww | grep '[s]yslogd' pkg install -y pfSense-pkg-tinc ps uxaww | grep '[s]yslogd' pkg delete -y tinc pfSense-pkg-tinc ps uxaww | grep '[s]yslogd'
Updated by Jim Pingle over 7 years ago
Worked on this a bit more but still haven't nailed down the reason. It is executing the command to run syslog, it's just not staying up/running. No errors, debug mode or foreground mode for syslog shows nothing at all in the output. If I run the same command by hand, it works.
This seems vaguely familiar, similar to what we saw in years past with sigprocmask and ntp but toggling that made no difference either.
A snippet of the truss output just after syslogd was run:
58519: read(9,"<118>Mar 7 12:56:28 snmpd[58486"...,1023) = 237 (0xed) 58519: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGALRM },{ }) = 0 (0x0) 58519: sigprocmask(SIG_SETMASK,{ },{ SIGHUP|SIGALRM }) = 0 (0x0) 58519: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGALRM },{ }) = 0 (0x0) 58519: sigprocmask(SIG_SETMASK,{ },{ SIGHUP|SIGALRM }) = 0 (0x0) 58519: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGALRM },{ }) = 0 (0x0) 58519: sigprocmask(SIG_SETMASK,{ },{ SIGHUP|SIGALRM }) = 0 (0x0) 58519: read(9,0x7fffffffe150,1023) ERR#35 'Resource temporarily unavailable' 58519: select(10,{ 4 5 6 9 },0x0,0x0,{ 0.000000 }) = 0 (0x0) 58519: kill(58406,SIGALRM) = 0 (0x0) 58406: fork() = 58519 (0xe497) 58406: SIGNAL 14 (SIGALRM) 58406: process killed, signal = 14 58273: fork() = 58406 (0xe426) 58273: write(1,"58406\n",6) = 6 (0x6) 58273: exit(0x0) 58273: process exit, rval = 0 56273: read(9,"58406\n",8192) = 6 (0x6) 56273: read(9,0x80eeef810,8192) = 0 (0x0)
Some of that is nohup finishing up running the command and passing back the PID.
Updated by Kill Bill over 7 years ago
Erm, guys, this is super-annoying. Lost logging on many boxes on recent HAproxy updates without realizing what happened, something needs to be done here.
Can https://github.com/pfsense/pfsense/pull/3567 be merged to 2.3.x as well? That at least makes it possible to add the syslogd service to service watchdog. On that note, I'd say the logging is so critical feature that it should get respawned on its own when enabled without configuring anything anywhere in case syslogd gets killed for whatever reason, be it this bug or whatever other reason.
Updated by Jim Pingle over 7 years ago
With 3a0df77eebf27d027d512a61dcbf80adefd630c4 I can now install/remove haproxy and tinc without affecting syslogd.
I did also have to remove a redundant line from tinc's install procedure as well:
https://github.com/pfsense/FreeBSD-ports/commit/b2877187d9d3582a0718b766126d79a70ff4b1d5
I'll let it soak on 2.4 a bit and if it proves OK there we can bring all of this (and that syslogd service stuff) to 2.3.
Updated by Jim Pingle over 7 years ago
On the latest 2.4 snapshot I can install and uninstall tinc and haproxy and syslogd never dies, and it appears to reload the config OK as I can see it opening and closing the log files as expected.
Now to see how viable the same fixes are for 2.3.x.
Updated by Kill Bill over 7 years ago
Seems much better. Reinstalled haproxy a couple of times and syslogd keeps logging. Still wondering what's the real bug here, huh...
Updated by Jim Pingle over 7 years ago
- Status changed from Confirmed to Feedback
- Assignee changed from Renato Botelho to Jim Pingle
As far as I've been able to tell, it appears that syslogd will not stay running in the background as a daemon when it is launched through PHP invoked from pkg. This is a complete guess, but I have to wonder if pkg is doing some process/child cleanup when it exits that doesn't get along with leaving syslogd running.
I like the way I've changed it, though, since it no longer kills/restarts syslogd unnecessarily when a HUP will do to add/remove log entries.
The changes seem to work OK on 2.3.x, so I ported them back (the syslogd service GUI changes for #4382 and the fixes for this ticket).
I didn't cherry-pick them from the original commits, but rather did them in one commit each for easier patching of units in the wild.
There was also a change to tinc that I made to help, but it's already out there for everyone.
Updated by Jim Pingle over 7 years ago
- % Done changed from 0 to 100
Applied in changeset 576cbe26c184734e93f59320d43aeb2e510c9804.
Updated by Jim Pingle over 7 years ago
- Status changed from Feedback to Resolved
Working fine on the latest 2.3.x snapshots as well. Closing.
Updated by Jim Pingle over 7 years ago
- Target version changed from 2.4.0 to 2.3.4-p1