Project

General

Profile

Actions

Bug #7256

closed

syslogd is not running after installing or uninstalling a package with logging (e.g. tinc, haproxy)

Added by Kill Bill about 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Logging
Target version:
Start date:
02/13/2017
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
2.3.x
Affected Architecture:
All

Description

Suspect it's related to Feature #4898. Last log entry:

Feb 12 14:17:54 gw php: /etc/rc.packages: Successfully installed package: haproxy.
Feb 12 14:17:54 gw kernel: done.

Reproduced on multiple boxes.

Actions #1

Updated by Kill Bill about 7 years ago

Here's some more context:

Feb 12 14:16:27 gw php: /etc/rc.packages: The command '/usr/local/etc/rc.d/haproxy.sh stop' returned exit code '1', the output was 'Stopping haproxy. Stopping haproxy. No matching processes were found'
Feb 12 14:16:27 gw kernel: done.
Feb 12 14:16:36 gw check_reload_status: Syncing firewall
Feb 12 14:17:08 gw syslogd: exiting on signal 15
Feb 12 14:17:50 gw syslogd: kernel boot file is /boot/kernel/kernel
Feb 12 14:17:54 gw check_reload_status: Syncing firewall
Feb 12 14:17:54 gw php: /etc/rc.packages: Successfully installed package: haproxy.
Feb 12 14:17:54 gw kernel: done.

So, the first line would be the old haproxy version uninstall. Then a firewall is resynced, then syslogd gets a SIGTERM, firewall is resynced again on package new version install, and then the package install completes and syslogd is gone. Hmmmm... not exactly good.

EDIT: Again the same thing with upgrade to 0.52_6:

$ ps ax | grep 'syslogd'
72518  -  Ss       0:37.46 /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -l /tmp/haproxy_chroot/var/run/log -P /var/run/syslog.pid -f /var/etc/syslog.conf -b 10.0.0.254

After upgrade:

$ ps ax | grep 'syslogd'

System log:

Feb 14 14:58:58 gw php: /etc/rc.packages: The command '/usr/local/etc/rc.d/haproxy.sh stop' returned exit code '1', the output was 'Stopping haproxy. Stopping haproxy. No matching processes were found'
Feb 14 14:59:05 gw check_reload_status: Syncing firewall
Feb 14 14:59:34 gw syslogd: exiting on signal 15
Feb 14 14:59:34 gw syslogd: kernel boot file is /boot/kernel/kernel
Feb 14 15:00:07 gw syslogd: kernel boot file is /boot/kernel/kernel
Feb 14 15:00:12 gw check_reload_status: Syncing firewall
Feb 14 15:00:12 gw php: /etc/rc.packages: Successfully installed package: haproxy.

Actions #2

Updated by Jim Thompson about 7 years ago

  • Assignee set to Jim Pingle
  • Priority changed from High to Normal
Actions #3

Updated by Jim Pingle about 7 years ago

  • Subject changed from (Re)installing haproxy kills syslogd to syslogd is not running after installing or uninstalling a package with logging (e.g. tinc, haproxy)
  • Target version set to 2.4.0

Was this on 2.3.2 or 2.3.3?

It also affects tinc, happens every time it is added or removed on 2.4

Actions #4

Updated by Kill Bill about 7 years ago

Actions #5

Updated by Jim Pingle about 7 years ago

  • Assignee changed from Jim Pingle to Renato Botelho

I thought it might be due to it running twice in some cases but even only being run once, for some reason it's not starting it back up. I changed the code to write out the command and it works when run by hand but not when installing or uninstalling these packages.

tinc does have a command in its install script to create/clear the log file which is unnecessary but it doesn't appear to be affecting this problem.

syslogd is being restarted in both the deinstall and post-deinstall stage of package removal but it should only be run during post-deinstall as at least with haproxy the settings are not fully removed during the deinstall step so for example it still tries to use the haproxy log socket at that point.

I can take another stab at it tomorrow but the more eyes on it the better.

It happens from the GUI or the shell so it's easy to see the problem by running commands in the shell:

ps uxaww | grep '[s]yslogd'
pkg install -y pfSense-pkg-tinc
ps uxaww | grep '[s]yslogd'
pkg delete -y tinc pfSense-pkg-tinc
ps uxaww | grep '[s]yslogd'
Actions #6

Updated by Jim Pingle about 7 years ago

  • Status changed from New to Confirmed
Actions #7

Updated by Jim Pingle about 7 years ago

Worked on this a bit more but still haven't nailed down the reason. It is executing the command to run syslog, it's just not staying up/running. No errors, debug mode or foreground mode for syslog shows nothing at all in the output. If I run the same command by hand, it works.

This seems vaguely familiar, similar to what we saw in years past with sigprocmask and ntp but toggling that made no difference either.

A snippet of the truss output just after syslogd was run:

58519: read(9,"<118>Mar  7 12:56:28 snmpd[58486"...,1023) = 237 (0xed)
58519: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGALRM },{ }) = 0 (0x0)
58519: sigprocmask(SIG_SETMASK,{ },{ SIGHUP|SIGALRM }) = 0 (0x0)
58519: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGALRM },{ }) = 0 (0x0)
58519: sigprocmask(SIG_SETMASK,{ },{ SIGHUP|SIGALRM }) = 0 (0x0)
58519: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGALRM },{ }) = 0 (0x0)
58519: sigprocmask(SIG_SETMASK,{ },{ SIGHUP|SIGALRM }) = 0 (0x0)
58519: read(9,0x7fffffffe150,1023)         ERR#35 'Resource temporarily unavailable'
58519: select(10,{ 4 5 6 9 },0x0,0x0,{ 0.000000 }) = 0 (0x0)
58519: kill(58406,SIGALRM)             = 0 (0x0)
58406: fork()                     = 58519 (0xe497)
58406: SIGNAL 14 (SIGALRM)
58406: process killed, signal = 14
58273: fork()                     = 58406 (0xe426)
58273: write(1,"58406\n",6)             = 6 (0x6)
58273: exit(0x0)                
58273: process exit, rval = 0
56273: read(9,"58406\n",8192)             = 6 (0x6)
56273: read(9,0x80eeef810,8192)             = 0 (0x0)

Some of that is nohup finishing up running the command and passing back the PID.

Actions #8

Updated by Kill Bill almost 7 years ago

Erm, guys, this is super-annoying. Lost logging on many boxes on recent HAproxy updates without realizing what happened, something needs to be done here.

Can https://github.com/pfsense/pfsense/pull/3567 be merged to 2.3.x as well? That at least makes it possible to add the syslogd service to service watchdog. On that note, I'd say the logging is so critical feature that it should get respawned on its own when enabled without configuring anything anywhere in case syslogd gets killed for whatever reason, be it this bug or whatever other reason.

Actions #9

Updated by Jim Pingle almost 7 years ago

With 3a0df77eebf27d027d512a61dcbf80adefd630c4 I can now install/remove haproxy and tinc without affecting syslogd.

I did also have to remove a redundant line from tinc's install procedure as well:
https://github.com/pfsense/FreeBSD-ports/commit/b2877187d9d3582a0718b766126d79a70ff4b1d5

I'll let it soak on 2.4 a bit and if it proves OK there we can bring all of this (and that syslogd service stuff) to 2.3.

Actions #10

Updated by Jim Pingle almost 7 years ago

On the latest 2.4 snapshot I can install and uninstall tinc and haproxy and syslogd never dies, and it appears to reload the config OK as I can see it opening and closing the log files as expected.

Now to see how viable the same fixes are for 2.3.x.

Actions #11

Updated by Kill Bill almost 7 years ago

Seems much better. Reinstalled haproxy a couple of times and syslogd keeps logging. Still wondering what's the real bug here, huh...

Actions #12

Updated by Jim Pingle almost 7 years ago

  • Status changed from Confirmed to Feedback
  • Assignee changed from Renato Botelho to Jim Pingle

As far as I've been able to tell, it appears that syslogd will not stay running in the background as a daemon when it is launched through PHP invoked from pkg. This is a complete guess, but I have to wonder if pkg is doing some process/child cleanup when it exits that doesn't get along with leaving syslogd running.

I like the way I've changed it, though, since it no longer kills/restarts syslogd unnecessarily when a HUP will do to add/remove log entries.

The changes seem to work OK on 2.3.x, so I ported them back (the syslogd service GUI changes for #4382 and the fixes for this ticket).

I didn't cherry-pick them from the original commits, but rather did them in one commit each for easier patching of units in the wild.

There was also a change to tinc that I made to help, but it's already out there for everyone.

Actions #13

Updated by Jim Pingle almost 7 years ago

  • % Done changed from 0 to 100
Actions #14

Updated by Jim Pingle almost 7 years ago

  • Status changed from Feedback to Resolved

Working fine on the latest 2.3.x snapshots as well. Closing.

Actions #15

Updated by Jim Pingle almost 7 years ago

  • Target version changed from 2.4.0 to 2.3.4-p1
Actions

Also available in: Atom PDF