Bug #6882
closedbsnmpd uses all available CPU with hostres module active in some cases
100%
Description
Running 2.4, bsnmpd will consume all available CPU time when the hostres module is active. The CPU usage for geom also rises:
52822 root 96 0 95984K 46028K RUN 0:43 84.77% /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid 14 root -8 - 0K 48K - 1:22 12.30% [geom{g_event}]
Disable the host resources module and both return to near-zero CPU usage.
Affected: pfSense 2.4 on ESXi 6
Unaffected: pfSense 2.4 on SG-8860
Small bit of output from truss while it was consuming high CPU:
86964: openat(AT_FDCWD,"/dev/da0",O_NONBLOCK,00) = 18 (0x12) 86964: ioctl(18,0x40086481 { IOR 0x64('d'), 129, 8 },0xffffbe08) = 0 (0x0) 86964: __sysctl(0x7fffffffbcf0,0x2,0x7fffffffbd44,0x7fffffffbd30,0x802ca61ca,0x11) = 0 (0x0) 86964: __sysctl(0x7fffffffbd44,0x3,0x0,0x7fffffffbd38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbd44,0x3,0x801ae1840,0x7fffffffbd38,0x0,0x0) = 0 (0x0) 86964: getpid() = 86964 (0x153b4) 86964: close(18) = 0 (0x0) 86964: openat(AT_FDCWD,"/dev/cd0",O_NONBLOCK,00) = 18 (0x12) 86964: ioctl(18,0x40086481 { IOR 0x64('d'), 129, 8 },0xffffbe08) ERR#2 'No such file or directory' 86964: close(18) = 0 (0x0) 86964: select(15,{ 3 5 6 8 10 14 },{ },{ },{ 0.755153 }) = 1 (0x1) 86964: read(14,"!system=CAM subsystem=periph typ"...,512) = 176 (0xb0) 86964: __sysctl(0x7fffffffbce0,0x2,0x7fffffffbd30,0x7fffffffbd28,0x8028765f6,0xb) = 0 (0x0) 86964: __sysctl(0x7fffffffbd30,0x3,0x7fffffffbe28,0x7fffffffbe20,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbed8,0x2,0x7fffffffbe40,0x7fffffffc060,0x802876650,0xe) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) 86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0) [that repeats over and over]
It appears to be tied to the presence of a CD drive in the VM. If a CD drive is present, connected or not, bsnmpd misbehaves. Removing the CD drive from the VM allows the hostres module to be used.
Updated by Jim Pingle almost 8 years ago
Reproduced it on a stock FreeBSD 11 system.
Opened a bug report upstream: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215471
Updated by Jim Pingle almost 8 years ago
- Status changed from Confirmed to Feedback
- % Done changed from 0 to 100
Applied in changeset 43de83978ed93c9a4886e2844e341af0f3fe9a05.
Updated by Jim Pingle almost 8 years ago
- Status changed from Feedback to Needs Patch
- Target version changed from 2.4.0 to 2.4.1
The workaround is present and prevents the problem case from causing harm.
Rather than close this out, I'll leave it open and set it to the next likely release so we can check if the bug has been fixed upstream. When it is fixed upstream, the workaround can be removed.
Updated by Stéphane Lapie about 7 years ago
It should be noted that when upgrading, if the host resources MIB was already selected BEFORE the upgrade, it will remain that way, and the bug will take effect.
The problem is : since the configuration item is now masked in the UI, it seems it is also impossible to deactivate the MIB. (Or does saving the SNMP configuration again clear the flag?)
Updated by Jim Pingle about 7 years ago
It looks like there is actually a patch for FreeBSD now which might help: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209368
Stéphane Lapie wrote:
(Or does saving the SNMP configuration again clear the flag?)
The backend code should already be omitting hostres when it detects the condition which causes the GUI to disable the control.
Updated by Chris Stocker about 7 years ago
Looks Like KVM has the same issue btw. I am running 2.4.1-DEVELOPMENT (amd64)
built on Fri Oct 13 12:32:36 CDT 2017 if that makes a difference
[2.4.1-DEVELOPMENT][]/home/root: top -b 5
last pid: 24068; load averages: 0.55, 0.45, 0.40 up 0+01:07:12 16:13:35
53 processes: 2 running, 51 sleeping
Mem: 117M Active, 405M Inact, 315M Wired, 199M Buf, 1114M Free
Swap: 2048M Total, 2048M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
91032 root 1 88 0 85624K 45472K CPU0 0 0:14 53.66% bsnmpd
938 root 1 21 0 9556K 5324K select 1 0:27 2.20% devd
45404 unbound 2 20 0 93644K 70760K kqread 1 1:29 0.00% unbound
884 root 1 20 0 282M 38296K kqread 1 0:08 0.00% php-fpm
20813 root 1 20 0 246M 46576K nanslp 0 0:06 0.00% php
[2.4.1-DEVELOPMENT][]/root: less /tmp/truss-output
close(15) = 0 (0x0)
gettimeofday({ 1507925879.568291 },0x0) = 0 (0x0)
gettimeofday({ 1507925879.568401 },0x0) = 0 (0x0)
select(15,{ 3 8 13 14 },{ },{ },{ 0.179103 }) = 1 (0x1)
read(8,"!system=CAM subsystem=periph typ"...,512) = 163 (0xa3)
_sysctl(0x7fffffffbcf0,0x2,0x7fffffffbd40,0x7fffffffbd38,0x8022516f6,0xb) = 0 (0x0)
_sysctl(0x7fffffffbd40,0x3,0x7fffffffbe18,0x7fffffffbe10,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbec8,0x2,0x7fffffffbe30,0x7fffffffc050,0x802251750,0xe) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
__sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
Updated by Pi Ba about 7 years ago
Please either fix the underlying issue if possible. Or at least commit this workaround that seems to be needed for most (all?) hypervisor platforms..
https://github.com/pfsense/pfsense/pull/3821
Updated by Jim Pingle about 7 years ago
- Assignee changed from Jim Pingle to Renato Botelho
- Priority changed from Normal to Very High
Judging by some responses on the forum it can also happen on bare metal, not just VMs. The exact conditions are unclear for some, but at least one had a CD/DVD drive without media inside, similar to the VM case. Someone else saw it on a USB drive as well.
Hopefully the patch I linked above in note 6 will fix it for everyone.
Updated by Michael Knowles about 7 years ago
Just found this bug report after chasing down the issue on my office's pfsense install that was upgraded to 2.4.
I can confirm that as per Stephane Lapie's comment 2 days ago "host resources" is not available through the GUI, so unticking a different module (in this case REGEX) and saving the config removed the high CPU state, and adding REGEX back in didn't return it to the high state.
Updated by Brendon Baumgartner about 7 years ago
I don't see any mention about memory leaks here but this might be related.
https://forum.pfsense.org/index.php?topic=138054.0
Updated by Jim Pingle about 7 years ago
- Status changed from Needs Patch to Resolved
After importing the FreeBSD patch, this appears to be OK. CPU usage is back to normal, no sign of a memory leak, and hostres can be active inside VMs and on hardware that showed the problem again.