Project

General

Profile

Actions

Bug #6882

closed

bsnmpd uses all available CPU with hostres module active in some cases

Added by Jim Pingle almost 5 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Very High
Category:
SNMP
Target version:
Start date:
10/31/2016
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
2.4
Affected Architecture:

Description

Running 2.4, bsnmpd will consume all available CPU time when the hostres module is active. The CPU usage for geom also rises:

52822 root        96    0 95984K 46028K RUN      0:43  84.77% /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
   14 root        -8    -     0K    48K -        1:22  12.30% [geom{g_event}]

Disable the host resources module and both return to near-zero CPU usage.

Affected: pfSense 2.4 on ESXi 6
Unaffected: pfSense 2.4 on SG-8860

Small bit of output from truss while it was consuming high CPU:

86964: openat(AT_FDCWD,"/dev/da0",O_NONBLOCK,00) = 18 (0x12)
86964: ioctl(18,0x40086481 { IOR 0x64('d'), 129, 8 },0xffffbe08) = 0 (0x0)
86964: __sysctl(0x7fffffffbcf0,0x2,0x7fffffffbd44,0x7fffffffbd30,0x802ca61ca,0x11) = 0 (0x0)
86964: __sysctl(0x7fffffffbd44,0x3,0x0,0x7fffffffbd38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbd44,0x3,0x801ae1840,0x7fffffffbd38,0x0,0x0) = 0 (0x0)
86964: getpid()                     = 86964 (0x153b4)
86964: close(18)                 = 0 (0x0)
86964: openat(AT_FDCWD,"/dev/cd0",O_NONBLOCK,00) = 18 (0x12)
86964: ioctl(18,0x40086481 { IOR 0x64('d'), 129, 8 },0xffffbe08) ERR#2 'No such file or directory'
86964: close(18)                 = 0 (0x0)
86964: select(15,{ 3 5 6 8 10 14 },{ },{ },{ 0.755153 }) = 1 (0x1)
86964: read(14,"!system=CAM subsystem=periph typ"...,512) = 176 (0xb0)
86964: __sysctl(0x7fffffffbce0,0x2,0x7fffffffbd30,0x7fffffffbd28,0x8028765f6,0xb) = 0 (0x0)
86964: __sysctl(0x7fffffffbd30,0x3,0x7fffffffbe28,0x7fffffffbe20,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbed8,0x2,0x7fffffffbe40,0x7fffffffc060,0x802876650,0xe) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
[that repeats over and over]

It appears to be tied to the presence of a CD drive in the VM. If a CD drive is present, connected or not, bsnmpd misbehaves. Removing the CD drive from the VM allows the hostres module to be used.

Actions #1

Updated by Jim Pingle over 4 years ago

Reproduced it on a stock FreeBSD 11 system.

Opened a bug report upstream: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215471

Actions #2

Updated by Jim Thompson over 4 years ago

  • Assignee set to Jim Pingle
Actions #3

Updated by Jim Pingle over 4 years ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100
Actions #4

Updated by Jim Pingle over 4 years ago

  • Status changed from Feedback to Needs Patch
  • Target version changed from 2.4.0 to 2.4.1

The workaround is present and prevents the problem case from causing harm.

Rather than close this out, I'll leave it open and set it to the next likely release so we can check if the bug has been fixed upstream. When it is fixed upstream, the workaround can be removed.

Actions #5

Updated by Stéphane Lapie almost 4 years ago

It should be noted that when upgrading, if the host resources MIB was already selected BEFORE the upgrade, it will remain that way, and the bug will take effect.
The problem is : since the configuration item is now masked in the UI, it seems it is also impossible to deactivate the MIB. (Or does saving the SNMP configuration again clear the flag?)

Actions #6

Updated by Jim Pingle almost 4 years ago

It looks like there is actually a patch for FreeBSD now which might help: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209368

Stéphane Lapie wrote:

(Or does saving the SNMP configuration again clear the flag?)

The backend code should already be omitting hostres when it detects the condition which causes the GUI to disable the control.

Actions #7

Updated by Chris Stocker almost 4 years ago

Looks Like KVM has the same issue btw. I am running 2.4.1-DEVELOPMENT (amd64)
built on Fri Oct 13 12:32:36 CDT 2017 if that makes a difference

[2.4.1-DEVELOPMENT][]/home/root: top -b 5
last pid: 24068; load averages: 0.55, 0.45, 0.40 up 0+01:07:12 16:13:35
53 processes: 2 running, 51 sleeping

Mem: 117M Active, 405M Inact, 315M Wired, 199M Buf, 1114M Free
Swap: 2048M Total, 2048M Free

PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
91032 root 1 88 0 85624K 45472K CPU0 0 0:14 53.66% bsnmpd
938 root 1 21 0 9556K 5324K select 1 0:27 2.20% devd
45404 unbound 2 20 0 93644K 70760K kqread 1 1:29 0.00% unbound
884 root 1 20 0 282M 38296K kqread 1 0:08 0.00% php-fpm
20813 root 1 20 0 246M 46576K nanslp 0 0:06 0.00% php

[2.4.1-DEVELOPMENT][]/root: less /tmp/truss-output
close(15) = 0 (0x0)
gettimeofday({ 1507925879.568291 },0x0) = 0 (0x0)
gettimeofday({ 1507925879.568401 },0x0) = 0 (0x0)
select(15,{ 3 8 13 14 },{ },{ },{ 0.179103 }) = 1 (0x1)
read(8,"!system=CAM subsystem=periph typ"...,512) = 163 (0xa3)
_sysctl(0x7fffffffbcf0,0x2,0x7fffffffbd40,0x7fffffffbd38,0x8022516f6,0xb) = 0 (0x0)
_sysctl(0x7fffffffbd40,0x3,0x7fffffffbe18,0x7fffffffbe10,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbec8,0x2,0x7fffffffbe30,0x7fffffffc050,0x802251750,0xe) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
__sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)

Actions #8

Updated by Pi Ba almost 4 years ago

Please either fix the underlying issue if possible. Or at least commit this workaround that seems to be needed for most (all?) hypervisor platforms..
https://github.com/pfsense/pfsense/pull/3821

Actions #9

Updated by Jim Pingle almost 4 years ago

  • Assignee changed from Jim Pingle to Renato Botelho
  • Priority changed from Normal to Very High

Judging by some responses on the forum it can also happen on bare metal, not just VMs. The exact conditions are unclear for some, but at least one had a CD/DVD drive without media inside, similar to the VM case. Someone else saw it on a USB drive as well.

Hopefully the patch I linked above in note 6 will fix it for everyone.

Actions #10

Updated by Michael Knowles almost 4 years ago

Just found this bug report after chasing down the issue on my office's pfsense install that was upgraded to 2.4.

I can confirm that as per Stephane Lapie's comment 2 days ago "host resources" is not available through the GUI, so unticking a different module (in this case REGEX) and saving the config removed the high CPU state, and adding REGEX back in didn't return it to the high state.

Actions #11

Updated by Brendon Baumgartner almost 4 years ago

I don't see any mention about memory leaks here but this might be related.
https://forum.pfsense.org/index.php?topic=138054.0

Actions #12

Updated by Jim Pingle almost 4 years ago

  • Status changed from Needs Patch to Resolved

After importing the FreeBSD patch, this appears to be OK. CPU usage is back to normal, no sign of a memory leak, and hostres can be active inside VMs and on hardware that showed the problem again.

Actions

Also available in: Atom PDF