Project

General

Profile

Bug #6882

bsnmpd uses all available CPU with hostres module active in some cases

Added by Jim Pingle about 1 year ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Very High
Category:
SNMP
Target version:
Start date:
10/31/2016
Due date:
% Done:

100%

Affected Version:
2.4
Affected Architecture:

Description

Running 2.4, bsnmpd will consume all available CPU time when the hostres module is active. The CPU usage for geom also rises:

52822 root        96    0 95984K 46028K RUN      0:43  84.77% /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
   14 root        -8    -     0K    48K -        1:22  12.30% [geom{g_event}]

Disable the host resources module and both return to near-zero CPU usage.

Affected: pfSense 2.4 on ESXi 6
Unaffected: pfSense 2.4 on SG-8860

Small bit of output from truss while it was consuming high CPU:

86964: openat(AT_FDCWD,"/dev/da0",O_NONBLOCK,00) = 18 (0x12)
86964: ioctl(18,0x40086481 { IOR 0x64('d'), 129, 8 },0xffffbe08) = 0 (0x0)
86964: __sysctl(0x7fffffffbcf0,0x2,0x7fffffffbd44,0x7fffffffbd30,0x802ca61ca,0x11) = 0 (0x0)
86964: __sysctl(0x7fffffffbd44,0x3,0x0,0x7fffffffbd38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbd44,0x3,0x801ae1840,0x7fffffffbd38,0x0,0x0) = 0 (0x0)
86964: getpid()                     = 86964 (0x153b4)
86964: close(18)                 = 0 (0x0)
86964: openat(AT_FDCWD,"/dev/cd0",O_NONBLOCK,00) = 18 (0x12)
86964: ioctl(18,0x40086481 { IOR 0x64('d'), 129, 8 },0xffffbe08) ERR#2 'No such file or directory'
86964: close(18)                 = 0 (0x0)
86964: select(15,{ 3 5 6 8 10 14 },{ },{ },{ 0.755153 }) = 1 (0x1)
86964: read(14,"!system=CAM subsystem=periph typ"...,512) = 176 (0xb0)
86964: __sysctl(0x7fffffffbce0,0x2,0x7fffffffbd30,0x7fffffffbd28,0x8028765f6,0xb) = 0 (0x0)
86964: __sysctl(0x7fffffffbd30,0x3,0x7fffffffbe28,0x7fffffffbe20,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbed8,0x2,0x7fffffffbe40,0x7fffffffc060,0x802876650,0xe) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
86964: __sysctl(0x7fffffffbe40,0x5,0x7fffffffbee0,0x7fffffffbe38,0x0,0x0) = 0 (0x0)
[that repeats over and over]

It appears to be tied to the presence of a CD drive in the VM. If a CD drive is present, connected or not, bsnmpd misbehaves. Removing the CD drive from the VM allows the hostres module to be used.

Associated revisions

Revision 9a548240
Added by Jim Pingle 11 months ago

Add VMware detection to system_identify_specific_platform(). Ticket #6882

Revision 43de8397
Added by Jim Pingle 11 months ago

Don't allow SNMP hostres module to be selected or used with VMware VMs that have a CD/DVD Drive device. Fixes #6882

History

#1 Updated by Jim Pingle 11 months ago

Reproduced it on a stock FreeBSD 11 system.

Opened a bug report upstream: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215471

#2 Updated by Jim Thompson 11 months ago

  • Assignee set to Jim Pingle

#3 Updated by Jim Pingle 11 months ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100

#4 Updated by Jim Pingle 10 months ago

  • Status changed from Feedback to Needs Patch
  • Target version changed from 2.4.0 to 2.4.1

The workaround is present and prevents the problem case from causing harm.

Rather than close this out, I'll leave it open and set it to the next likely release so we can check if the bug has been fixed upstream. When it is fixed upstream, the workaround can be removed.

#5 Updated by St├ęphane Lapie about 1 month ago

It should be noted that when upgrading, if the host resources MIB was already selected BEFORE the upgrade, it will remain that way, and the bug will take effect.
The problem is : since the configuration item is now masked in the UI, it seems it is also impossible to deactivate the MIB. (Or does saving the SNMP configuration again clear the flag?)

#6 Updated by Jim Pingle about 1 month ago

It looks like there is actually a patch for FreeBSD now which might help: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209368

St├ęphane Lapie wrote:

(Or does saving the SNMP configuration again clear the flag?)

The backend code should already be omitting hostres when it detects the condition which causes the GUI to disable the control.

#7 Updated by Chris Stocker about 1 month ago

Looks Like KVM has the same issue btw. I am running 2.4.1-DEVELOPMENT (amd64)
built on Fri Oct 13 12:32:36 CDT 2017 if that makes a difference

[2.4.1-DEVELOPMENT][]/home/root: top -b 5
last pid: 24068; load averages: 0.55, 0.45, 0.40 up 0+01:07:12 16:13:35
53 processes: 2 running, 51 sleeping

Mem: 117M Active, 405M Inact, 315M Wired, 199M Buf, 1114M Free
Swap: 2048M Total, 2048M Free

PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
91032 root 1 88 0 85624K 45472K CPU0 0 0:14 53.66% bsnmpd
938 root 1 21 0 9556K 5324K select 1 0:27 2.20% devd
45404 unbound 2 20 0 93644K 70760K kqread 1 1:29 0.00% unbound
884 root 1 20 0 282M 38296K kqread 1 0:08 0.00% php-fpm
20813 root 1 20 0 246M 46576K nanslp 0 0:06 0.00% php

[2.4.1-DEVELOPMENT][]/root: less /tmp/truss-output
close(15) = 0 (0x0)
gettimeofday({ 1507925879.568291 },0x0) = 0 (0x0)
gettimeofday({ 1507925879.568401 },0x0) = 0 (0x0)
select(15,{ 3 8 13 14 },{ },{ },{ 0.179103 }) = 1 (0x1)
read(8,"!system=CAM subsystem=periph typ"...,512) = 163 (0xa3)
_sysctl(0x7fffffffbcf0,0x2,0x7fffffffbd40,0x7fffffffbd38,0x8022516f6,0xb) = 0 (0x0)
_sysctl(0x7fffffffbd40,0x3,0x7fffffffbe18,0x7fffffffbe10,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbec8,0x2,0x7fffffffbe30,0x7fffffffc050,0x802251750,0xe) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
_sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)
__sysctl(0x7fffffffbe30,0x5,0x7fffffffbed0,0x7fffffffbe28,0x0,0x0) = 0 (0x0)

#8 Updated by Pi Ba about 1 month ago

Please either fix the underlying issue if possible. Or at least commit this workaround that seems to be needed for most (all?) hypervisor platforms..
https://github.com/pfsense/pfsense/pull/3821

#9 Updated by Jim Pingle about 1 month ago

  • Assignee changed from Jim Pingle to Renato Botelho
  • Priority changed from Normal to Very High

Judging by some responses on the forum it can also happen on bare metal, not just VMs. The exact conditions are unclear for some, but at least one had a CD/DVD drive without media inside, similar to the VM case. Someone else saw it on a USB drive as well.

Hopefully the patch I linked above in note 6 will fix it for everyone.

#10 Updated by Michael Knowles about 1 month ago

Just found this bug report after chasing down the issue on my office's pfsense install that was upgraded to 2.4.

I can confirm that as per Stephane Lapie's comment 2 days ago "host resources" is not available through the GUI, so unticking a different module (in this case REGEX) and saving the config removed the high CPU state, and adding REGEX back in didn't return it to the high state.

#11 Updated by Brendon Baumgartner about 1 month ago

I don't see any mention about memory leaks here but this might be related.
https://forum.pfsense.org/index.php?topic=138054.0

#12 Updated by Jim Pingle about 1 month ago

  • Status changed from Needs Patch to Resolved

After importing the FreeBSD patch, this appears to be OK. CPU usage is back to normal, no sign of a memory leak, and hostres can be active inside VMs and on hardware that showed the problem again.

Also available in: Atom PDF