Bug #4403
Enabling SNMP causes kernel panic with APU with empty SD card slot
Status: closed
Description
Hi all,
I am not sure if this is a hardware problem, but I am using a PC Engines APU.1C (2 GB) board, which works fine until I try to enable SNMP via the web interface.
The APU.1C should be the same as your recommended hardware, the VK-T40E Desktop Firewall Router Appliance (https://www.pfsense.org/hardware/pfsense-store.html#vkt40e).
I tried this twice with pfSense 2.2: once after an update from 2.1 and once after a fresh 2.2 install.
The system works without any problem until I try to enable SNMP with the following settings:
Webconfigurator > Services > SNMP
SNMP Daemon - Enable: Checked
Read Community string: public34tr497g429tr20ztg
Interface Binding - Bind Interface: LAN
Submitting the form crashes the system.
After a power reset this is the output of the boot:
ugen2.1: <ATI> at usbus2
uhub2: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <ATI> at usbus3
uhub3: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
usbus6: 480Mbps High Speed USB v2.0
ugen4.1: <ATI> at usbus4
uhub4: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <ATI> at usbus5
uhub5: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ugen6.1: <ATI> at usbus6
uhub6: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus6
uhub4: 2 ports with 2 removable, self powered
uhub0: 5 ports with 5 removable, self powered
uhub2: 5 ports with 5 removable, self powered
uhub5: 4 ports with 4 removable, self powered
uhub6: 4 ports with 4 removable, self powered
uhub1: 5 ports with 5 removable, self powered
uhub3: 5 ports with 5 removable, self powered
ugen6.2: <Generic> at usbus6
umass0: <Generic Flash Card ReaderWriter, class 0/0, rev 2.01/1.00, addr 2> on usbus6
ugen3.2: <HUAWEI Technology> at usbus3
u3g0: <HUAWEI Technology HUAWEI MOBILE WCDMA EM770W, class 0/0, rev 2.00/0.00, addr 2> on usbus3
u3g0: Found 6 ports.
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <KINGSTON SMS200S330G 541ABBF0> ATA-8 SATA 3.x device
ada0: Serial Number 50026B724B0A8XXX
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 28626MB (58626288 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <Multiple Card Reader 1.00> Removable Direct Access SCSI-4 device
da0: Serial Number 058F63666XXX
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2<NO_6_BYTE>
SMP: AP CPU #1 Launched!
Timecounter "TSC" frequency 1000019445 Hz quality 800
Trying to mount root from ufs:/dev/ada0s1a [rw]...
WARNING: / was not properly dismounted
Configuring crash dumps...
Using /dev/ada0s1b for dump device.
Mounting filesystems...
** /dev/ada0s1a
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=562046 (8 should be 0)
CORRECT? yes
INCORRECT BLOCK COUNT I=562054 (8 should be 0)
CORRECT? yes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=562181 OWNER=0 MODE=100644 SIZE=442 MTIME=Feb 7 14:28 2015 COUNT 2 SHOULD BE 1
ADJUST? yes
UNREF FILE I=2327427 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327428 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327429 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327430 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327431 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327432 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327437 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327438 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327439 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327440 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327441 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327442 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
CLEAR? yes
UNREF FILE I=2327452 OWNER=0 MODE=100644 SIZE=0 MTIME=Feb 7 14:28 2015
RECONNECT? yes
NO lost+found DIRECTORY
CREATE? yes
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
6313 files, 65408 used, 6012987 free (451 frags, 751567 blocks, 0.0% fragmentation)
***** FILE SYSTEM STILL DIRTY *****
***** FILE SYSTEM WAS MODIFIED *****
***** PLEASE RERUN FSCK *****
** /dev/ada0s1a
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
6313 files, 65408 used, 6012987 free (451 frags, 751567 blocks, 0.0% fragmentation)
***** FILE SYSTEM MARKED CLEAN *****
Disabling APM on /dev/ad4
pwd_mkdb: root gid is incorrect
pwd_mkdb: at line #1
pwd_mkdb: /etc/master.passwd: Inappropriate file type or format
     ___
 ___/ f \
/ p \___/ Sense
\___/   \
    \___/
Welcome to pfSense 2.2-RELEASE ...
savecore: reboot
savecore: writing core to /var/crash/textdump.tar.1
Creating symlinks......ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
done.
Feb 7 14:30:44 system[253]: [ERROR] [pool lighty] cannot get uid for user 'root'
[ERROR] [pool lighty] cannot get uid for user 'root'
Feb 7 14:30:44 system[253]: [ERROR] FPM initialization failed
[ERROR] FPM initialization failed
fcgicli: Could not connect to server(/var/run/php-fpm.socket).
Launching the init system... done.
Initializing...................... done.
Starting device manager (devd)...
Warning: chown(): Unable to find uid for root in /etc/inc/config.lib.inc on line 867
Warning: chgrp(): Unable to find gid for proxy in /etc/inc/config.lib.inc on line 868
done.
Loading configuration......done.
Updating configuration...done.
Cleaning backup cache.................................done.
Setting up extended sysctls...done.
Setting timezone...done.
Configuring loopback interface...done.
Starting syslog...done.
Starting Secure Shell Services...done.
Setting up polling defaults...done.
Setting up interfaces microcode...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring QinQ interfaces...done.
Configuring WAN interface...done.
Configuring MODEMACCESS interface...done.
Configuring LAN interface...Starting DNS Resolver...done.
Starting DHCPv6 service...done.
done.
Configuring CARP settings...done.
Syncing OpenVPN settings...done.
Configuring firewall......done.
Starting PFLOG...done.
Setting up gateway monitors...done.
Synchronizing user settings...done.
Starting webConfigurator...done.
Configuring CRON...done.
Starting DNS Resolver...done.
Starting NTP time client...done.
pgrep: Invalid pid in file `/var/dhcpd/var/run/dhcpd.pid'
Starting DHCP service...done.
Starting DHCPv6 service...done.
Configuring firewall......done.
Starting SNMP daemon... done.
Generating RRD graphs...
Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 289

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0xffffffff80b6d4e5
stack pointer = 0x28:0xfffffe003609f840
frame pointer = 0x28:0xfffffe003609f850
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 32124 (bsnmpd)
[ thread pid 32124 tid 100104 ]
Stopped at strlcpy+0x25: movb (%rax),%dl
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default> capture on
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe0098d5a700
curthread = 0xfffff8000a4ea490: pid 32124 "bsnmpd"
curpcb = 0xfffffe003609fcc0
fpcurthread = 0xfffff8000a4ea490: pid 32124 "bsnmpd"
idlethread = 0xfffff8000320e920: tid 100004 "idle: cpu1"
curpmap = 0xfffff800032199f8
tssp = 0xffffffff8218d078
commontssp = 0xffffffff8218d078
rsp0 = 0xfffffe003609fcc0
gs32p = 0xffffffff8218ead0
ldt = 0xffffffff8218eb10
tss = 0xffffffff8218eb00
db:0:kdb.enter.default> bt
Tracing pid 32124 tid 100104 td 0xfffff8000a4ea490
strlcpy() at strlcpy+0x25/frame 0xfffffe003609f850
sysctl_rman() at sysctl_rman+0x1e1/frame 0xfffffe003609f930
sysctl_root() at sysctl_root+0x232/frame 0xfffffe003609f980
userland_sysctl() at userland_sysctl+0x1d8/frame 0xfffffe003609fa30
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe003609fae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe003609fbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe003609fbf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800fb598a, rsp = 0x7fffffffa3d8, rbp = 0x7fffffffa410 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
32124 1 32124 0 Rs CPU 1 bsnmpd
31400 23140 21 0 S+ kqread 0xfffff8000aad2300 ntpdate
23140 1 21 0 S+ wait 0xfffff8000a8f8980 sh
21737 1 21444 0 S kqread 0xfffff8000a353100 lighttpd
16868 1 16868 0 Ss select 0xfffff8000a3704c0 inetd
16061 1 16061 0 Ss bpf 0xfffff8000a32a600 filterlog
6840 1 6840 0 Ss (threaded) mpd5
100106 S select 0xfffff8000a3732c0 mpd5
5966 1 5966 0 Ss select 0xfffff8000a3717c0 syslogd
269 1 269 0 Ss select 0xfffff8000a3719c0 devd
261 21 21 0 R+ CPU 0 php
258 256 256 0 S kqread 0xfffff8000a31fa00 check_reload_status
256 1 256 0 Ss kqread 0xfffff8000a320300 check_reload_status
67 0 0 0 DL mdwait 0xfffff8000a2f7000 [md0]
21 1 21 0 Ss+ pause 0xfffff8000a34a0a8 sh
20 0 0 0 DL syncer 0xffffffff81faef08 [syncer]
19 0 0 0 DL vlruwt 0xfffff8000a34a980 [vnlru]
18 0 0 0 DL psleep 0xffffffff81fae104 [bufdaemon]
17 0 0 0 DL pgzero 0xffffffff82100e8c [pagezero]
9 0 0 0 DL pollid 0xffffffff81f5c8f0 [idlepoll]
8 0 0 0 DL psleep 0xffffffff821005c0 [vmdaemon]
7 0 0 0 DL psleep 0xffffffff8218c384 [pagedaemon]
6 0 0 0 DL waiting_ 0xffffffff8217cdf0 [sctp_iterator]
5 0 0 0 DL pftm 0xffffffff80cff710 [pf purge]
16 0 0 0 DL (threaded) [usb]
100072 D - 0xfffffe0000f93010 [ucom]
100063 D - 0xfffffe0000976e18 [usbus6]
100062 D - 0xfffffe0000976dc0 [usbus6]
100061 D - 0xfffffe0000976d68 [usbus6]
100060 D - 0xfffffe0000976d10 [usbus6]
100059 D - 0xfffffe0000981460 [usbus5]
100058 D - 0xfffffe0000981408 [usbus5]
100057 D - 0xfffffe00009813b0 [usbus5]
100056 D - 0xfffffe0000981358 [usbus5]
100055 D - 0xfffffe000096d460 [usbus4]
100054 D - 0xfffffe000096d408 [usbus4]
100053 D - 0xfffffe000096d3b0 [usbus4]
100052 D - 0xfffffe000096d358 [usbus4]
100049 D - 0xfffffe0000962e18 [usbus3]
100048 D - 0xfffffe0000962dc0 [usbus3]
100047 D - 0xfffffe0000962d68 [usbus3]
100046 D - 0xfffffe0000962d10 [usbus3]
100045 D - 0xfffffe0000959460 [usbus2]
100044 D - 0xfffffe0000959408 [usbus2]
100043 D - 0xfffffe00009593b0 [usbus2]
100042 D - 0xfffffe0000959358 [usbus2]
100041 D - 0xfffffe000092ce18 [usbus1]
100040 D - 0xfffffe000092cdc0 [usbus1]
100039 D - 0xfffffe000092cd68 [usbus1]
100038 D - 0xfffffe000092cd10 [usbus1]
100036 D - 0xfffffe0000923460 [usbus0]
100035 D - 0xfffffe0000923408 [usbus0]
100034 D - 0xfffffe00009233b0 [usbus0]
100033 D - 0xfffffe0000923358 [usbus0]
4 0 0 0 DL (threaded) [cam]
100071 D - 0xffffffff81e96ac0 [scanner]
100027 D - 0xffffffff81e96c80 [doneq0]
3 0 0 0 DL crypto_r 0xffffffff820fea90 [crypto returns]
2 0 0 0 DL crypto_w 0xffffffff820fe938 [crypto]
15 0 0 0 DL - 0xffffffff81eb4180 [rand_harvestq]
14 0 0 0 DL (threaded) [geom]
100013 D - 0xffffffff82171560 [g_down]
100012 D - 0xffffffff82171558 [g_up]
100011 D - 0xffffffff82171550 [g_event]
13 0 0 0 DL (threaded) [ng_queue]
100010 D sleep 0xffffffff81e54fc8 [ng_queue1]
100009 D sleep 0xffffffff81e54fc8 [ng_queue0]
12 0 0 0 WL (threaded) [intr]
100080 I [swi1: netisr 1]
100069 I [swi1: pfsync]
100067 I [swi1: pf send]
100064 I [swi0: uart uart]
100051 I [irq15: ata1]
100050 I [irq14: ata0]
100037 I [irq17: ehci0 ehci1+]
100032 I [irq18: ohci0 ohci1*]
100031 I [irq19: ahci0]
100030 I [irq261: re2]
100029 I [irq260: re1]
100028 I [irq259: re0]
100025 I [swi5: fast taskq]
100023 I [swi6: Giant taskq]
100021 I [swi6: task queue]
100008 I [swi3: vm]
100007 I [swi4: clock]
100006 I [swi4: clock]
100005 I [swi1: netisr 0]
11 0 0 0 RL (threaded) [idle]
100004 CanRun [idle: cpu1]
100003 CanRun [idle: cpu0]
1 0 1 0 SLs wait 0xfffff800032084c0 [init]
10 0 0 0 DL audit_wo 0xffffffff82183970 [audit]
0 0 0 0 DLs (threaded) [kernel]
100070 D - 0xfffff800032b1000 [CAM taskq]
100065 D - 0xfffff8000a054900 [mca taskq]
100026 D - 0xfffff800032b1200 [kqueue taskq]
100024 D - 0xfffff800032b1700 [thread taskq]
100022 D - 0xfffff800032b1c00 [ffs_trim taskq]
100020 D - 0xfffff800032b2400 [acpi_task_2]
100019 D - 0xfffff800032b2400 [acpi_task_1]
100018 D - 0xfffff800032b2400 [acpi_task_0]
100014 D - 0xfffff800031fa500 [firmware taskq]
100000 D swapin 0xffffffff82171658 [swapper]
db:0:kdb.enter.default> alltrace
Tracing command bsnmpd pid 32124 tid 100104 td 0xfffff8000a4ea490
strlcpy() at strlcpy+0x25/frame 0xfffffe003609f850
sysctl_rman() at sysctl_rman+0x1e1/frame 0xfffffe003609f930
sysctl_root() at sysctl_root+0x232/frame 0xfffffe003609f980
userland_sysctl() at userland_sysctl+0x1d8/frame 0xfffffe003609fa30
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe003609fae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe003609fbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe003609fbf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800fb598a, rsp = 0x7fffffffa3d8, rbp = 0x7fffffffa410 ---
...
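When comparing panics from different boards, it helps to reduce each backtrace to its call chain (here: strlcpy called from sysctl_rman, reached through the sysctl syscall made by bsnmpd). A minimal sketch, assuming the console capture has been saved with one frame per line as ddb prints it; the function name `ddb_call_chain` is made up for illustration:

```shell
# Print the function names from a ddb backtrace, innermost frame first.
# Reads the trace on stdin; frame lines are expected to look like:
#   strlcpy() at strlcpy+0x25/frame 0xfffffe003609f850
# Lines that are not frames (prompts, "Tracing pid ...") are dropped.
ddb_call_chain() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)() at .*/\1/p'
}
```

Usage would be something like `ddb_call_chain < console.log`, where `console.log` is a placeholder for wherever the capture was stored.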
Updated by Chris Buechler almost 10 years ago
- Subject changed from Enabling SNMP Freezes System / Unbootable after Power Reset to Enabling SNMP causes kernel panic with APU in some circumstance
- Status changed from New to Confirmed
- Priority changed from Urgent to High
Enough people have reported this that it's clearly an issue in some circumstance. I'm not sure what that circumstance is, though. For some it's as simple as just enabling SNMP: they don't even have to query it, and it kernel panics. It seems to be exclusive to the APU, but it didn't happen to me when enabling it on an APU.
There are two differences between my APU and what's shown here. Mine was running from SD when I tried it, whereas the SD slot is empty here (I think this is the likely cause), and this one has a 3G/4G/LTE modem in it. I'll install on an mSATA in my APU, remove the SD card, and see if that's replicable.
Updated by Jim Pingle almost 10 years ago
- File pac_PID5779_n2.txt added
I can reproduce it on my APU now as well. Fresh install on mSATA, no SD card inserted, using the factory image.
Seems to only happen if the "Host Resources" module is selected, which requires mibII, but mibII alone is not enough to trigger the crash.
Backtrace looks about the same.
It wouldn't be the first time that a drive present with no media caused a crash via bsnmpd (CD devices without media mounted on ESX come to mind).
Updated by Jim Pingle almost 10 years ago
: sysctl hw.bus
hw.bus.devctl_disable: 0
hw.bus.devctl_queue: 1000
sysctl -a works also.
Updated by Andreas Walther almost 10 years ago
Well, the first crash, after the update from 2.1 to 2.2, was with an SD card as the disk and a Mini PCIe 3G modem installed.
The second time was with an mSATA module and a Mini PCIe 3G modem, without an SD card.
So I think it has nothing to do with the empty SD card slot.
Jim P, did you have a Mini PCIe card installed?
As far as I understand, Chris Buechler did not get a crash like this with an "empty" board and an SD card.
So maybe it has something to do with a populated PCI Express bus?
Updated by Jim Pingle almost 10 years ago
I don't have an SD card in, but I do have a Mini-PCIe wireless card.
Updated by Chris Buechler almost 10 years ago
- Subject changed from Enabling SNMP causes kernel panic with APU in some circumstance to Enabling SNMP causes kernel panic with APU with empty SD card slot
- Target version set to 2.2.1
The only scenario we've been able to replicate is with no SD card installed. It's easily replicable by just removing the SD card. I have a wifi card in mine. I haven't tried without it installed, but it doesn't seem to be specific to any particular add-on hardware.
There might be some other circumstance with the same symptoms.
Andreas: could you do some experimentation with your combination of hardware? See if it's the same with only the mSATA on the board. Add the cellular card and see what happens. Then put an SD card in it (with or without the mSATA SSD) and see what happens there.
Updated by Andreas Walther almost 10 years ago
Chris Buechler wrote:
Andreas: could you do some experimentation with your combination of hardware? See if it's the same with only the mSATA on the board. Add the cellular card and see what happens. Then put an SD card in it (with or without the mSATA SSD) and see what happens there. Just running "sysctl -a" will suffice to upgrade, and won't leave you with a bricked setup like enabling SNMP does.
Yes, I can do that.
I will do it on the weekend, so you will have to wait a little.
Updated by Andreas Walther almost 10 years ago
I just started testing which combinations of hardware trigger this crash, but the command "sysctl -a" does not crash the system.
Updated by Marcel Janicki almost 10 years ago
Same for me during the upgrade from 2.1.5 (amd64) to 2.2 (amd64) on an APU.1C4 (4 GB).
I retried it successfully with bsnmpd disabled beforehand. After the upgrade I enabled the SNMP daemon and the kernel crashed instantly.
pfSense is on an SD card, with no mSATA SSD and no 3G/4G/LTE modem, but with a Compex WLE200NX WLAN card.
Updated by Chris Buechler almost 10 years ago
I misunderstood JimP's earlier comment; running 'sysctl -a' won't panic it in the way enabling SNMP will.
Updated by Guillaume Leroy almost 10 years ago
I am running on an SD card (and without any other card) and I am encountering the problem.
Updated by Stefan Nunninger almost 10 years ago
- File snmp_bug_boot_log.txt added
I am experiencing this same bug.
I have attached a log of the console output.
I am running pfSense 2.2 on an Alix APU board without an SD card. The system is stored on an mSATA SSD.
Could somebody please explain how to work around the boot loop?
I can boot a recovery system with pfSense from a USB stick.
Where can I disable the SNMP daemon to prevent the system from hanging during boot?
Updated by Jim Pingle almost 10 years ago
To work around it, you can rename the bsnmpd binary or otherwise disable it. For example:
mv /usr/sbin/bsnmpd /usr/sbin/bsnmpd.tmp
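
The same rename can be applied to a root filesystem mounted from a rescue environment. The following is only a sketch under stated assumptions: the damaged root has already been checked and mounted, the mount point is passed as the first argument, and the helper name `disable_bsnmpd` is made up for illustration. Note that the binary sits under usr/sbin relative to that mount point.

```shell
# Rename the bsnmpd binary under a given root so it cannot be started.
# $1 is "/" on a running system, or a mount point such as /mnt when
# working from a rescue system (assumption: the root filesystem is
# already mounted there).
disable_bsnmpd() {
    root=${1:-/}
    bin="${root%/}/usr/sbin/bsnmpd"
    if [ -f "$bin" ]; then
        mv "$bin" "$bin.tmp" && echo "renamed $bin to $bin.tmp"
    else
        echo "bsnmpd not found at $bin" >&2
        return 1
    fi
}
```

For example, `disable_bsnmpd /mnt` after mounting the installed system on /mnt. Renaming the file back restores the daemon once a fixed version is installed.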
Updated by Stefan Nunninger almost 10 years ago
Jim P wrote:
To work around it, you can rename the bsnmpd binary or otherwise disable it. For example:
mv /usr/sbin/bsnmpd /usr/sbin/bsnmpd.tmp
I tried to do so. This is what I did precisely:
- Boot with pfsense from USB stick
- login on console
- fsck -y -t ufs /dev/ad4s1 # fix the filesystem otherwise cannot mount it
- mount /dev/ad4s1 /mnt
- mv /mnt/sbin/bsnmpd /mnt/sbin/bsnmpd.tmp
- umount /mnt
- halt
- wait until system has stopped
- remove USB stick
- powercycle system to start again
Now the boot does not crash anymore, but the console asks for a password and does not accept the admin password.
This is strange because I did not use a password on the console before.
Logging in to the web interface returns: "503 - Service Not Available"
I reset the password according to this description:
https://doc.pfsense.org/index.php/I_locked_myself_out_of_the_WebGUI,_help!#Forgotten_Password_with_Locked_Console
However, I still cannot log in with admin/pfsense on the console.
It seems renaming the file /usr/sbin/bsnmpd had some strange side effects.
Can somebody please give me a hint as to what might have gone wrong?
Updated by Jim Pingle almost 10 years ago
The filesystem in /etc was likely corrupted as a result of the repeated panics. Reinstalling is the safest recovery method. Restore a backup with SNMP disabled. Follow up on the forum for further assistance; it's too far removed from this bug report to discuss it more here.
Updated by Chris Buechler almost 10 years ago
- Assignee changed from Ermal Luçi to Renato Botelho
A quick, low-risk workaround for this is to use the APU detection to skip starting SNMP on an APU that doesn't have an SD card installed (or isn't running from one), and log an error instead. That should suffice for 2.2.1, since at least it won't crash the system. Post-2.2.1 it'll need further review to fix the root cause and remove the workaround.
Renato, go ahead with that workaround.
Updated by Jim Pingle almost 10 years ago
From my testing it should be enough to skip only the hostres module on APU; other SNMP modules appeared to be OK, and that way people could still get some use out of SNMP on APU in the meantime.
Updated by Guillaume Leroy almost 10 years ago
- File crash-log.zip added
I don't agree: we noticed that the problem also occurs with SD card based APU setups. That is my case.
And btw I would like to have SNMP working on my APU devices too, as it did in the previous releases, at least the standard MIBs/branches (especially interfaces).
Isn't this kernel panic only caused by a specific SNMP agent module that we could unload / remove from the compilation or from the bsnmpd configuration?
Updated by Guillaume Leroy almost 10 years ago
Jim P wrote:
From my testing it should be enough to skip only the hostres module on APU; other SNMP modules appeared to be OK, and that way people could still get some use out of SNMP on APU in the meantime.
+1 :-)
Updated by Jim Pingle almost 10 years ago
Guillaume Leroy wrote:
Isn't this kernel panic only caused by a specific SNMP agent module that we could unload / remove from the compilation or from the bsnmpd configuration?
I mentioned that finding way back in Note 2 above (#4403-2)
Updated by Renato Botelho almost 10 years ago
- Status changed from Confirmed to Feedback
Added a conditional to skip hostres on APU for now
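
The idea of the conditional can be illustrated as follows. This is a hypothetical shell sketch, not the pfSense code (the real change is in pfSense's PHP and is not shown in this report): the function name `snmp_module_lines` is invented, the platform name is simply passed in by the caller, and the module path pattern follows the `/usr/lib/snmp_<name>.so` form seen for mibII and hostres elsewhere in this report.

```shell
# Hypothetical sketch: emit bsnmpd module lines, leaving out hostres
# when the platform is reported as "APU".
# $1 is a platform name supplied by the caller; how pfSense actually
# detects an APU is not covered here. Remaining arguments are module names.
snmp_module_lines() {
    platform=$1
    shift
    for mod in "$@"; do
        if [ "$mod" = "hostres" ] && [ "$platform" = "APU" ]; then
            # Log instead of loading the module that triggers the panic.
            echo "skipping hostres on $platform, see bug #4403" >&2
            continue
        fi
        printf 'begemotSnmpdModulePath."%s" = "/usr/lib/snmp_%s.so"\n' "$mod" "$mod"
    done
}
```

Calling `snmp_module_lines APU mibII hostres` would print only the mibII line, so the other SNMP modules stay usable while hostres is held back.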
Updated by Guillaume Leroy almost 10 years ago
Jim P wrote:
Guillaume Leroy wrote:
Isn't this kernel panic only caused by a specific SNMP agent module that we could unload / remove from the compilation or from the bsnmpd configuration?
I mentioned that finding way back in Note 2 above (#4403-2)
Very good work indeed!
I am now running 2.2 fine with bsnmpd enabled but Host Resources MIB module disabled.
Thanks.
Updated by Chris Buechler almost 10 years ago
- Status changed from Feedback to Confirmed
- Assignee changed from Renato Botelho to Ermal Luçi
- Target version changed from 2.2.1 to 2.2.2
Confirmed that this works around it for now; moving this to 2.2.2 for a proper fix.
Updated by Chris Buechler over 9 years ago
- Target version changed from 2.2.2 to 2.2.3
Updated by Matt Meyer over 9 years ago
I've just hit this issue myself using an ALIX 2D13. There are no other devices except for the CF card.
Updated by Chris Buechler over 9 years ago
Matt: I haven't heard of it on ALIX, but the same issue could affect it too. Does disabling the Host Resources MIB prevent the issue for you?
Updated by Chris Buechler over 9 years ago
- Target version changed from 2.2.3 to 2.3
Updated by Ermal Luçi over 9 years ago
https://github.com/ocochard/BSDRP/blob/master/EINE/patches/freebsd.bsnmpd.hostres
Seems to have a patch for this issue.
Updated by Matt Meyer over 9 years ago
Chris Buechler wrote:
Matt: I haven't heard of it on ALIX, but the same issue could affect it too. Does disabling the Host Resources MIB prevent the issue for you?
In my case this didn't help. I have since reverted to 2.1.5 but could upgrade again to test any fixes.
Updated by Renato Botelho over 9 years ago
Ermal Luçi wrote:
https://github.com/ocochard/BSDRP/blob/master/EINE/patches/freebsd.bsnmpd.hostres
Seems to have a patch for this issue.
A similar change was already made in FreeBSD - https://svnweb.freebsd.org/base/releng/10.1/usr.sbin/bsnmpd/modules/snmp_hostres/hostres_storage_tbl.c?r1=228990&r2=248707
Updated by Jim Thompson over 9 years ago
- Assignee changed from Ermal Luçi to Renato Botelho
reassign to Renato. Maybe this is fixed in FreeBSD 10.2
Updated by Jim Thompson about 9 years ago
- Assignee changed from Renato Botelho to Chris Buechler
now reassigned to cmb
Updated by Chris Buechler about 9 years ago
Can anyone still replicate this? Going back to 2.2.0-REL, full install on mSATA, no SD card, with or without an ath card, it doesn't happen. OP's APU is 2 GB and I'm on a 4 GB one, so I tried setting hw.physmem in loader.conf to make it look like it has 2 GB RAM in case that was related; no dice. On 2.2.5, after removing the code that omits hostres on APUs, it's fine as well. There is also no issue on 2.3, but given that I can't replicate it where it was happening to others, that's not helpful.
Updated by Marcel Janicki about 9 years ago
Configuration: pfSense 2.2.5 amd64 on a SD card, no mSATA SSD, APU.1C4, 4 GB, no 3G/4G/LTE modem but a Compex WLE200NX WLAN card (Atheros).
I've re-enabled the SNMP module Host Resources and the kernel didn't crash.
Thank you!
Updated by Guillaume Leroy about 9 years ago
Almost the same config on my side: an APU 1C with the nanobsd-based system running on an SD card and nothing else.
I re-enabled the Host Resources module in the config and the system didn't crash. However, I am not able to get any reply when polling the Host MIB either.
So I suppose the module is not really enabled, probably because of the anti-crash check added when the issue was reported on the APU.
And indeed I don't see the module in the snmpd config file:
# cat /var/etc/snmpd.conf
...
snmpEnableAuthenTraps = 2
begemotSnmpdModulePath."mibII" = "/usr/lib/snmp_mibII.so"
begemotSnmpdModulePath."netgraph" = "/usr/lib/snmp_netgraph.so"
%netgraph
begemotNgControlNodeName = "snmpd"
begemotSnmpdModulePath."pf" = "/usr/lib/snmp_pf.so"
begemotSnmpdModulePath."ucd" = "/usr/local/lib/snmp_ucd.so"
begemotSnmpdModulePath."regex" = "/usr/local/lib/snmp_regex.so"
Marcel Janicki wrote:
I've re-enabled the SNMP module Host Resources and the kernel didn't crash.
Marcel, did you really check that you are now able to poll the Host MIB?
Updated by Guillaume Leroy about 9 years ago
OK, I've just run a manual test as I don't know how to force loading the module with pfSense.
I stopped the bsnmpd instance started by pfSense and manually started another instance loading the system /etc/snmpd.config file with the host resources module enabled:
begemotSnmpdModulePath."hostres" = "/usr/lib/snmp_hostres.so"
And the system crashed immediately.
So the problem is still there in 2.2.5 with APU devices.
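
Before drawing conclusions from a "no crash" result, it is worth confirming that the running configuration really loads hostres. A minimal sketch, assuming the config file is readable and that lines starting with `#` are comments; the function name `config_loads_hostres` is made up for illustration:

```shell
# Return success if the given bsnmpd config file loads the hostres module.
# $1 is the path to the config file. Comment lines are ignored so that a
# commented-out module entry does not count as enabled.
config_loads_hostres() {
    grep -v '^[[:space:]]*#' "$1" | grep -q 'begemotSnmpdModulePath\."hostres"'
}
```

For example, `config_loads_hostres /var/etc/snmpd.conf && echo "hostres will be loaded"`. If nothing is printed, the module is being skipped and the test does not exercise the crash path.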
Updated by Chris Buechler about 9 years ago
Guillaume: could you try that same test on latest 2.3 and report back please?
Marcel: stock 2.2.1 and newer won't crash because hostres is skipped on APUs automatically, so you have to either remove that code or manually enable it.
Updated by Jim Pingle about 9 years ago
- File snmp-apu-hostres.diff added
- Status changed from Confirmed to Feedback
I tested this on 2.3 today. I removed the APU check and confirmed hostres was present in the config. snmp started up and I was able to perform a full snmpwalk against it. No problems at all. Rebooted the unit and it still was running OK. Can wait for additional confirmation but it looks to me like this has been solved in FreeBSD 10.2.
The attached patch can be applied to remove the APU-specific check in services.inc so that hostres will be enabled on APU for those who want to test on 2.3.
I was able to easily reproduce the panic on 2.2.x on the same hardware (see my earlier notes)
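
For anyone repeating the test, polling only the Host Resources subtree (OID 1.3.6.1.2.1.25) is enough to confirm that the module answers. A sketch that just builds the command line, assuming net-snmp's `snmpwalk` is installed on the polling machine and the agent accepts SNMP v2c; the function name, host, and community values are placeholders:

```shell
# Print the snmpwalk command for the HOST-RESOURCES-MIB subtree.
# $1 is the agent address, $2 the read community (defaults to "public").
# The command is printed rather than run, so it can be reviewed first.
hostres_walk_cmd() {
    host=$1
    community=${2:-public}
    echo "snmpwalk -v 2c -c $community $host 1.3.6.1.2.1.25"
}
```

Running the printed command from another machine on the LAN should return rows if hostres is loaded and healthy, and time out or return nothing for that subtree if the module is still being skipped.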
Updated by Guillaume Leroy about 9 years ago
- File upgrade.log added
Good news...
However, on my side, I have not been able to complete the upgrade to 2.3-ALPHA: a 45-minute upgrade process left me with a broken system and a lot of PHP errors (see attached if interested). :-(
Updated by Chris Buechler about 9 years ago
- Status changed from Feedback to Resolved
- Affected Version changed from 2.2 to 2.2.x
Guillaume: start a thread on the 2.3 board on the forum and we can review that.
Since JimP can confirm, we're good here.
Updated by Marcel Janicki about 9 years ago
Guillaume, I didn't realize that the hostres module was still skipped on APUs.
Hence I removed the APU check in services.inc and the kernel crashed instantly (2.2.5, FreeBSD 10.1).
Good to hear that it's fixed with 10.2.