Bug #15353
Crashes Every ~8-12 Hours in New 2.7.2 Install with Unbound, Suricata, and pfBlockerNG
Status: open
Description
After reading some FreeBSD posts, it appears this bug may be triggered by high CPU load. For me it occurs particularly while pfBlockerNG is reloading or updating, although it is not consistently reproducible. I have tried some mitigations, such as disabling promiscuous mode in Suricata and restricting Suricata to the WAN interface. These seem to reduce the frequency of the crashes but do not eliminate them. Previously, running pfBlockerNG in Python mode alongside Suricata on both the LAN and WAN interfaces made the crashes more frequent.
The crash tends to happen roughly every 8 hours and appears to be related to two FreeBSD issues:
FreeBSD Commit "vm: Fix racy checks for swap objects" - https://cgit.freebsd.org/src/commit/?id=e123264e4dc394602f9fed2f0376204b5998d815
FreeBSD Bug Report "panic: vm_page_free_prep: freeing mapped page" - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261707
Further investigation and possible collaboration with the FreeBSD community may be necessary to address this issue effectively.
Intel(R) Pentium(R) CPU G3250 @ 3.20GHz
2 CPUs: 1 package(s) x 2 core(s)
AES-NI CPU Crypto: No
QAT Crypto: No
Kernel PTI Enabled
MDS Mitigation VERW
amd64
14.0-CURRENT
FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec 6 20:45:47 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/F
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> show registers
cs 0x20
ds 0x3b
es 0x3b
fs 0x13
gs 0x1b
ss 0
rax 0x12
rcx 0xffffffff81451bc8
rdx 0xffffffff844195ff
rbx 0x100
rsp 0xfffffe00f5272780
rbp 0xfffffe00f5272780
rsi 0xfffffe00f52721f0
rdi 0xffffffff82d3f3d8 vt_conswindow+0x10
r8 0x10
r9 0x10
r10 0xf
r11 0x10
r12 0
r13 0x2
r14 0xffffffff813d55bb
r15 0xfffffe00f54e6e40
rip 0xffffffff80d32342 kdb_enter+0x32
rflags 0x82
kdb_enter+0x32: movq $0,0x234a4c3(%rip)
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe009af25f80
curthread = 0xfffffe00f54e6e40: pid 27610 tid 100715 critnest 1 "unbound-control"
curpcb = 0xfffffe00f54e7360
fpcurthread = 0xfffffe00f54e6e40: pid 27610 "unbound-control"
idlethread = 0xfffffe001de1ec80: tid 100004 "idle: cpu1"
self = 0xffffffff84011000
curpmap = 0xfffff803a5a05ad0
tssp = 0xffffffff84011384
rsp0 = 0xfffffe00f5273000
kcr3 = 0x800000008aefd67f
ucr3 = 0x8000000271748e7f
scr3 = 0x271748e7f
gs32p = 0xffffffff84011404
ldt = 0xffffffff84011444
tss = 0xffffffff84011434
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 27610 tid 100715 td 0xfffffe00f54e6e40
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00f5272780
vpanic() at vpanic+0x163/frame 0xfffffe00f52728b0
panic() at panic+0x43/frame 0xfffffe00f5272910
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00f5272970
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00f52729d0
calltrap() at calltrap+0x8/frame 0xfffffe00f52729d0
--- trap 0xc, rip = 0xffffffff8127ee47, rsp = 0xfffffe00f5272aa0, rbp = 0xfffffe00f5272ac0 ---
free_pv_entry() at free_pv_entry+0x47/frame 0xfffffe00f5272ac0
pmap_pv_promote_pde() at pmap_pv_promote_pde+0x14e/frame 0xfffffe00f5272b00
pmap_promote_pde() at pmap_promote_pde+0x2fa/frame 0xfffffe00f5272b80
pmap_enter() at pmap_enter+0xe8f/frame 0xfffffe00f5272c50
vm_fault() at vm_fault+0xbf4/frame 0xfffffe00f5272d60
vm_fault_trap() at vm_fault_trap+0x6b/frame 0xfffffe00f5272db0
trap_pfault() at trap_pfault+0x1d9/frame 0xfffffe00f5272e10
trap() at trap+0x442/frame 0xfffffe00f5272f30
calltrap() at calltrap+0x8/frame 0xfffffe00f5272f30
--- trap 0xc, rip = 0x82784d8d0, rsp = 0x820a9f758, rbp = 0x820a9f940 ---
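Since the crash recurs and more textdumps are available, comparing the backtraces across dumps may help confirm whether every panic goes through the same pmap path (free_pv_entry / pmap_promote_pde in the trace above). Below is a minimal Python sketch for that. It assumes that each textdump tarball contains a top-level ddb.txt, as in the listing above, and that frames follow the `func() at func+0xNN/frame 0x...` format. The helper names `read_ddb_from_textdump` and `backtrace_frames` are hypothetical and are not part of pfSense or FreeBSD.

```python
import re
import sys
import tarfile

# Matches ddb backtrace lines such as
# "free_pv_entry() at free_pv_entry+0x47/frame 0xfffffe00f5272ac0"
# and captures the function name.
FRAME_RE = re.compile(r"^([\w.]+)\(\) at ")


def read_ddb_from_textdump(path):
    """Return ddb.txt from a textdump tarball (assumed to be a top-level member)."""
    with tarfile.open(path) as tar:
        member = tar.extractfile(tar.getmember("ddb.txt"))
        return member.read().decode("utf-8", errors="replace")


def backtrace_frames(ddb_text):
    """Return function names from the first 'bt' section of ddb output, in order."""
    frames = []
    in_bt = False
    for line in ddb_text.splitlines():
        line = line.strip()
        if line.endswith("> bt"):
            in_bt = True  # the backtrace output starts after this prompt
            continue
        if in_bt:
            if line.startswith("db:"):
                break  # the next ddb command ends the backtrace
            match = FRAME_RE.match(line)
            if match:  # "Tracing ..." and "--- trap ..." lines are skipped
                frames.append(match.group(1))
    return frames


if __name__ == "__main__":
    # Usage (hypothetical script name): python bt_compare.py /var/crash/textdump.tar.*
    for path in sys.argv[1:]:
        print(path, " <- ".join(backtrace_frames(read_ddb_from_textdump(path))))
```

Printing one line per dump makes it easy to spot whether the innermost frames (free_pv_entry, pmap_pv_promote_pde, pmap_promote_pde) repeat across crashes.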
Updated by Mike Moore 8 months ago
I see quite a few posts on the forum about recent instability. I am currently facing the issue myself, with high system utilization. Still troubleshooting, but good find on that.
Updated by Devin Dawson 8 months ago
Thanks for the feedback; this was my first post here. I have more logs if necessary.
I disabled virtualization in the BIOS, which seemed to buy a little extra uptime before the crash/reboot, but it still occurs with the same errors. It's all bro-science right now.
Updated by Steven Brown about 1 month ago
I wanted to add that we see a similar issue with a router locking up randomly. Weirdly, it mainly affects our secondary router, while the primary in the HA failover pair seems to run fine most of the time (thankfully).
Troubleshooting is difficult because access to the system is restricted and it stops responding on the network. We reboot it using remote power management.
While tracing the issue, we found it was often triggered after a pfBlockerNG update. We reduced the pfBlockerNG update frequency to daily, which has helped calm the problem significantly. I think, though, that the issue is triggered by the filter reload rather than by anything specific to pfBlockerNG itself; pfBlockerNG just happens to trigger the filter reload.
We don't run Suricata, but we do have Snort running.