Bug #15353: Crashes Every ~8-12 Hours in New 2.7.2 Install with Unbound, Suricata, and pfBlockerNG - pfSense - pfSense bugtracker

open

Added by Devin Dawson over 2 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: (none)
Category: FreeBSD
Target version: (none)
Start date: (none)
Due date: (none)
% Done: (none)
Estimated time: (none)
Plus Target Version: (none)
Release Notes: Default
Affected Version: 2.7.2
Affected Architecture: amd64

Description

After reading some FreeBSD posts, it appears that this bug may be triggered by high CPU load. For me it occurs mostly while pfBlockerNG is reloading or updating, although it is not consistently reproducible. I have tried some mitigations, such as disabling promiscuous mode in Suricata and restricting Suricata to the WAN interface. These seem to reduce the frequency of the crashes but do not eliminate them. Previously, running pfBlockerNG in Python mode alongside Suricata on both the LAN and WAN interfaces made the crash occur more often.

The crash tends to happen approximately every 8 hours and appears to be related to the following FreeBSD commit and bug report:

FreeBSD Commit "vm: Fix racy checks for swap objects" - https://cgit.freebsd.org/src/commit/?id=e123264e4dc394602f9fed2f0376204b5998d815
FreeBSD Bug Report "panic: vm_page_free_prep: freeing mapped page" - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261707

Further investigation and possible collaboration with the FreeBSD community may be necessary to address this issue effectively.

Intel(R) Pentium(R) CPU G3250 @ 3.20GHz
2 CPUs: 1 package(s) x 2 core(s)
AES-NI CPU Crypto: No
QAT Crypto: No
Kernel PTI: Enabled
MDS Mitigation: VERW

amd64
14.0-CURRENT
FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec  6 20:45:47 UTC 2023     root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/F

Filename: /var/crash/textdump.tar.0
ddb.txt

db:0:kdb.enter.default>  show registers
cs                        0x20
ds                        0x3b
es                        0x3b
fs                        0x13
gs                        0x1b
ss                           0
rax                       0x12
rcx         0xffffffff81451bc8
rdx         0xffffffff844195ff
rbx                      0x100
rsp         0xfffffe00f5272780
rbp         0xfffffe00f5272780
rsi         0xfffffe00f52721f0
rdi         0xffffffff82d3f3d8  vt_conswindow+0x10
r8                        0x10
r9                        0x10
r10                        0xf
r11                       0x10
r12                          0
r13                        0x2
r14         0xffffffff813d55bb
r15         0xfffffe00f54e6e40
rip         0xffffffff80d32342  kdb_enter+0x32
rflags                    0x82
kdb_enter+0x32: movq    $0,0x234a4c3(%rip)
db:0:kdb.enter.default>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.default>  show pcpu
cpuid        = 1
dynamic pcpu = 0xfffffe009af25f80
curthread    = 0xfffffe00f54e6e40: pid 27610 tid 100715 critnest 1 "unbound-control" 
curpcb       = 0xfffffe00f54e7360
fpcurthread  = 0xfffffe00f54e6e40: pid 27610 "unbound-control" 
idlethread   = 0xfffffe001de1ec80: tid 100004 "idle: cpu1" 
self         = 0xffffffff84011000
curpmap      = 0xfffff803a5a05ad0
tssp         = 0xffffffff84011384
rsp0         = 0xfffffe00f5273000
kcr3         = 0x800000008aefd67f
ucr3         = 0x8000000271748e7f
scr3         = 0x271748e7f
gs32p        = 0xffffffff84011404
ldt          = 0xffffffff84011444
tss          = 0xffffffff84011434
curvnet      = 0
db:0:kdb.enter.default>  bt
Tracing pid 27610 tid 100715 td 0xfffffe00f54e6e40
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00f5272780
vpanic() at vpanic+0x163/frame 0xfffffe00f52728b0
panic() at panic+0x43/frame 0xfffffe00f5272910
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00f5272970
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00f52729d0
calltrap() at calltrap+0x8/frame 0xfffffe00f52729d0
--- trap 0xc, rip = 0xffffffff8127ee47, rsp = 0xfffffe00f5272aa0, rbp = 0xfffffe00f5272ac0 ---
free_pv_entry() at free_pv_entry+0x47/frame 0xfffffe00f5272ac0
pmap_pv_promote_pde() at pmap_pv_promote_pde+0x14e/frame 0xfffffe00f5272b00
pmap_promote_pde() at pmap_promote_pde+0x2fa/frame 0xfffffe00f5272b80
pmap_enter() at pmap_enter+0xe8f/frame 0xfffffe00f5272c50
vm_fault() at vm_fault+0xbf4/frame 0xfffffe00f5272d60
vm_fault_trap() at vm_fault_trap+0x6b/frame 0xfffffe00f5272db0
trap_pfault() at trap_pfault+0x1d9/frame 0xfffffe00f5272e10
trap() at trap+0x442/frame 0xfffffe00f5272f30
calltrap() at calltrap+0x8/frame 0xfffffe00f5272f30
--- trap 0xc, rip = 0x82784d8d0, rsp = 0x820a9f758, rbp = 0x820a9f940 ---
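
When several textdumps are available, it can help to check whether each crash faults in the same kernel path. The following is a minimal sketch in Python, assuming the `bt` output always has the shape shown above (`name() at name+0xoff/frame 0xaddr`, with `--- trap ... ---` marker lines). The names `parse_frames`, `faulting_functions`, and `SAMPLE_BT` are hypothetical helpers for illustration and are not part of pfSense or FreeBSD.

```python
import re

# Assumed line shapes, taken from the backtrace in this report:
#   pmap_enter() at pmap_enter+0xe8f/frame 0xfffffe00f5272c50
#   --- trap 0xc, rip = 0xffffffff8127ee47, rsp = ..., rbp = ... ---
FRAME_RE = re.compile(r"^(\w+)\(\) at \S+\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$")
TRAP_RE = re.compile(r"^--- trap 0x[0-9a-f]+,")

TRAP_MARK = "<trap>"  # placeholder name used for trap marker lines


def parse_frames(bt_text):
    """Return (function, offset) tuples in the order ddb printed them.

    Trap marker lines become (TRAP_MARK, None); other lines are skipped.
    """
    frames = []
    for raw in bt_text.splitlines():
        line = raw.strip()
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group(1), int(match.group(2), 16)))
        elif TRAP_RE.match(line):
            frames.append((TRAP_MARK, None))
    return frames


def faulting_functions(frames, depth=3):
    """Return up to `depth` function names that follow the first trap marker.

    These are the functions that were running when the fatal fault was
    taken, which is a rough signature for comparing crashes.
    """
    names = [name for name, _ in frames]
    if TRAP_MARK not in names:
        return names[:depth]
    start = names.index(TRAP_MARK) + 1
    return [n for n in names[start:] if n != TRAP_MARK][:depth]


# A shortened copy of the backtrace from this report.
SAMPLE_BT = """\
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00f5272780
vpanic() at vpanic+0x163/frame 0xfffffe00f52728b0
panic() at panic+0x43/frame 0xfffffe00f5272910
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00f5272970
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00f52729d0
calltrap() at calltrap+0x8/frame 0xfffffe00f52729d0
--- trap 0xc, rip = 0xffffffff8127ee47, rsp = 0xfffffe00f5272aa0, rbp = 0xfffffe00f5272ac0 ---
free_pv_entry() at free_pv_entry+0x47/frame 0xfffffe00f5272ac0
pmap_pv_promote_pde() at pmap_pv_promote_pde+0x14e/frame 0xfffffe00f5272b00
pmap_promote_pde() at pmap_promote_pde+0x2fa/frame 0xfffffe00f5272b80
"""

if __name__ == "__main__":
    print(" <- ".join(faulting_functions(parse_frames(SAMPLE_BT))))
```

Running the same two functions over the `bt` section of each saved `ddb.txt` and comparing the resulting lists would show whether every crash goes through `free_pv_entry` and the `pmap_promote_pde` path, or whether the panics differ between dumps.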

Updated by Mike Moore over 2 years ago

I see quite a few posts on the forum about recent instability. I am currently facing the issue myself and am seeing high system utilization. Still troubleshooting, but good find on that.

Updated by Devin Dawson over 2 years ago

Thanks for the feedback, this was my first post here. I have more logs if necessary.

I disabled virtualization in the BIOS, which seemed to give me a little extra life before the crash/reboot, but it still occurs with the same errors. It's all bro-science right now.

Updated by Steven Brown over 1 year ago

I wanted to update and say we see a similar issue with a router locking up randomly. Weirdly, it is mainly affecting our secondary router, while the primary in the HA failover seems to be running fine most of the time (thankfully).

Troubleshooting is difficult because access to the system is restricted and it stops responding on the network. We reboot it using remote power management.

When tracing the issue, we found it was often triggered after a pfBlockerNG update. We've reduced the update frequency of pfBlockerNG to daily, and this has helped significantly in calming the problem. I think the issue is actually triggered by the 'filter reload', though, and is not specific to pfBlockerNG itself; pfBlockerNG just happens to be what triggers the filter reload.

We don't run Suricata, but we do have Snort running.
