Regression #16790
Kernel panic due to race condition on a ``bpf`` device
Status: open, 0% done
Description
Two customers have already reported crashes that occurred after updating to pfSense Plus 26.03.
The crashes happened on two different hardware models: 7100 (HS #44225707405) and 4100 (HS #44386928982).
The crash report showed:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 18
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80eb2b66
stack pointer = 0x0:0xfffffe0067e2f7c0
frame pointer = 0x0:0xfffffe0067e2f830
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_1)
rdi: fffffe0008760a00 rsi: fffffe0067e2f7e0 rdx: 0000000000000001
rcx: 00000000000042a7 r8: 0000000000000002 r9: fffff80149a99360
rax: 0000000000000000 rbx: fffff80149a99300 rbp: fffffe0067e2f830
r10: fffffe0067e2f6f0 r11: 0000000000000001 r12: 0000000000000060
r13: fffff80008751800 r14: 0000000000000000 r15: fffff8017099e800
trap number = 12
panic: page fault
cpuid = 1
time = 1776065740
KDB: enter: panic
db:1:pfs> bt
Tracing pid 0 tid 100008 td 0xfffff80100d0b000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0067e2f640
panic() at panic+0x43/frame 0xfffffe0067e2f6a0
trap_pfault() at trap_pfault+0x3cf/frame 0xfffffe0067e2f6f0
calltrap() at calltrap+0x8/frame 0xfffffe0067e2f6f0
--- trap 0xc, rip = 0xffffffff80eb2b66, rsp = 0xfffffe0067e2f7c0, rbp = 0xfffffe0067e2f830 ---
bpf_mtap() at bpf_mtap+0x86/frame 0xfffffe0067e2f830
vlan_transmit() at vlan_transmit+0x42/frame 0xfffffe0067e2f880
ether_output_frame() at ether_output_frame+0xd3/frame 0xfffffe0067e2f8b0
ether_output() at ether_output+0x697/frame 0xfffffe0067e2f940
ip_tryforward() at ip_tryforward+0x4fc/frame 0xfffffe0067e2f9e0
ip_input() at ip_input+0x31d/frame 0xfffffe0067e2fa40
netisr_dispatch_src() at netisr_dispatch_src+0x1fa/frame 0xfffffe0067e2fa90
ether_demux() at ether_demux+0x194/frame 0xfffffe0067e2fac0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0067e2fb20
netisr_dispatch_src() at netisr_dispatch_src+0x9f/frame 0xfffffe0067e2fb70
ether_input() at ether_input+0x56/frame 0xfffffe0067e2fbc0
ether_demux() at ether_demux+0x8e/frame 0xfffffe0067e2fbf0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0067e2fc50
netisr_dispatch_src() at netisr_dispatch_src+0x9f/frame 0xfffffe0067e2fca0
ether_input() at ether_input+0x56/frame 0xfffffe0067e2fcf0
iflib_rxeof() at iflib_rxeof+0xa6f/frame 0xfffffe0067e2fe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0067e2fe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x14e/frame 0xfffffe0067e2fec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe0067e2fef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe0067e2ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0067e2ff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
System logs showed nothing informative.
Updated by Mateusz Guzik about 1 month ago
- Assignee set to Mateusz Guzik
The panicking instruction is:
ffffffff80eacb66: 48 8b 48 30 movq 0x30(%rax), %rcx
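According to the register dump in the panic, rax is NULL, so the load from 0x30(%rax) touches address 0x30, which matches the reported fault virtual address.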
addr2line on the rip shows:
static inline bool
bpf_chkdir(struct bpf_d *d, struct mbuf *m)
{
	return (d->bd_bif->bif_methods->bif_chkdir(d->bd_bif->bif_softc, m,
	    d->bd_direction));
}
Upstream has the following commit not present in 26.03:
commit 5937e1cdc99180b4adae2cf20cabd75dd9f45546
Author: Gleb Smirnoff <glebius@FreeBSD.org>
Date: Wed Feb 4 14:07:11 2026 -0800
bpf: don't clear pointer from descriptor to the tap on descriptor close
During packet processing the descriptor is looked up using epoch(9) and it
can be accessed after bpf_detachd(). In scenario of descriptor close the
tap point is alive (it actually produces packets) and thus the pointer can
be legitimately dereferenced. This fixes a race on a bpf(4) device close
that would otherwise result in panic.
Differential Revision: https://reviews.freebsd.org/D55064
diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index 9f0b57728e88..228ac9867bd7 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -678,8 +678,8 @@ bpf_detachd(struct bpf_d *d, bool detached_ifp)
 	BPFD_LOCK(d);
 	CK_LIST_REMOVE(d, bd_next);
 	writer = (d->bd_writer > 0);
-	d->bd_bif = NULL;
 	if (detached_ifp) {
+		d->bd_bif = NULL;
 		/*
 		 * Notify descriptor as it's detached, so that any
 		 * sleepers wake up and get ENXIO.
This looks like a fix for this very problem.
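To make the effect of that one-line move concrete, here is a minimal single-threaded userspace sketch. It is not kernel code: `model_bif`, `model_d`, `model_detachd_old`, `model_detachd_new` and `model_reader_can_deref` are hypothetical stand-ins invented for illustration, and the model does not reproduce epoch(9) or any real concurrency. It only shows what a packet-path reader that still holds the descriptor would find after a close, before and after the upstream change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; NOT the kernel's struct bpf_if / struct bpf_d. */
struct model_bif {
	int softc;
};

struct model_d {
	struct model_bif *bd_bif;	/* pointer from descriptor to the tap */
	bool on_list;			/* models membership in the tap's list */
};

/* Pre-fix behaviour: the tap pointer is cleared on every detach. */
void
model_detachd_old(struct model_d *d, bool detached_ifp)
{
	(void)detached_ifp;
	d->on_list = false;
	d->bd_bif = NULL;
}

/* Post-fix behaviour: cleared only when the interface itself went away. */
void
model_detachd_new(struct model_d *d, bool detached_ifp)
{
	d->on_list = false;
	if (detached_ifp)
		d->bd_bif = NULL;
}

/*
 * Stand-in for a reader that looked the descriptor up before the detach.
 * The kernel path dereferences bd_bif unconditionally; this model reports
 * whether that dereference would be safe instead of faulting.
 */
bool
model_reader_can_deref(const struct model_d *d)
{
	return (d->bd_bif != NULL);
}
```

In this model, a plain descriptor close (`detached_ifp` false) leaves `bd_bif` NULL with the old logic, which corresponds to the NULL rax seen in the panic, while the new logic keeps the pointer valid because the tap point is still alive.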
Updated by Craig Coonrad 9 days ago
There appear to be two more cases this week:
6100 - HS 45137024843
8300 - HS 45133424386
Updated by Lev Prokofev 9 days ago
Hm, could it be related to https://redmine.pfsense.org/issues/16828 ?
Updated by Mateusz Guzik 8 days ago
I checked one of the textdumps in 16828, it is indeed the same problem. I think we can close that one as a duplicate.
Updated by Jim Pingle 2 days ago
- Target version set to 26.03.1
The fix for this was picked back to the 26.03.1 branch.
Updated by Jim Pingle 2 days ago
- Subject changed from "Possible regression on 26.03 which causes firewalls to crash" to "Kernel panic due to race condition on a ``bpf`` device"
Updated by Jim Pingle about 7 hours ago
- Project changed from pfSense Plus to pfSense
- Category changed from Operating System to Operating System
- Target version changed from 26.03.1 to 2.9.0
- Plus Target Version set to 26.03.1
Updated by Jim Pingle about 7 hours ago
- Has duplicate Bug #16828: Kernel panic (page fault) in ``bpf_mtap`` via ``vlan_transmit`` with Suricata BPF listeners active on VLAN interfaces added