Regression #16790

open

Kernel panic due to race condition on a ``bpf`` device

Added by Azamat Khakimyanov about 1 month ago. Updated about 7 hours ago.

Status: Feedback
Priority: Normal
Assignee:
Category: Operating System
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 26.03.1
Release Notes: Default
Affected Version:
Affected Architecture:

Description

Two customers have already reported crashes that occurred after updating to pfSense Plus 26.03.
The crashes occurred on two different hardware models: 7100 (HS #44225707405) and 4100 (HS #44386928982).

The crash report showed:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 18
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80eb2b66
stack pointer        = 0x0:0xfffffe0067e2f7c0
frame pointer        = 0x0:0xfffffe0067e2f830
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_1)
rdi: fffffe0008760a00 rsi: fffffe0067e2f7e0 rdx: 0000000000000001
rcx: 00000000000042a7  r8: 0000000000000002  r9: fffff80149a99360
rax: 0000000000000000 rbx: fffff80149a99300 rbp: fffffe0067e2f830
r10: fffffe0067e2f6f0 r11: 0000000000000001 r12: 0000000000000060
r13: fffff80008751800 r14: 0000000000000000 r15: fffff8017099e800
trap number = 12
panic: page fault
cpuid = 1
time = 1776065740
KDB: enter: panic

db:1:pfs> bt
Tracing pid 0 tid 100008 td 0xfffff80100d0b000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0067e2f640
panic() at panic+0x43/frame 0xfffffe0067e2f6a0
trap_pfault() at trap_pfault+0x3cf/frame 0xfffffe0067e2f6f0
calltrap() at calltrap+0x8/frame 0xfffffe0067e2f6f0
--- trap 0xc, rip = 0xffffffff80eb2b66, rsp = 0xfffffe0067e2f7c0, rbp = 0xfffffe0067e2f830 ---
bpf_mtap() at bpf_mtap+0x86/frame 0xfffffe0067e2f830
vlan_transmit() at vlan_transmit+0x42/frame 0xfffffe0067e2f880
ether_output_frame() at ether_output_frame+0xd3/frame 0xfffffe0067e2f8b0
ether_output() at ether_output+0x697/frame 0xfffffe0067e2f940
ip_tryforward() at ip_tryforward+0x4fc/frame 0xfffffe0067e2f9e0
ip_input() at ip_input+0x31d/frame 0xfffffe0067e2fa40
netisr_dispatch_src() at netisr_dispatch_src+0x1fa/frame 0xfffffe0067e2fa90
ether_demux() at ether_demux+0x194/frame 0xfffffe0067e2fac0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0067e2fb20
netisr_dispatch_src() at netisr_dispatch_src+0x9f/frame 0xfffffe0067e2fb70
ether_input() at ether_input+0x56/frame 0xfffffe0067e2fbc0
ether_demux() at ether_demux+0x8e/frame 0xfffffe0067e2fbf0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0067e2fc50
netisr_dispatch_src() at netisr_dispatch_src+0x9f/frame 0xfffffe0067e2fca0
ether_input() at ether_input+0x56/frame 0xfffffe0067e2fcf0
iflib_rxeof() at iflib_rxeof+0xa6f/frame 0xfffffe0067e2fe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0067e2fe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x14e/frame 0xfffffe0067e2fec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe0067e2fef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe0067e2ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0067e2ff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

System logs showed nothing informative.


Files

4100_textdump.tar.0 (247 KB) 4100_textdump.tar.0 Azamat Khakimyanov, 04/14/2026 09:34 AM
7100_textdump.tar.0 (302 KB) 7100_textdump.tar.0 Azamat Khakimyanov, 04/14/2026 09:35 AM
7100_status_output.tgz (831 KB) 7100_status_output.tgz Azamat Khakimyanov, 04/14/2026 09:36 AM

Related issues

Has duplicate Bug #16828: Kernel panic (page fault) in ``bpf_mtap`` via ``vlan_transmit`` with Suricata BPF listeners active on VLAN interfaces (Duplicate, Mateusz Guzik)

Actions #1

Updated by Mateusz Guzik about 1 month ago

  • Assignee set to Mateusz Guzik

The panicking instruction is:

ffffffff80eacb66: 48 8b 48 30           movq    0x30(%rax), %rcx

According to the register dump in the panic, rax is NULL, so this load reads address 0x30, which matches the fault virtual address reported above.

addr2line on the rip shows:

static inline bool
bpf_chkdir(struct bpf_d *d, struct mbuf *m)
{
        return (d->bd_bif->bif_methods->bif_chkdir(d->bd_bif->bif_softc, m,
            d->bd_direction));
}
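
As a minimal sketch of why a NULL base register produces a fault at 0x30: the load `movq 0x30(%rax), %rcx` reads from `rax + 0x30`. The struct below is hypothetical (`model_bif`, `other_fields`, and `loaded_field` are illustrative names, not the real `struct bpf_if` layout); only the 0x30 offset is taken from the panic, and the assumption is that rax held `d->bd_bif`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x30 offset comes from the panic output. */
struct model_bif {
        unsigned char other_fields[0x30]; /* placeholder for preceding members */
        const void *loaded_field;         /* member read by movq 0x30(%rax), %rcx */
};

/* Address the CPU reads when the base register holds `base`. */
static uintptr_t
load_address(uintptr_t base)
{
        return (base + offsetof(struct model_bif, loaded_field));
}
```

With a NULL base, `load_address(0)` evaluates to 0x30, the same value as the fault virtual address in the crash report.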

Upstream has the following commit not present in 26.03:

commit 5937e1cdc99180b4adae2cf20cabd75dd9f45546
Author: Gleb Smirnoff <glebius@FreeBSD.org>
Date:   Wed Feb 4 14:07:11 2026 -0800

    bpf: don't clear pointer from descriptor to the tap on descriptor close

    During packet processing the descriptor is looked up using epoch(9) and it
    can be accessed after bpf_detachd().  In scenario of descriptor close the
    tap point is alive (it actually produces packets) and thus the pointer can
    be legitimately dereferenced.  This fixes a race on a bpf(4) device close
    that would otherwise result in panic.

    Differential Revision:  https://reviews.freebsd.org/D55064

diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index 9f0b57728e88..228ac9867bd7 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -678,8 +678,8 @@ bpf_detachd(struct bpf_d *d, bool detached_ifp)
        BPFD_LOCK(d);
        CK_LIST_REMOVE(d, bd_next);
        writer = (d->bd_writer > 0);
-       d->bd_bif = NULL;
        if (detached_ifp) {
+               d->bd_bif = NULL;
                /*
                 * Notify descriptor as it's detached, so that any
                 * sleepers wake up and get ENXIO.

Looks like a fix to this very problem.
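
The effect of the one-line move can be illustrated with a single-threaded model. This is only a sketch under stated assumptions: `model_tap`, `model_desc`, `model_detach`, and `close_race_is_safe` are hypothetical stand-ins, the `patched` flag selects between the 26.03 behaviour and the upstream commit, and real epoch(9) and locking are not modelled. It replays the ordering in which a reader has already looked up the descriptor and then the descriptor is closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the tap point and the bpf descriptor. */
struct model_tap {
        int softc;
};

struct model_desc {
        struct model_tap *tap;  /* plays the role of d->bd_bif */
        bool on_list;           /* still linked on the tap's descriptor list */
};

/*
 * Models bpf_detachd().  With patched == false the pointer is cleared
 * unconditionally; with patched == true it is cleared only when the
 * interface itself is going away (detached_ifp).
 */
static void
model_detach(struct model_desc *d, bool detached_ifp, bool patched)
{
        d->on_list = false;
        if (!patched || detached_ifp)
                d->tap = NULL;
}

/*
 * Replays the race ordering: the reader already holds the descriptor,
 * then the descriptor is closed while the tap is still alive.
 * Returns true if the reader's later dereference of d->tap is safe.
 */
static bool
close_race_is_safe(bool patched)
{
        struct model_tap tap = { .softc = 1 };
        struct model_desc d = { .tap = &tap, .on_list = true };

        model_detach(&d, false, patched);       /* descriptor close */
        return (d.tap != NULL);                 /* reader dereferences next */
}
```

In this model `close_race_is_safe(false)` returns false (the reader would dereference NULL, as in the panic) and `close_race_is_safe(true)` returns true, because the tap pointer stays valid when only the descriptor is closed.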

Actions #2

Updated by Mateusz Guzik 22 days ago

  • Status changed from New to Feedback
Actions #3

Updated by Craig Coonrad 9 days ago

There appear to be two more cases this week:
6100 - HS 45137024843
8300 - HS 45133424386

Actions #4

Updated by Lev Prokofev 9 days ago

Hm, could it be related to https://redmine.pfsense.org/issues/16828 ?

Actions #5

Updated by Mateusz Guzik 8 days ago

I checked one of the textdumps in 16828; it is indeed the same problem. I think we can close that one as a duplicate.

Actions #6

Updated by Danilo Zrenjanin 8 days ago

Another case.
5100 - HS 45223009297

Actions #7

Updated by Lev Prokofev 3 days ago

One more case: 45279747897 - Whitebox

Actions #8

Updated by Jim Pingle 2 days ago

  • Target version set to 26.03.1

The fix for this was picked back to the 26.03.1 branch.

Actions #9

Updated by Jim Pingle 2 days ago

  • Subject changed from "Possible regression on 26.03 which causes firewalls to crash" to "Kernel panic due to race condition on a ``bpf`` device"
Actions #10

Updated by Jim Pingle about 7 hours ago

  • Project changed from pfSense Plus to pfSense
  • Category changed from Operating System to Operating System
  • Target version changed from 26.03.1 to 2.9.0
  • Plus Target Version set to 26.03.1
Actions #11

Updated by Jim Pingle about 7 hours ago

  • Private changed from Yes to No
Actions #12

Updated by Jim Pingle about 7 hours ago

  • Has duplicate Bug #16828: Kernel panic (page fault) in ``bpf_mtap`` via ``vlan_transmit`` with Suricata BPF listeners active on VLAN interfaces added