Bug #16738

open

Kernel Panic: spin lock held too long: LAN link event triggers unnecessary LAGG/VLAN rebuild

Added by Marek Hajduczenia 7 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Interfaces
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version: 2.8.1
Affected Architecture: amd64

Description

Summary

A kernel panic occurs when connecting a device to the LAN interface (igc0). The link event on igc0 triggers a full LAGG bond teardown and rebuild on unrelated interfaces (ixl0/ixl1), which causes LACP flapping and all VLAN interfaces to go down. During this process, the coretemp driver attempts an SMP rendezvous to read CPU temperature via MSR, but a CPU is blocked handling the interface storm, causing a spin lock timeout and kernel panic.

Environment

Component         Detail
pfSense Version   2.8.1-RELEASE
Kernel            FreeBSD 15.0-CURRENT #21 RELENG_2_8_1-n256095-47c932dcc0e9 (Aug 28, 2025)
CPU               Intel Core i7-10710U (6 cores / 12 threads)
RAM               32 GB
NIC (WAN)         igc5 (Intel i225/i226)
NIC (LAN)         igc0 (Intel i225/i226)
NIC (LAGG)        ixl0 + ixl1 (Intel X710/XL710 10GbE) in LACP bond as lagg0
VLANs on lagg0    10, 11, 150, 151, 153, 154, 1666

Steps to Reproduce

  1. Configure a system with igc0 as LAN, ixl0 + ixl1 as LACP bond (lagg0), multiple VLANs on lagg0, and coretemp thermal sensor enabled (default)
  2. System is running normally with all interfaces up
  3. Connect a laptop (or any device) to the igc0 (LAN) port, causing a link state change
  4. System panics

Expected Behavior

A link event on igc0 (LAN) should only trigger reconfiguration of igc0 itself. It should not affect the LAGG bond (lagg0) or any VLAN interfaces on lagg0, as they are completely independent.

Actual Behavior

The igc0 link event triggers a full reconfiguration that ends in a panic:

  1. The LAGG bond is torn down (ixl0 and ixl1 are removed as members)
  2. All VLAN interfaces on lagg0 are destroyed and recreated
  3. LACP flaps on the bond
  4. All VLAN interfaces go down and back up
  5. During this process, a coretemp sysctl read triggers an SMP rendezvous that cannot complete because CPUs are busy handling the interface storm
  6. The kernel panics with "spin lock held too long"

Panic Details

panic: spin lock held too long
cpuid = 7
time = 1773101390

Stack Trace

kdb_enter() at kdb_enter+0x33
panic() at panic+0x43
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x64
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd4
smp_rendezvous_cpus() at smp_rendezvous_cpus+0x1e0
x86_msr_op() at x86_msr_op+0x175
coretemp_get_val_sysctl() at coretemp_get_val_sysctl+0x4b
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x91
sysctl_root() at sysctl_root+0x20d
userland_sysctl() at userland_sysctl+0x15e
sys___sysctl() at sys___sysctl+0x65
amd64_syscall() at amd64_syscall+0x115
fast_syscall_common() at fast_syscall_common+0xf8
--- syscall (202, FreeBSD ELF64, __sysctl) ---

Kernel Message Buffer (relevant section)

ixl0: link state changed to UP
lagg0: link state changed to UP
lagg0.153: link state changed to UP
lagg0.154: link state changed to UP
lagg0.10: link state changed to UP
lagg0.11: link state changed to UP
lagg0.1666: link state changed to UP
lagg0.151: link state changed to UP
lagg0.150: link state changed to UP
igc0: link state changed to DOWN         <-- laptop connected to LAN port
igc0: link state changed to UP
igc0: link state changed to DOWN
lagg0: link state changed to DOWN         <-- WHY? igc0 is not part of lagg0
lagg0.153: link state changed to DOWN
lagg0.154: link state changed to DOWN
lagg0.10: link state changed to DOWN
lagg0.11: link state changed to DOWN
lagg0.1666: link state changed to DOWN
lagg0.151: link state changed to DOWN
lagg0.150: link state changed to DOWN
ixl0: link state changed to DOWN
lagg0: IPv6 addresses on ixl0 have been removed before adding it as a member to prevent IPv6 address scope violation.
lagg0: link state changed to DOWN
ixl1: link state changed to DOWN
lagg0: IPv6 addresses on ixl1 have been removed before adding it as a member to prevent IPv6 address scope violation.
vlan2: changing name to 'lagg0.10'        <-- VLANs rebuilt from scratch
vlan3: changing name to 'lagg0.11'
vlan4: changing name to 'lagg0.150'
vlan1: changing name to 'lagg0.151'
vlan6: changing name to 'lagg0.153'
vlan0: changing name to 'lagg0.154'
vlan5: changing name to 'lagg0.1666'
[... promiscuous mode toggling on multiple interfaces ...]
ixl1: Interface stopped DISTRIBUTING, possible flapping
ixl0: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN         <-- second DOWN from LACP flap
lagg0.153: link state changed to DOWN
lagg0.154: link state changed to DOWN
lagg0.10: link state changed to DOWN
lagg0.11: link state changed to DOWN
lagg0.1666: link state changed to DOWN
lagg0.151: link state changed to DOWN
lagg0.150: link state changed to DOWN
spin lock 0xffffffff82ad1ae0 (smp rendezvous) held by 0xfffff80085cdf000 (tid 100517) too long
panic: spin lock held too long
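To check another message buffer for the same pattern, a short script can list which interfaces went DOWN after the first link event on the trigger interface. This is only a sketch: the function names are illustrative, and the sample lines are a small excerpt copied from the buffer above.

```python
import re

# Matches lines such as "lagg0.153: link state changed to DOWN".
LINK_RE = re.compile(r"^(?P<ifname>[\w.]+): link state changed to (?P<state>UP|DOWN)")


def link_events(msgbuf):
    """Return (interface, state) tuples in the order they appear."""
    events = []
    for line in msgbuf.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            events.append((match.group("ifname"), match.group("state")))
    return events


def down_after(events, trigger):
    """Interfaces other than `trigger` that went DOWN after its first event."""
    seen_trigger = False
    affected = []
    for ifname, state in events:
        if ifname == trigger:
            seen_trigger = True
        elif seen_trigger and state == "DOWN" and ifname not in affected:
            affected.append(ifname)
    return affected


# Excerpt from the kernel message buffer in this report.
SAMPLE = """\
lagg0.150: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
lagg0: link state changed to DOWN
lagg0.153: link state changed to DOWN
ixl0: link state changed to DOWN
"""

if __name__ == "__main__":
    print(down_after(link_events(SAMPLE), "igc0"))
```

Any interface in the result that is not built on the trigger interface (here lagg0, its VLANs, and ixl0) is a candidate for the unnecessary rebuild described in this report.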

Two Separate Issues

Issue 1 (Primary): Link event on igc0 triggers LAGG/VLAN rebuild on unrelated interfaces

The rc.linkup handler (or whatever processes the igc0 link event) appears to reconfigure all interfaces instead of just the one that changed. This causes:

  • Unnecessary LAGG bond teardown/rebuild
  • All VLANs destroyed and recreated
  • LACP flapping
  • Complete network outage on all VLANs for the duration

This should be scoped to only reconfigure the interface that had the link event.
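A minimal sketch of what "scoped" could mean, assuming a dependency map is available. BUILT_ON and reconfigure_scope are hypothetical names used for illustration only; this is not pfSense code, and the actual link event handler is not shown in this report.

```python
# Hypothetical topology mirroring the Environment section: each key is an
# interface, each value lists the interfaces it is built on top of.
BUILT_ON = {
    "lagg0": ["ixl0", "ixl1"],
    **{f"lagg0.{vid}": ["lagg0"] for vid in (10, 11, 150, 151, 153, 154, 1666)},
}


def reconfigure_scope(changed, built_on=BUILT_ON):
    """Return `changed` plus every interface stacked on top of it.

    Interfaces that do not depend on `changed`, directly or indirectly,
    are left out of the result.
    """
    scope = [changed]
    queue = [changed]
    while queue:
        current = queue.pop(0)
        for ifname, parents in built_on.items():
            if current in parents and ifname not in scope:
                scope.append(ifname)
                queue.append(ifname)
    return scope


if __name__ == "__main__":
    print(reconfigure_scope("igc0"))  # igc0 has nothing stacked on it
    print(reconfigure_scope("ixl0"))  # ixl0, lagg0, and the lagg0 VLANs
```

Under this model a link event on igc0 yields a scope containing only igc0, so lagg0 and its VLANs would never be touched.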

Issue 2 (Secondary): coretemp + SMP rendezvous panic under interface load

The coretemp driver reads MSRs via smp_rendezvous_cpus() (reached through x86_msr_op(), per the stack trace), which requires all CPUs to synchronize. When CPUs are busy handling a burst of interface events (as caused by Issue 1), the wait on the rendezvous spin lock times out and the kernel panics.

This is a known class of FreeBSD bugs with coretemp under heavy interrupt load, but the trigger here is entirely avoidable if Issue 1 is fixed.
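The failure mode can be illustrated with a userspace analogy. This is not kernel code: a threading.Lock stands in for the "smp rendezvous" spin lock named in the panic message, a sleeping thread stands in for the holder (tid 100517) that does not release it in time, and the timeout stands in for the kernel's "held too long" check. All names and timings are assumptions chosen for the illustration.

```python
import threading
import time


def try_rendezvous(lock, timeout):
    """Return True if the lock was obtained within `timeout` seconds."""
    if not lock.acquire(timeout=timeout):
        # The kernel panics at the equivalent point: "spin lock held too long".
        return False
    lock.release()
    return True


def simulate(holder_busy_for, waiter_timeout):
    """Run one holder thread and one waiter; report whether the waiter got in."""
    lock = threading.Lock()
    held = threading.Event()

    def holder():
        with lock:
            held.set()                   # signal that the lock is now taken
            time.sleep(holder_busy_for)  # stand-in for a stalled rendezvous

    thread = threading.Thread(target=holder)
    thread.start()
    held.wait()                          # make sure the holder owns the lock first
    ok = try_rendezvous(lock, waiter_timeout)
    thread.join()
    return ok


if __name__ == "__main__":
    print(simulate(holder_busy_for=0.01, waiter_timeout=1.0))  # holder finishes in time
    print(simulate(holder_busy_for=0.3, waiter_timeout=0.05))  # waiter gives up first
```

The second call corresponds to the situation in this report: the waiter (the coretemp sysctl read) gives up before the holder releases the lock.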

Workaround

Disable the coretemp driver to prevent the panic (does not fix the unnecessary interface rebuild):

echo 'coretemp_load="NO"' >> /boot/loader.conf.local

Or via the GUI: System > Advanced > Miscellaneous > Thermal Sensors > None

Attachments

The full textdump (panic.txt, ddb.txt, msgbuf.txt, config.txt, version.txt) is attached.


Files

textdump.tar.0 (88.5 KB) Marek Hajduczenia, 03/10/2026 12:32 AM
