Bug #16738
Kernel Panic: spin lock held too long: LAN link event triggers unnecessary LAGG/VLAN rebuild
Description
Summary
A kernel panic occurs when a device is connected to the LAN interface (igc0). The link event on igc0 triggers a full LAGG bond teardown and rebuild on unrelated interfaces (ixl0/ixl1), which causes LACP flapping and takes all VLAN interfaces down. While this is happening, the coretemp driver attempts an SMP rendezvous to read the CPU temperature from an MSR, but a CPU is blocked handling the interface storm, so the spin lock times out and the kernel panics.
Environment
| Component | Detail |
|---|---|
| pfSense Version | 2.8.1-RELEASE |
| Kernel | FreeBSD 15.0-CURRENT #21 RELENG_2_8_1-n256095-47c932dcc0e9 (Aug 28, 2025) |
| CPU | Intel Core i7-10710U (6 cores / 12 threads) |
| RAM | 32 GB |
| NIC (WAN) | igc5 (Intel i225/i226) |
| NIC (LAN) | igc0 (Intel i225/i226) |
| NIC (LAGG) | ixl0 + ixl1 (Intel X710/XL710 10GbE) in LACP bond as lagg0 |
| VLANs on lagg0 | 10, 11, 150, 151, 153, 154, 1666 |
Steps to Reproduce
- Configure a system with igc0 as LAN, ixl0 + ixl1 as LACP bond (lagg0), multiple VLANs on lagg0, and coretemp thermal sensor enabled (default)
- System is running normally with all interfaces up
- Connect a laptop (or any device) to the igc0 (LAN) port, causing a link state change
- System panics
Expected Behavior
A link event on igc0 (LAN) should only trigger reconfiguration of igc0 itself. It should not affect the LAGG bond (lagg0) or any VLAN interfaces on lagg0, as they are completely independent.
Actual Behavior
The igc0 link event triggers a full reconfiguration, with the following sequence of effects:
- Tears down the LAGG bond (removes ixl0 and ixl1 members)
- Destroys and recreates all VLAN interfaces on lagg0
- Causes LACP flapping on the bond
- All VLAN interfaces go down and back up
- During this process, a coretemp sysctl read triggers an SMP rendezvous that cannot complete because CPUs are busy handling the interface storm
- Kernel panics with "spin lock held too long"
Panic Details
panic: spin lock held too long
cpuid = 7
time = 1773101390
Stack Trace
kdb_enter() at kdb_enter+0x33
panic() at panic+0x43
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x64
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd4
smp_rendezvous_cpus() at smp_rendezvous_cpus+0x1e0
x86_msr_op() at x86_msr_op+0x175
coretemp_get_val_sysctl() at coretemp_get_val_sysctl+0x4b
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x91
sysctl_root() at sysctl_root+0x20d
userland_sysctl() at userland_sysctl+0x15e
sys___sysctl() at sys___sysctl+0x65
amd64_syscall() at amd64_syscall+0x115
fast_syscall_common() at fast_syscall_common+0xf8
--- syscall (202, FreeBSD ELF64, __sysctl) ---
Kernel Message Buffer (relevant section)
ixl0: link state changed to UP
lagg0: link state changed to UP
lagg0.153: link state changed to UP
lagg0.154: link state changed to UP
lagg0.10: link state changed to UP
lagg0.11: link state changed to UP
lagg0.1666: link state changed to UP
lagg0.151: link state changed to UP
lagg0.150: link state changed to UP
igc0: link state changed to DOWN    <-- laptop connected to LAN port
igc0: link state changed to UP
igc0: link state changed to DOWN
lagg0: link state changed to DOWN    <-- WHY? igc0 is not part of lagg0
lagg0.153: link state changed to DOWN
lagg0.154: link state changed to DOWN
lagg0.10: link state changed to DOWN
lagg0.11: link state changed to DOWN
lagg0.1666: link state changed to DOWN
lagg0.151: link state changed to DOWN
lagg0.150: link state changed to DOWN
ixl0: link state changed to DOWN
lagg0: IPv6 addresses on ixl0 have been removed before adding it as a member to prevent IPv6 address scope violation.
lagg0: link state changed to DOWN
ixl1: link state changed to DOWN
lagg0: IPv6 addresses on ixl1 have been removed before adding it as a member to prevent IPv6 address scope violation.
vlan2: changing name to 'lagg0.10'    <-- VLANs rebuilt from scratch
vlan3: changing name to 'lagg0.11'
vlan4: changing name to 'lagg0.150'
vlan1: changing name to 'lagg0.151'
vlan6: changing name to 'lagg0.153'
vlan0: changing name to 'lagg0.154'
vlan5: changing name to 'lagg0.1666'
[... promiscuous mode toggling on multiple interfaces ...]
ixl1: Interface stopped DISTRIBUTING, possible flapping
ixl0: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN    <-- second DOWN from LACP flap
lagg0.153: link state changed to DOWN
lagg0.154: link state changed to DOWN
lagg0.10: link state changed to DOWN
lagg0.11: link state changed to DOWN
lagg0.1666: link state changed to DOWN
lagg0.151: link state changed to DOWN
lagg0.150: link state changed to DOWN
spin lock 0xffffffff82ad1ae0 (smp rendezvous) held by 0xfffff80085cdf000 (tid 100517) too long
panic: spin lock held too long
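To quantify how far the igc0 event spreads, the link-state lines in the message buffer can be tallied per interface. The following is a minimal sketch in Python, assuming the msgbuf line format shown in this report; `LINK_RE`, `count_link_events`, and `base_interface` are hypothetical helper names, not part of pfSense or FreeBSD.

```python
import re
from collections import Counter

# Matches lines such as "lagg0.153: link state changed to DOWN <-- note".
LINK_RE = re.compile(
    r"^(?P<ifname>[\w.]+): link state changed to (?P<state>UP|DOWN)\b"
)


def count_link_events(msgbuf_text):
    """Return a Counter keyed by (interface, state) for link-state lines."""
    counts = Counter()
    for line in msgbuf_text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            counts[(match.group("ifname"), match.group("state"))] += 1
    return counts


def base_interface(ifname):
    """Map a VLAN name like 'lagg0.153' to its parent 'lagg0'."""
    return ifname.split(".", 1)[0]


if __name__ == "__main__":
    # Short excerpt in the same format as the message buffer in this report.
    sample = """\
igc0: link state changed to DOWN    <-- laptop connected to LAN port
igc0: link state changed to UP
igc0: link state changed to DOWN
lagg0: link state changed to DOWN    <-- WHY? igc0 is not part of lagg0
lagg0.153: link state changed to DOWN
ixl0: link state changed to DOWN
"""
    counts = count_link_events(sample)
    for (ifname, state), n in sorted(counts.items()):
        print(f"{ifname:12s} {state:4s} {n}")

    # Interfaces that changed state although they are not igc0 or stacked on it.
    unrelated = sorted({name for name, _ in counts if base_interface(name) != "igc0"})
    print("unrelated interfaces with link events:", unrelated)
```

Running this against the full msgbuf.txt from the textdump would show how many DOWN transitions each lagg0 VLAN took after the single igc0 event.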
Two Separate Issues
Issue 1 (Primary): Link event on igc0 triggers LAGG/VLAN rebuild on unrelated interfaces
The rc.linkup handler (or whatever processes the igc0 link event) appears to reconfigure all interfaces instead of just the one that changed. This causes:
- Unnecessary LAGG bond teardown/rebuild
- All VLANs destroyed and recreated
- LACP flapping
- Complete network outage on all VLANs for the duration
This should be scoped to only reconfigure the interface that had the link event.
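The scoping requested here can be pictured as a dependency walk: start from the interface that changed and include only the interfaces stacked on top of it. The sketch below is illustrative Python, not the actual rc.linkup logic; `dependent_interfaces` and both dictionaries are hypothetical, filled in from the Environment table above.

```python
def dependent_interfaces(changed, lagg_members, vlan_parents):
    """Return the changed interface plus every interface stacked on top of it.

    lagg_members: dict mapping a lagg name to its list of member ports.
    vlan_parents: dict mapping a VLAN interface name to its parent interface.
    """
    affected = {changed}
    grew = True
    # Repeat until no new dependents are found (handles VLANs on top of a lagg).
    while grew:
        grew = False
        for lagg, members in lagg_members.items():
            if lagg not in affected and affected.intersection(members):
                affected.add(lagg)
                grew = True
        for vlan, parent in vlan_parents.items():
            if vlan not in affected and parent in affected:
                affected.add(vlan)
                grew = True
    return affected


if __name__ == "__main__":
    # Layout taken from the Environment table in this report.
    lagg_members = {"lagg0": ["ixl0", "ixl1"]}
    vlan_parents = {
        f"lagg0.{tag}": "lagg0" for tag in (10, 11, 150, 151, 153, 154, 1666)
    }

    # A link event on igc0 touches nothing else in this layout.
    print(sorted(dependent_interfaces("igc0", lagg_members, vlan_parents)))

    # A link event on a lagg member is the case that can involve lagg0 and its VLANs.
    print(sorted(dependent_interfaces("ixl0", lagg_members, vlan_parents)))
```

With this layout, an igc0 event yields only igc0 itself, which matches the expected behavior described in this report. Whether a member-port event should trigger a rebuild of lagg0, rather than just a state update, is a separate question the sketch does not address.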
Issue 2 (Secondary): coretemp + SMP rendezvous panic under interface load
The coretemp driver reads MSRs through x86_msr_op(), which calls smp_rendezvous_cpus() (both visible in the stack trace) and requires CPUs to synchronize. When CPUs are busy handling a burst of interface events (as caused by Issue 1), the wait on the "smp rendezvous" spin lock times out and causes a kernel panic.
This is a known class of FreeBSD bugs with coretemp under heavy interrupt load, but the trigger here is entirely avoidable if Issue 1 is fixed.
Workaround
Disable the coretemp driver to prevent the panic (does not fix the unnecessary interface rebuild):
echo 'coretemp_load="NO"' >> /boot/loader.conf.local
Or via the GUI: System > Advanced > Miscellaneous > Thermal Sensors > None
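Note that the `echo ... >>` form appends a new line every time it is run. If the workaround is applied from a script, an idempotent variant may be preferable. The following is a minimal Python sketch; `ensure_loader_line` is a hypothetical helper, and the demo writes to a temporary directory rather than to /boot/loader.conf.local.

```python
import tempfile
from pathlib import Path


def ensure_loader_line(path, line):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was modified, False if the line already existed.
    """
    target = Path(path)
    content = target.read_text() if target.exists() else ""
    if line in (entry.strip() for entry in content.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    with target.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True


if __name__ == "__main__":
    # Demo against a scratch file; on a real system the path would be
    # /boot/loader.conf.local, as in the command above.
    with tempfile.TemporaryDirectory() as scratch:
        demo_path = Path(scratch) / "loader.conf.local"
        print(ensure_loader_line(demo_path, 'coretemp_load="NO"'))
        print(ensure_loader_line(demo_path, 'coretemp_load="NO"'))
        print(demo_path.read_text(), end="")
```

The second call leaves the file untouched, so repeated runs do not accumulate duplicate entries.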
Attachments
The full textdump (panic.txt, ddb.txt, msgbuf.txt, config.txt, version.txt) is attached.