Bug #12079

open

IGMPProxy: kernel panic, Sleeping thread owns a non-sleepable lock

Added by Steve Wheeler 3 months ago. Updated 16 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
IGMP Proxy
Target version:
Start date:
06/25/2021
Due date:
% Done:
0%

Estimated time:
Plus Target Version:
22.01
Release Notes:
Default
Affected Version:
Affected Architecture:
All

Description

IGMPProxy can trigger a kernel panic in 2.5.2-RC.

db:0:kdb.enter.default>  show pcpu
cpuid        = 1
dynamic pcpu = 0xfffffe007dbe5380
curthread    = 0xfffff8005ca13000: pid 289 tid 100198 "igmpproxy" 
curpcb       = 0xfffff8005ca135a0
fpcurthread  = 0xfffff8005ca13000: pid 289 "igmpproxy" 
idlethread   = 0xfffff80004185740: tid 100004 "idle: cpu1" 
curpmap      = 0xfffff80077342138
tssp         = 0xffffffff83717688
commontssp   = 0xffffffff83717688
rsp0         = 0xfffffe001e3b0dc0
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff8371dea0
ldt          = 0xffffffff8371dee0
tss          = 0xffffffff8371ded0
tlb gen      = 40718
curvnet      = 0xfffff8000406db40
db:0:kdb.enter.default>  bt
Tracing pid 289 tid 100198 td 0xfffff8005ca13000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe001e3b0820
vpanic() at vpanic+0x197/frame 0xfffffe001e3b0870
panic() at panic+0x43/frame 0xfffffe001e3b08d0
propagate_priority() at propagate_priority+0x282/frame 0xfffffe001e3b0900
turnstile_wait() at turnstile_wait+0x30c/frame 0xfffffe001e3b0950
__mtx_lock_sleep() at __mtx_lock_sleep+0x199/frame 0xfffffe001e3b09e0
X_ip_mrouter_set() at X_ip_mrouter_set+0x13a4/frame 0xfffffe001e3b0ab0
rip_ctloutput() at rip_ctloutput+0xf3/frame 0xfffffe001e3b0ae0
sosetopt() at sosetopt+0xe7/frame 0xfffffe001e3b0b40
kern_setsockopt() at kern_setsockopt+0xb0/frame 0xfffffe001e3b0ba0
sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe001e3b0bc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe001e3b0cf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe001e3b0cf0
--- syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x8003b57ea, rsp = 0x7fffffffeba8, rbp

Booting with a debug kernel shows:

lock order reversal: (sleepable after non-sleepable)
 1st 0xffffffff83795300 IPv4 multicast interfaces (IPv4 multicast interfaces) @ /usr/home/mjg/git/netgate/FreeBSD-src/sys/netinet/ip_mroute.c:845
 2nd 0xfffff80004445178 iflib ctx lock (iflib ctx lock) @ /usr/home/mjg/git/netgate/FreeBSD-src/sys/net/iflib.c:4190
stack backtrace:
#0 0xffffffff80dd7021 at witness_debugger+0x71
#1 0xffffffff80d77387 at _sx_xlock+0x67
#2 0xffffffff80eac72f at iflib_if_ioctl+0x2df
#3 0xffffffff80e81107 at if_setflag+0xd7
#4 0xffffffff80f4cf02 at X_ip_mrouter_set+0x1642
#5 0xffffffff80f55783 at rip_ctloutput+0xf3
#6 0xffffffff80e0a75f at sosetopt+0xff
#7 0xffffffff80e0fa50 at kern_setsockopt+0xb0
#8 0xffffffff80e0f994 at sys_setsockopt+0x24
#9 0xffffffff8134a32e at amd64_syscall+0x2be
#10 0xffffffff81320c7e at fast_syscall_common+0xf8
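
The two traces above point at the same rule: a thread that holds a non-sleepable lock (here the "IPv4 multicast interfaces" lock, taken via the mutex path in X_ip_mrouter_set) must not go on to acquire a sleepable one (the iflib ctx lock, taken via _sx_xlock). The following is a minimal userspace sketch of that check, not kernel code and not the actual witness implementation. The names model_lock, model_thread, model_acquire, model_release and model_setsockopt_path are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in FreeBSD. */
enum lock_class { LC_NONSLEEPABLE, LC_SLEEPABLE };

struct model_lock {
	const char *name;
	enum lock_class cls;
};

struct model_thread {
	int nonsleepable_held;	/* mutex-like locks currently held */
};

/*
 * Returns true if the acquisition is allowed. Returns false, and takes
 * nothing, when a sleepable lock is requested while a non-sleepable
 * lock is still held.
 */
static bool
model_acquire(struct model_thread *td, const struct model_lock *lk)
{
	if (lk->cls == LC_SLEEPABLE && td->nonsleepable_held > 0) {
		printf("sleepable after non-sleepable: %s\n", lk->name);
		return (false);
	}
	if (lk->cls == LC_NONSLEEPABLE)
		td->nonsleepable_held++;
	return (true);
}

static void
model_release(struct model_thread *td, const struct model_lock *lk)
{
	if (lk->cls == LC_NONSLEEPABLE && td->nonsleepable_held > 0)
		td->nonsleepable_held--;
}

/*
 * Replays the order seen in the report: multicast interfaces lock
 * first, then the iflib ctx lock. Returns the number of violations.
 */
static int
model_setsockopt_path(void)
{
	struct model_thread td = { 0 };
	struct model_lock vif = { "IPv4 multicast interfaces", LC_NONSLEEPABLE };
	struct model_lock ctx = { "iflib ctx lock", LC_SLEEPABLE };
	int violations = 0;

	if (!model_acquire(&td, &vif))
		violations++;
	if (!model_acquire(&td, &ctx))	/* the order flagged above */
		violations++;
	model_release(&td, &vif);
	return (violations);
}
```

In this model, model_setsockopt_path() reports one violation, while taking the sleepable lock first and the non-sleepable lock second is accepted.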

Tested:

2.5.2-RC (amd64)
built on Fri Jun 25 03:01:13 EDT 2021
FreeBSD 12.2-STABLE

Actions #1

Updated by Steve Wheeler 3 months ago

  • Assignee set to Mateusz Guzik
Actions #2

Updated by Mateusz Guzik 3 months ago

First, a note: to my understanding the bug is not easy to run into. However, booting a kernel with debug options readily reproduces the warning that shows the bug exists.

I think the most sensible thing for now is to put the bug on the back burner (reasons below).

The code was rewritten upstream in https://cgit.freebsd.org/src/commit/?id=d40cd26a86a79342d175296b74768dd7183fc02b . The rewrite replaced the lock at hand with a read-write lock, which suffers from the same problem. So far I don't know how feasible it is to fix.

Thus, fixing it in pfSense would require one of the following:
- fix the issue in the rewrite and backport both: this is not really feasible in my opinion
- fix the code as found in pfSense: given the impending rebase to a new FreeBSD, this would mean writing code that is thrown away soon, and the rebase would be an immediate regression

Consequently, I think the best course of action is to wait for the rebase.

Actions #3

Updated by Jim Pingle 3 months ago

  • Target version changed from 2.5.2 to 2.6.0
  • Plus Target Version set to 21.09

Re-targeting this to 2.6.0/21.09

Actions #4

Updated by Jim Pingle 16 days ago

  • Plus Target Version changed from 21.09 to 22.01

Per Mateusz, this is still unresolved upstream in FreeBSD, even on HEAD. Moving target ahead.
