Panic when using CBQ traffic shaping
A couple of users have reported a panic when using CBQ traffic shaping. Triggering it may also require CBQ on VLAN interfaces.
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100039 td 0xfffff800053bf000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe000043e610
vpanic() at vpanic+0x197/frame 0xfffffe000043e660
panic() at panic+0x43/frame 0xfffffe000043e6c0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe000043e720
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000043e770
trap() at trap+0x286/frame 0xfffffe000043e880
calltrap() at calltrap+0x8/frame 0xfffffe000043e880
--- trap 0xc, rip = 0xffffffff80ec014e, rsp = 0xfffffe000043e950, rbp = 0xfffffe000043e980 ---
ether_8021q_frame() at ether_8021q_frame+0x2e/frame 0xfffffe000043e980
vlan_transmit() at vlan_transmit+0xc8/frame 0xfffffe000043e9f0
vlan_altq_start() at vlan_altq_start+0xb4/frame 0xfffffe000043ea20
cbqrestart() at cbqrestart+0x64/frame 0xfffffe000043ea50
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe000043ea80
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe000043eb30
softclock() at softclock+0x79/frame 0xfffffe000043eb50
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe000043ebb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000043ebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000043ebf0
Attached is a textdump archive from a separate user with the same backtrace.
Updated by Reymond Rivera about 2 months ago
I believe I am hitting the same issue. I have attached the dump files that were generated.
I have enabled CBQ on 7 interfaces on my pfSense. Before switching to CBQ I was using PRIQ and encountered no issues. I opened a forum discussion, where stephenw10 suggested removing the last interface I had added to CBQ. I did, and the issue stopped: pfSense has now been running for 5 days straight without crashing.
Updated by Kristof Provost about 2 months ago
- Status changed from New to Feedback
I've not been able to reproduce this yet. I'd expect it to happen around the borrowing code of CBQ, where it starts or stops borrowing and handles a delayed packet. I'm not entirely clear on when the relevant code gets called (hence the inability to reproduce it so far), but the panic itself looks to be pretty obvious.
From the backtrace and code I'm fairly confident that the problem is that we don't have a vnet context set. We enter the code path through a callout, which won't have vnet context, but then we (potentially) transmit packets, and then die fairly early on in ether_8021q_frame(). One of the first things that function does is to access a vnet-local variable (V_soft_pad), which will then explode. That's a fairly common sort of bug, and happily easily fixed.
I've pushed the fix to devel-12 as 9fa5a825c272d9e60314960829843e9c3456bb67