Regression #11470
closed
Panic when using CBQ traffic shaping
Added by Jim Pingle almost 4 years ago.
Updated almost 3 years ago.
Category: Traffic Shaper (ALTQ)
Plus Target Version: 22.01
Description
A couple of users have reported a panic when using CBQ traffic shaping. Triggering it may also require using CBQ on VLAN interfaces.
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100039 td 0xfffff800053bf000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe000043e610
vpanic() at vpanic+0x197/frame 0xfffffe000043e660
panic() at panic+0x43/frame 0xfffffe000043e6c0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe000043e720
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000043e770
trap() at trap+0x286/frame 0xfffffe000043e880
calltrap() at calltrap+0x8/frame 0xfffffe000043e880
--- trap 0xc, rip = 0xffffffff80ec014e, rsp = 0xfffffe000043e950, rbp = 0xfffffe000043e980 ---
ether_8021q_frame() at ether_8021q_frame+0x2e/frame 0xfffffe000043e980
vlan_transmit() at vlan_transmit+0xc8/frame 0xfffffe000043e9f0
vlan_altq_start() at vlan_altq_start+0xb4/frame 0xfffffe000043ea20
cbqrestart() at cbqrestart+0x64/frame 0xfffffe000043ea50
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe000043ea80
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe000043eb30
softclock() at softclock+0x79/frame 0xfffffe000043eb50
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe000043ebb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000043ebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000043ebf0
Attached is a textdump archive from a separate user with the same backtrace.
That doesn't look like the same issue; the backtrace is quite a bit different despite both mentioning CBQ. They could be related, but they aren't close enough that I'd call them the same yet.
- Plus Target Version set to 21.05
Would be nice to fix soon if we can, but not a blocker at the moment.
- Plus Target Version changed from 21.05 to 21.09
If anyone can provide steps to replicate this please do so. It's 'just working' for me locally.
I believe I am hitting the same issue. I have included the dump files that were generated.
I have enabled CBQ on 7 interfaces on my pfSense system. Before switching to CBQ I was using PRIQ and encountered no issues. I opened a forum discussion, where stephenw10 suggested removing the last interface I had added to CBQ; I did, and the crashes stopped. pfSense has now been running for 5 days straight without crashing.
- Status changed from New to Feedback
I've not been able to reproduce this yet. I'd expect it to happen around the borrowing code of CBQ, where it starts or stops borrowing and handles a delayed packet. I'm not entirely clear on when the relevant code gets called (hence the inability to reproduce it so far), but the panic itself looks to be pretty obvious.
From the backtrace and code I'm fairly confident that the problem is that we don't have a vnet context set. We enter the code path through a callout, which won't have vnet context, but then we (potentially) transmit packets, and then die fairly early on in ether_8021q_frame(). One of the first things that function does is to access a vnet-local variable (V_soft_pad), which will then explode. That's a fairly common sort of bug, and happily easily fixed.
I've pushed the fix to devel-12 as 9fa5a825c272d9e60314960829843e9c3456bb67
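To illustrate the failure mode described above, here is a minimal userspace sketch in C. It is not the committed change and not kernel code: `vnet_model`, `ifnet_model`, `cur_vnet`, `read_soft_pad()`, and both `restart_*` functions are hypothetical stand-ins invented for this example. In the FreeBSD kernel the context is normally set and cleared with the `CURVNET_SET()` / `CURVNET_RESTORE()` macros; whether the fix referenced above uses exactly that pattern is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. Every name below is a hypothetical stand-in,
 * not a FreeBSD kernel definition.
 */
struct vnet_model {
    int soft_pad;               /* stands in for a vnet-local variable such as V_soft_pad */
};

struct ifnet_model {
    struct vnet_model *if_vnet; /* the vnet this interface belongs to */
};

/* Current vnet context; NULL models a callout thread with no context set. */
static struct vnet_model *cur_vnet = NULL;

/*
 * Models reading a vnet-local variable. Returns -1 in the case where the
 * kernel would take a page fault because no context is set.
 */
int read_soft_pad(void)
{
    if (cur_vnet == NULL)
        return -1;
    return cur_vnet->soft_pad;
}

/* Models the unfixed callout path: transmit without setting a context. */
int restart_without_context(struct ifnet_model *ifp)
{
    (void)ifp;                  /* the interface is known, but its vnet is never used */
    return read_soft_pad();
}

/* Models the fixed path: set the context from the interface, transmit, restore. */
int restart_with_context(struct ifnet_model *ifp)
{
    struct vnet_model *saved = cur_vnet;
    int result;

    assert(ifp != NULL && ifp->if_vnet != NULL);
    cur_vnet = ifp->if_vnet;    /* analogous to setting the vnet context */
    result = read_soft_pad();
    cur_vnet = saved;           /* analogous to restoring the previous context */
    return result;
}
```

In this model, calling `restart_without_context()` from a state where `cur_vnet` is NULL returns the -1 sentinel (the panic case), while `restart_with_context()` reads the interface's own `soft_pad` value and leaves the context as it found it.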
- Target version changed from CE-Next to 2.6.0
Please see the attached sanitized interfaces/shaper config for a 5100 that has this issue which may help in reproducing this if needed.
- Plus Target Version changed from 21.09 to 22.01
- Status changed from Feedback to Resolved
- Assignee set to Kristof Provost
No recent reports. Can always reopen it if someone manages to reproduce it again with the current fix in place.