Regression #11470

closed

Panic when using CBQ traffic shaping

Added by Jim Pingle about 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Category: Traffic Shaper (ALTQ)
Target version:
Start date: 02/19/2021
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 22.01
Release Notes: Default
Affected Version: 2.5.0
Affected Architecture:

Description

A couple of users have reported a panic when using CBQ traffic shaping. Triggering it may also require CBQ to be enabled on VLAN interfaces.

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100039 td 0xfffff800053bf000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe000043e610
vpanic() at vpanic+0x197/frame 0xfffffe000043e660
panic() at panic+0x43/frame 0xfffffe000043e6c0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe000043e720
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000043e770
trap() at trap+0x286/frame 0xfffffe000043e880
calltrap() at calltrap+0x8/frame 0xfffffe000043e880
--- trap 0xc, rip = 0xffffffff80ec014e, rsp = 0xfffffe000043e950, rbp = 0xfffffe000043e980 ---
ether_8021q_frame() at ether_8021q_frame+0x2e/frame 0xfffffe000043e980
vlan_transmit() at vlan_transmit+0xc8/frame 0xfffffe000043e9f0
vlan_altq_start() at vlan_altq_start+0xb4/frame 0xfffffe000043ea20
cbqrestart() at cbqrestart+0x64/frame 0xfffffe000043ea50
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe000043ea80
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe000043eb30
softclock() at softclock+0x79/frame 0xfffffe000043eb50
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe000043ebb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000043ebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000043ebf0

Attached is a textdump archive from a separate user with the same backtrace.


Files

1613673139905-textdump.tar (154 KB) Jim Pingle, 02/19/2021 02:04 PM
textdump.tar.0 (154 KB) Reymond Rivera, 08/20/2021 03:43 PM
info.0 (443 Bytes) Reymond Rivera, 08/20/2021 03:43 PM
issue-11470-config.xml (22.5 KB) Max Leighton, 09/08/2021 02:06 PM
#1

Updated by Viktor Gurov about 3 years ago

Same issue: #11285

#2

Updated by Jim Pingle about 3 years ago

That doesn't look like the same issue; the backtrace is quite a bit different, despite both mentioning CBQ. They could be related, but they aren't close enough that I'd call them the same yet.

#3

Updated by Jim Pingle almost 3 years ago

  • Plus Target Version set to 21.05

Would be nice to fix soon if we can, but not a blocker at the moment.

#4

Updated by Jim Pingle almost 3 years ago

  • Plus Target Version changed from 21.05 to 21.09

#5

Updated by Steve Wheeler over 2 years ago

If anyone can provide steps to replicate this, please do so. It's 'just working' for me locally.

#6

Updated by Reymond Rivera over 2 years ago

I believe I am hitting the same issue. I have attached the dump files that were generated.

I have enabled CBQ on 7 interfaces on my pfSense. Before using CBQ I was using PRIQ and encountered no issues. In a forum discussion I opened, stephenw10 suggested removing the last interface I had added to CBQ; I did, and the issue stopped. pfSense has now been running for 5 days straight without crashing.

#7

Updated by Kristof Provost over 2 years ago

  • Status changed from New to Feedback

I've not been able to reproduce this yet. I'd expect it to happen around the borrowing code of CBQ, where it starts or stops borrowing and handles a delayed packet. I'm not entirely clear on when the relevant code gets called (hence the inability to reproduce it so far), but the panic itself looks to be pretty obvious.

From the backtrace and code I'm fairly confident that the problem is that we don't have a vnet context set. We enter the code path through a callout, which won't have vnet context, but then we (potentially) transmit packets, and then die fairly early on in ether_8021q_frame(). One of the first things that function does is to access a vnet-local variable (V_soft_pad), which will then explode. That's a fairly common sort of bug, and happily easily fixed.

I've pushed the fix to devel-12 as 9fa5a825c272d9e60314960829843e9c3456bb67

#8

Updated by Jim Pingle over 2 years ago

  • Target version changed from CE-Next to 2.6.0

#9

Updated by Max Leighton over 2 years ago

Please see the attached sanitized interfaces/shaper config from a 5100 that has this issue; it may help in reproducing this if needed.

#10

Updated by Jim Pingle over 2 years ago

  • Plus Target Version changed from 21.09 to 22.01

#11

Updated by Jim Pingle about 2 years ago

  • Status changed from Feedback to Resolved
  • Assignee set to Kristof Provost

No recent reports. Can always reopen it if someone manages to reproduce it again with the current fix in place.
