Project

General

Profile

Actions

Bug #14497

closed

Kernel panic when using traffic shaping on a PPPoE interface

Added by Steve Wheeler 11 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Category:
Traffic Shaper (ALTQ)
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
23.09
Release Notes:
Default
Affected Version:
2.7.x
Affected Architecture:
amd64

Description

A PRIQ traffic shaper with codel enabled can cause a panicwhen applied to a PPPoE WAN.

See: https://forum.netgate.com/topic/179546/6100-boot-loop-w-traffic-shaper-on-pppoe-wan

Bootstrapping clock...
codel_should_drop: could not found the packet mtag!

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 04
fault virtual address    = 0x5010410
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80cd789d
stack pointer            = 0x0:0xfffffe00c4c04ae0
frame pointer            = 0x0:0xfffffe00c4c04b60
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 10
fault virtual address    = 0x1
fault code        = supervisor read instruction, page not present
instruction pointer    = 0x20:0x1
stack pointer            = 0x28:0xfffffe00107e04e8
frame pointer            = 0x28:0xfffffe00107e0510
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 0 (if_io_tqg_2)
rdi: fffff80265c8b600 rsi: fffff80265c8b600 rdx:                0
rcx: fffff80265573430  r8:                0  r9: fffff80001004c00
rax: fffff80265c8b600 rbx: fffff80265573400 rbp: fffffe00107e0510
r10:                0 r11:         7ffa26e3 r12: fffffe0085822000
r13:          3010000 r14: fffff80265573430 r15: fffff80265c8b600
trap number        = 12
panic: page fault
cpuid = 2
time = 1687374345
KDB: enter: panic
[ thread pid 0 tid 100009 ]
Stopped at      kdb_enter+0x32: movq    $0,0x2342e13(%rip)
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default>  capture on
db:0:kdb.enter.default>  run pfs
db:1:pfs> bt
Tracing pid 0 tid 100009 td 0xfffffe0011ff5c80
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00107e02a0
vpanic() at vpanic+0x183/frame 0xfffffe00107e02f0
panic() at panic+0x43/frame 0xfffffe00107e0350
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00107e03b0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00107e0410
calltrap() at calltrap+0x8/frame 0xfffffe00107e0410
--- trap 0xc, rip = 0x1, rsp = 0xfffffe00107e04e8, rbp = 0xfffffe00107e0510 ---
??() at 0x1/frame 0xfffffe00107e0510
uma_zfree_arg() at uma_zfree_arg+0x1d9/frame 0xfffffe00107e0550
mb_free_ext() at mb_free_ext+0xbe/frame 0xfffffe00107e0580
m_free() at m_free+0xbc/frame 0xfffffe00107e05b0
m_freem() at m_freem+0x28/frame 0xfffffe00107e05d0
iflib_completed_tx_reclaim() at iflib_completed_tx_reclaim+0x1df/frame 0xfffffe00107e0640
iflib_txq_drain() at iflib_txq_drain+0x68/frame 0xfffffe00107e06c0
drain_ring_lockless() at drain_ring_lockless+0x5d/frame 0xfffffe00107e0710
ifmp_ring_enqueue() at ifmp_ring_enqueue+0x27a/frame 0xfffffe00107e0750
iflib_if_transmit() at iflib_if_transmit+0x23b/frame 0xfffffe00107e07c0
iflib_altq_if_start() at iflib_altq_if_start+0xee/frame 0xfffffe00107e07f0
iflib_altq_if_transmit() at iflib_altq_if_transmit+0xff/frame 0xfffffe00107e0820
vlan_transmit() at vlan_transmit+0xdf/frame 0xfffffe00107e0870
ether_output_frame() at ether_output_frame+0xa3/frame 0xfffffe00107e08a0
ether_output() at ether_output+0x670/frame 0xfffffe00107e0930
ip_tryforward() at ip_tryforward+0x577/frame 0xfffffe00107e09d0
ip_input() at ip_input+0x309/frame 0xfffffe00107e0a30
netisr_dispatch_src() at netisr_dispatch_src+0x2a0/frame 0xfffffe00107e0a80
ether_demux() at ether_demux+0x149/frame 0xfffffe00107e0ab0
ether_nh_input() at ether_nh_input+0x352/frame 0xfffffe00107e0b10
netisr_dispatch_src() at netisr_dispatch_src+0xb0/frame 0xfffffe00107e0b60
ether_input() at ether_input+0x69/frame 0xfffffe00107e0bc0
ether_demux() at ether_demux+0x9a/frame 0xfffffe00107e0bf0
ether_nh_input() at ether_nh_input+0x352/frame 0xfffffe00107e0c50
netisr_dispatch_src() at netisr_dispatch_src+0xb0/frame 0xfffffe00107e0ca0
ether_input() at ether_input+0x69/frame 0xfffffe00107e0d00
iflib_rxeof() at iflib_rxeof+0xc13/frame 0xfffffe00107e0e00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00107e0e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe00107e0ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe00107e0ef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00107e0f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00107e0f30
--- trap 0x78c08548, rip = 0xeb8480f79c0, rsp = 0xfc22e8c2940f2024, rbp = 0xc016058b48000000 ---

Seen in 23.01 and 23.05


Files

shaper_config.xml (1.74 KB) shaper_config.xml Steve Wheeler, 06/26/2023 12:20 PM
Actions #1

Updated by Steve Wheeler 11 months ago

Attached shaper config that hots this.

Actions #2

Updated by Kristof Provost 11 months ago

I have no idea what could be going on here.

I'm going to assume that the `codel_should_drop: could not found the packet mtag!` warning is related to the panic, but that warning itself ought to be impossible to hit.
It means that there's an mbuf in the codel queue without an MTAG_CODEL. However, I don't see how that can happen, because any insertion into the codel queue is done through codel_addq(), which will either succeed in attaching the mtag, or will drop the packet.

The codel code is presumably called through the altq_priq code, but there too it doesn't add mbufs in any other way.

I've also been unable to reproduce this problem, which itself is odd, given that the original report states this happens consistently at boot. We may be missing a configuration factor.

Actions #3

Updated by Kristof Provost 11 months ago

I believe I've reproduced the problem. It required using ix3 as LAN interface and ix3.201 as PPPoE for WAN.

With a debug kernel I now see this backtrace:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 0c
instruction pointer     = 0x20:0xffffffff80d8ced0
stack pointer           = 0x28:0xfffffe007c849af0
frame pointer           = 0x28:0xfffffe007c849b10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 66278 (unbound-anchor)
rdi: deadc0dedeadc0de rsi: fffff800020a2600 rdx: 0000000000000065
rcx: fffff800020a2600  r8: 0000000000000020  r9: 0000000000000004
rax: deadc0dedeadc0de rbx: fffff800031a9b00 rbp: fffffe007c849b10
r10: fffff8000151c738 r11: 0000000000000015 r12: 0000000000000100
r13: fffff800016d9300 r14: fffff800031a9b30 r15: fffff800020a2600
trap number             = 9
panic: general protection fault
cpuid = 0
time = 1687875076
KDB: enter: panic
[ thread pid 66278 tid 100202 ]
Stopped at      kdb_enter+0x32: movq    $0,0x2309603(%rip)
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default>  capture on
db:0:kdb.enter.default>  run pfs
db:1:pfs> bt
Tracing pid 66278 tid 100202 td 0xfffffe007d153560
kdb_enter() at kdb_enter+0x32/frame 0xfffffe007c849910
vpanic() at vpanic+0x181/frame 0xfffffe007c849960
panic() at panic+0x43/frame 0xfffffe007c8499c0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe007c849a20
calltrap() at calltrap+0x8/frame 0xfffffe007c849a20
--- trap 0x9, rip = 0xffffffff80d8ced0, rsp = 0xfffffe007c849af0, rbp = 0xfffffe007c849b10 ---
m_tag_delete_chain() at m_tag_delete_chain+0x40/frame 0xfffffe007c849b10
uma_zfree_arg() at uma_zfree_arg+0xad/frame 0xfffffe007c849b60
m_free() at m_free+0xc3/frame 0xfffffe007c849b90
m_freem() at m_freem+0x28/frame 0xfffffe007c849bb0
iflib_completed_tx_reclaim() at iflib_completed_tx_reclaim+0x216/frame 0xfffffe007c849c30
iflib_txq_drain() at iflib_txq_drain+0x63/frame 0xfffffe007c849cb0
drain_ring_lockless() at drain_ring_lockless+0x9d/frame 0xfffffe007c849d00
ifmp_ring_enqueue() at ifmp_ring_enqueue+0x2aa/frame 0xfffffe007c849d40
iflib_if_transmit() at iflib_if_transmit+0x25e/frame 0xfffffe007c849db0
iflib_altq_if_start() at iflib_altq_if_start+0xde/frame 0xfffffe007c849de0
iflib_altq_if_transmit() at iflib_altq_if_transmit+0xe6/frame 0xfffffe007c849e10
vlan_transmit() at vlan_transmit+0xfc/frame 0xfffffe007c849e60
ether_output_frame() at ether_output_frame+0xa3/frame 0xfffffe007c849e90
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c849f30
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c849f70
ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x5a9/frame 0xfffffe007c84a000
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a0a0
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a0e0
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a180
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a1c0
ng_ppp_link_xmit() at ng_ppp_link_xmit+0x11b/frame 0xfffffe007c84a210
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a2b0
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a2f0
ng_vjc_rcvdata() at ng_vjc_rcvdata+0x177/frame 0xfffffe007c84a340
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a3e0
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a420
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a4c0
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a500
ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a5a0
ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a5e0
ng_iface_send() at ng_iface_send+0xf4/frame 0xfffffe007c84a660
ng_iface_start() at ng_iface_start+0x83/frame 0xfffffe007c84a6b0
if_transmit_default() at if_transmit_default+0x13f/frame 0xfffffe007c84a6f0
ng_iface_output() at ng_iface_output+0x1e0/frame 0xfffffe007c84a730
pf_route() at pf_route+0xba0/frame 0xfffffe007c84a810
pf_test() at pf_test+0xe5f/frame 0xfffffe007c84a9c0
pf_check_out() at pf_check_out+0x22/frame 0xfffffe007c84a9e0
pfil_mbuf_out() at pfil_mbuf_out+0x55/frame 0xfffffe007c84aa20
ip_output() at ip_output+0xcd8/frame 0xfffffe007c84ab10
udp_send() at udp_send+0xa18/frame 0xfffffe007c84ac10
sosend_dgram() at sosend_dgram+0x327/frame 0xfffffe007c84ac70
sousrsend() at sousrsend+0x7e/frame 0xfffffe007c84acd0
kern_sendit() at kern_sendit+0x142/frame 0xfffffe007c84ad60
sendit() at sendit+0xb7/frame 0xfffffe007c84adb0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe007c84ae00
amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe007c84af30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe007c84af30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x82984af2a, rsp = 0x82054ac38, rbp = 0x82054ac80 ---

Actions #4

Updated by Kristof Provost 10 months ago

A bit more progress. It looks like we're enqueuing the same mbuf twice, so it gets used after it's been freed and that's what's causing the panic.

It looks like these are the relevant backtraces:

  0  63710                 codel_addq:entry m = fffff801659fc500
              kernel`priq_enqueue+0x125
              kernel`if_transmit_default+0xb9
              kernel`ng_iface_output+0x1e0
              kernel`pf_route+0xba0
              kernel`pf_test+0xe5f
              kernel`pf_check_out+0x22
              kernel`pfil_mbuf_out+0x55
              kernel`ip_output+0xcd8
              kernel`rip_send+0x460
              kernel`sosend_generic+0x61a
              kernel`sousrsend+0x7e
              kernel`kern_sendit+0x142
              kernel`sendit+0xb7
              kernel`sys_sendto+0x4d
              kernel`amd64_syscall+0x140
              kernel`0xffffffff81292edb

  0  63710                 codel_addq:entry m = fffff801659fc500
              kernel`priq_enqueue+0x125
              kernel`iflib_altq_if_transmit+0xc0
              kernel`vlan_transmit+0xfc
              kernel`ether_output_frame+0xa3
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_pppoe_rcvdata+0x5a9
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_ppp_link_xmit+0x11b
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_vjc_rcvdata+0x177
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_apply_item+0x2de
              kernel`ng_snd_item+0x2e0
              kernel`ng_iface_send+0xf4
              kernel`ng_iface_start+0x83
              kernel`if_transmit_default+0x13f
              kernel`ng_iface_output+0x1e0
              kernel`pf_route+0xba0
              kernel`pf_test+0xe5f
              kernel`pf_check_out+0x22
              kernel`pfil_mbuf_out+0x55
              kernel`ip_output+0xcd8
              kernel`rip_send+0x460
              kernel`sosend_generic+0x61a
              kernel`sousrsend+0x7e
              kernel`kern_sendit+0x142
              kernel`sendit+0xb7
              kernel`sys_sendto+0x4d
              kernel`amd64_syscall+0x140
              kernel`0xffffffff81292edb

I'm looking at ng_iface_start()'s use of IFQ_DRV_DEQUEUE(), because it looks like that may be doing the wrong thing here.

Actions #5

Updated by Kristof Provost 10 months ago

I believe I've identified the cause. It's not quite what I thought initially, but it's close.
We are indeed enqueuing the same mbuf twice, but between the first and second enqueue it does get removed from the queue again, so there's no use-after-free there.

However, the codel code adds an mtag with a timestamp (in codel_addq()):

int
codel_addq(struct codel *c, class_queue_t *q, struct mbuf *m)
{
        struct m_tag *mtag;
        uint64_t *enqueue_time;

        if (qlen(q) < qlimit(q)) {
                mtag = m_tag_locate(m, MTAG_CODEL, 0, NULL);
                if (mtag == NULL)
                        mtag = m_tag_alloc(MTAG_CODEL, 0, sizeof(uint64_t),
                            M_NOWAIT);
                if (mtag == NULL) {
                        m_freem(m);
                        return (-1);
                }
                enqueue_time = (uint64_t *)(mtag + 1);
                *enqueue_time = read_machclk();
                m_tag_prepend(m, mtag);
                _addq(q, m);
                return (0);
        }
        c->drop_overlimit++;
        m_freem(m);

        return (-1);
}

codel_addq() attempts to avoid allocating a second mtag if an MTAG_CODEL tag already exists. That's fine, but it does insert the mtag unconditionally, so if the mbuf already contains an MTAG_CODEL we insert the mtag again. When we get to m_tag_delete_chain() (via m_free()) we delete the same mtag twice, and panic.
I believe the critical elements to seeing this issue are: (1) using ALTQ/codel and (2) doing so over vlan, with a configuration that causes the packet to be handled twice by ALTQ.

I'll spend a few minutes trying to build a test case for this to confirm, but I'm pretty confident that moving the m_tag_prepend() in the same if case as m_tag_alloc() will fix the panic.

Actions #7

Updated by Kristof Provost 10 months ago

  • Status changed from New to Feedback

I've cherry-picked the fix to our branches as well. It should be part of future snapshot builds.

There's no real workaround. Either do not run LAN and WAN on the same physical interface or do not use codel. This bug is specific to codel, so other schedulers should work.

Actions #8

Updated by Jim Pingle 10 months ago

  • Target version changed from CE-Next to 2.8.0
Actions #9

Updated by Jim Pingle 9 months ago

  • Subject changed from Traffic shaping on a PPPoE interface can cause a kernel panic to Kernel panic when using traffic shaping on a PPPoE interface

Updating subject for release notes.

Actions #10

Updated by Jim Pingle 7 months ago

  • Status changed from Feedback to Closed

Closing for lack of feedback. If it's still an issue in this release we can reopen and re-target the issue at the next release.

Actions #11

Updated by Jim Pingle 6 months ago

  • Target version changed from 2.8.0 to 2.7.1
Actions

Also available in: Atom PDF