Bug #14497
closedKernel panic when using traffic shaping on a PPPoE interface
0%
Description
A PRIQ traffic shaper with codel enabled can cause a panicwhen applied to a PPPoE WAN.
See: https://forum.netgate.com/topic/179546/6100-boot-loop-w-traffic-shaper-on-pppoe-wan
Bootstrapping clock... codel_should_drop: could not found the packet mtag! Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 04 fault virtual address = 0x5010410 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80cd789d stack pointer = 0x0:0xfffffe00c4c04ae0 frame pointer = 0x0:0xfffffe00c4c04b60 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 10 fault virtual address = 0x1 fault code = supervisor read instruction, page not present instruction pointer = 0x20:0x1 stack pointer = 0x28:0xfffffe00107e04e8 frame pointer = 0x28:0xfffffe00107e0510 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (if_io_tqg_2) rdi: fffff80265c8b600 rsi: fffff80265c8b600 rdx: 0 rcx: fffff80265573430 r8: 0 r9: fffff80001004c00 rax: fffff80265c8b600 rbx: fffff80265573400 rbp: fffffe00107e0510 r10: 0 r11: 7ffa26e3 r12: fffffe0085822000 r13: 3010000 r14: fffff80265573430 r15: fffff80265c8b600 trap number = 12 panic: page fault cpuid = 2 time = 1687374345 KDB: enter: panic [ thread pid 0 tid 100009 ] Stopped at kdb_enter+0x32: movq $0,0x2342e13(%rip) db:0:kdb.enter.default> textdump set textdump set db:0:kdb.enter.default> capture on db:0:kdb.enter.default> run pfs db:1:pfs> bt Tracing pid 0 tid 100009 td 0xfffffe0011ff5c80 kdb_enter() at kdb_enter+0x32/frame 0xfffffe00107e02a0 vpanic() at vpanic+0x183/frame 0xfffffe00107e02f0 panic() at panic+0x43/frame 0xfffffe00107e0350 trap_fatal() at trap_fatal+0x409/frame 0xfffffe00107e03b0 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00107e0410 calltrap() at calltrap+0x8/frame 0xfffffe00107e0410 --- trap 0xc, rip = 0x1, rsp = 0xfffffe00107e04e8, rbp = 0xfffffe00107e0510 --- ??() at 0x1/frame 0xfffffe00107e0510 uma_zfree_arg() at uma_zfree_arg+0x1d9/frame 0xfffffe00107e0550 mb_free_ext() at mb_free_ext+0xbe/frame 0xfffffe00107e0580 m_free() at m_free+0xbc/frame 0xfffffe00107e05b0 m_freem() at m_freem+0x28/frame 0xfffffe00107e05d0 iflib_completed_tx_reclaim() at iflib_completed_tx_reclaim+0x1df/frame 0xfffffe00107e0640 iflib_txq_drain() at iflib_txq_drain+0x68/frame 0xfffffe00107e06c0 drain_ring_lockless() at drain_ring_lockless+0x5d/frame 0xfffffe00107e0710 ifmp_ring_enqueue() at ifmp_ring_enqueue+0x27a/frame 0xfffffe00107e0750 iflib_if_transmit() at iflib_if_transmit+0x23b/frame 0xfffffe00107e07c0 iflib_altq_if_start() at iflib_altq_if_start+0xee/frame 0xfffffe00107e07f0 iflib_altq_if_transmit() at iflib_altq_if_transmit+0xff/frame 0xfffffe00107e0820 vlan_transmit() at vlan_transmit+0xdf/frame 0xfffffe00107e0870 ether_output_frame() at ether_output_frame+0xa3/frame 0xfffffe00107e08a0 ether_output() at ether_output+0x670/frame 0xfffffe00107e0930 ip_tryforward() at ip_tryforward+0x577/frame 0xfffffe00107e09d0 ip_input() at ip_input+0x309/frame 0xfffffe00107e0a30 netisr_dispatch_src() at netisr_dispatch_src+0x2a0/frame 0xfffffe00107e0a80 ether_demux() at ether_demux+0x149/frame 0xfffffe00107e0ab0 ether_nh_input() at ether_nh_input+0x352/frame 0xfffffe00107e0b10 netisr_dispatch_src() at netisr_dispatch_src+0xb0/frame 0xfffffe00107e0b60 ether_input() at ether_input+0x69/frame 0xfffffe00107e0bc0 ether_demux() at ether_demux+0x9a/frame 0xfffffe00107e0bf0 ether_nh_input() at ether_nh_input+0x352/frame 0xfffffe00107e0c50 netisr_dispatch_src() at netisr_dispatch_src+0xb0/frame 0xfffffe00107e0ca0 ether_input() at ether_input+0x69/frame 0xfffffe00107e0d00 iflib_rxeof() at iflib_rxeof+0xc13/frame 0xfffffe00107e0e00 _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00107e0e40 gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe00107e0ec0 gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe00107e0ef0 fork_exit() at fork_exit+0x7d/frame 0xfffffe00107e0f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00107e0f30 --- trap 0x78c08548, rip = 0xeb8480f79c0, rsp = 0xfc22e8c2940f2024, rbp = 0xc016058b48000000 ---
Seen in 23.01 and 23.05
Files
Updated by Steve Wheeler over 1 year ago
- File shaper_config.xml shaper_config.xml added
Attached shaper config that hots this.
Updated by Kristof Provost over 1 year ago
I have no idea what could be going on here.
I'm going to assume that the `codel_should_drop: could not found the packet mtag!` warning is related to the panic, but that warning itself ought to be impossible to hit.
It means that there's an mbuf in the codel queue without an MTAG_CODEL. However, I don't see how that can happen, because any insertion into the codel queue is done through codel_addq(), which will either succeed in attaching the mtag, or will drop the packet.
The codel code is presumably called through the altq_priq code, but there too it doesn't add mbufs in any other way.
I've also been unable to reproduce this problem, which itself is odd, given that the original report states this happens consistently at boot. We may be missing a configuration factor.
Updated by Kristof Provost over 1 year ago
I believe I've reproduced the problem. It required using ix3 as LAN interface and ix3.201 as PPPoE for WAN.
With a debug kernel I now see this backtrace:
Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 0c instruction pointer = 0x20:0xffffffff80d8ced0 stack pointer = 0x28:0xfffffe007c849af0 frame pointer = 0x28:0xfffffe007c849b10 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 66278 (unbound-anchor) rdi: deadc0dedeadc0de rsi: fffff800020a2600 rdx: 0000000000000065 rcx: fffff800020a2600 r8: 0000000000000020 r9: 0000000000000004 rax: deadc0dedeadc0de rbx: fffff800031a9b00 rbp: fffffe007c849b10 r10: fffff8000151c738 r11: 0000000000000015 r12: 0000000000000100 r13: fffff800016d9300 r14: fffff800031a9b30 r15: fffff800020a2600 trap number = 9 panic: general protection fault cpuid = 0 time = 1687875076 KDB: enter: panic [ thread pid 66278 tid 100202 ] Stopped at kdb_enter+0x32: movq $0,0x2309603(%rip) db:0:kdb.enter.default> textdump set textdump set db:0:kdb.enter.default> capture on db:0:kdb.enter.default> run pfs db:1:pfs> bt Tracing pid 66278 tid 100202 td 0xfffffe007d153560 kdb_enter() at kdb_enter+0x32/frame 0xfffffe007c849910 vpanic() at vpanic+0x181/frame 0xfffffe007c849960 panic() at panic+0x43/frame 0xfffffe007c8499c0 trap_fatal() at trap_fatal+0x409/frame 0xfffffe007c849a20 calltrap() at calltrap+0x8/frame 0xfffffe007c849a20 --- trap 0x9, rip = 0xffffffff80d8ced0, rsp = 0xfffffe007c849af0, rbp = 0xfffffe007c849b10 --- m_tag_delete_chain() at m_tag_delete_chain+0x40/frame 0xfffffe007c849b10 uma_zfree_arg() at uma_zfree_arg+0xad/frame 0xfffffe007c849b60 m_free() at m_free+0xc3/frame 0xfffffe007c849b90 m_freem() at m_freem+0x28/frame 0xfffffe007c849bb0 iflib_completed_tx_reclaim() at iflib_completed_tx_reclaim+0x216/frame 0xfffffe007c849c30 iflib_txq_drain() at iflib_txq_drain+0x63/frame 0xfffffe007c849cb0 drain_ring_lockless() at drain_ring_lockless+0x9d/frame 0xfffffe007c849d00 ifmp_ring_enqueue() at ifmp_ring_enqueue+0x2aa/frame 0xfffffe007c849d40 iflib_if_transmit() at iflib_if_transmit+0x25e/frame 0xfffffe007c849db0 iflib_altq_if_start() at iflib_altq_if_start+0xde/frame 0xfffffe007c849de0 iflib_altq_if_transmit() at iflib_altq_if_transmit+0xe6/frame 0xfffffe007c849e10 vlan_transmit() at vlan_transmit+0xfc/frame 0xfffffe007c849e60 ether_output_frame() at ether_output_frame+0xa3/frame 0xfffffe007c849e90 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c849f30 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c849f70 ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x5a9/frame 0xfffffe007c84a000 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a0a0 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a0e0 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a180 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a1c0 ng_ppp_link_xmit() at ng_ppp_link_xmit+0x11b/frame 0xfffffe007c84a210 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a2b0 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a2f0 ng_vjc_rcvdata() at ng_vjc_rcvdata+0x177/frame 0xfffffe007c84a340 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a3e0 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a420 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a4c0 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a500 ng_apply_item() at ng_apply_item+0x2de/frame 0xfffffe007c84a5a0 ng_snd_item() at ng_snd_item+0x2e0/frame 0xfffffe007c84a5e0 ng_iface_send() at ng_iface_send+0xf4/frame 0xfffffe007c84a660 ng_iface_start() at ng_iface_start+0x83/frame 0xfffffe007c84a6b0 if_transmit_default() at if_transmit_default+0x13f/frame 0xfffffe007c84a6f0 ng_iface_output() at ng_iface_output+0x1e0/frame 0xfffffe007c84a730 pf_route() at pf_route+0xba0/frame 0xfffffe007c84a810 pf_test() at pf_test+0xe5f/frame 0xfffffe007c84a9c0 pf_check_out() at pf_check_out+0x22/frame 0xfffffe007c84a9e0 pfil_mbuf_out() at pfil_mbuf_out+0x55/frame 0xfffffe007c84aa20 ip_output() at ip_output+0xcd8/frame 0xfffffe007c84ab10 udp_send() at udp_send+0xa18/frame 0xfffffe007c84ac10 sosend_dgram() at sosend_dgram+0x327/frame 0xfffffe007c84ac70 sousrsend() at sousrsend+0x7e/frame 0xfffffe007c84acd0 kern_sendit() at kern_sendit+0x142/frame 0xfffffe007c84ad60 sendit() at sendit+0xb7/frame 0xfffffe007c84adb0 sys_sendto() at sys_sendto+0x4d/frame 0xfffffe007c84ae00 amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe007c84af30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe007c84af30 --- syscall (133, FreeBSD ELF64, sendto), rip = 0x82984af2a, rsp = 0x82054ac38, rbp = 0x82054ac80 ---
Updated by Kristof Provost over 1 year ago
A bit more progress. It looks like we're enqueuing the same mbuf twice, so it gets used after it's been freed and that's what's causing the panic.
It looks like these are the relevant backtraces:
0 63710 codel_addq:entry m = fffff801659fc500 kernel`priq_enqueue+0x125 kernel`if_transmit_default+0xb9 kernel`ng_iface_output+0x1e0 kernel`pf_route+0xba0 kernel`pf_test+0xe5f kernel`pf_check_out+0x22 kernel`pfil_mbuf_out+0x55 kernel`ip_output+0xcd8 kernel`rip_send+0x460 kernel`sosend_generic+0x61a kernel`sousrsend+0x7e kernel`kern_sendit+0x142 kernel`sendit+0xb7 kernel`sys_sendto+0x4d kernel`amd64_syscall+0x140 kernel`0xffffffff81292edb 0 63710 codel_addq:entry m = fffff801659fc500 kernel`priq_enqueue+0x125 kernel`iflib_altq_if_transmit+0xc0 kernel`vlan_transmit+0xfc kernel`ether_output_frame+0xa3 kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_pppoe_rcvdata+0x5a9 kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_ppp_link_xmit+0x11b kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_vjc_rcvdata+0x177 kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_apply_item+0x2de kernel`ng_snd_item+0x2e0 kernel`ng_iface_send+0xf4 kernel`ng_iface_start+0x83 kernel`if_transmit_default+0x13f kernel`ng_iface_output+0x1e0 kernel`pf_route+0xba0 kernel`pf_test+0xe5f kernel`pf_check_out+0x22 kernel`pfil_mbuf_out+0x55 kernel`ip_output+0xcd8 kernel`rip_send+0x460 kernel`sosend_generic+0x61a kernel`sousrsend+0x7e kernel`kern_sendit+0x142 kernel`sendit+0xb7 kernel`sys_sendto+0x4d kernel`amd64_syscall+0x140 kernel`0xffffffff81292edb
I'm looking at ng_iface_start()'s use of IFQ_DRV_DEQUEUE(), because it looks like that may be doing the wrong thing here.
Updated by Kristof Provost over 1 year ago
I believe I've identified the cause. It's not quite what I thought initially, but it's close.
We are indeed enqueuing the same mbuf twice, but between the first and second enqueue it does get removed from the queue again, so there's no use-after-free there.
However, the codel code adds an mtag with a timestamp (in codel_addq()):
int codel_addq(struct codel *c, class_queue_t *q, struct mbuf *m) { struct m_tag *mtag; uint64_t *enqueue_time; if (qlen(q) < qlimit(q)) { mtag = m_tag_locate(m, MTAG_CODEL, 0, NULL); if (mtag == NULL) mtag = m_tag_alloc(MTAG_CODEL, 0, sizeof(uint64_t), M_NOWAIT); if (mtag == NULL) { m_freem(m); return (-1); } enqueue_time = (uint64_t *)(mtag + 1); *enqueue_time = read_machclk(); m_tag_prepend(m, mtag); _addq(q, m); return (0); } c->drop_overlimit++; m_freem(m); return (-1); }
codel_addq() attempts to avoid allocating a second mtag if an MTAG_CODEL tag already exists. That's fine, but it does insert the mtag unconditionally, so if the mbuf already contains an MTAG_CODEL we insert the mtag again. When we get to m_tag_delete_chain() (via m_free()) we delete the same mtag twice, and panic.
I believe the critical elements to seeing this issue are: (1) using ALTQ/codel and (2) doing so over vlan, with a configuration that causes the packet to be handled twice by ALTQ.
I'll spend a few minutes trying to build a test case for this to confirm, but I'm pretty confident that moving the m_tag_prepend() in the same if case as m_tag_alloc() will fix the panic.
Updated by Kristof Provost over 1 year ago
Updated by Kristof Provost over 1 year ago
- Status changed from New to Feedback
I've cherry-picked the fix to our branches as well. It should be part of future snapshot builds.
There's no real workaround. Either do not run LAN and WAN on the same physical interface or do not use codel. This bug is specific to codel, so other schedulers should work.
Updated by Jim Pingle over 1 year ago
- Target version changed from CE-Next to 2.8.0
Updated by Jim Pingle about 1 year ago
- Subject changed from Traffic shaping on a PPPoE interface can cause a kernel panic to Kernel panic when using traffic shaping on a PPPoE interface
Updating subject for release notes.
Updated by Jim Pingle 12 months ago
- Status changed from Feedback to Closed
Closing for lack of feedback. If it's still an issue in this release we can reopen and re-target the issue at the next release.