Bug #11585

WireGuard kernel panic when changing peer port on assigned WireGuard interface

Added by Christian McDonald about 2 months ago. Updated 28 days ago.

Status: Feedback
Priority: Normal
Assignee:
Category: WireGuard
Target version:
Start date: 03/01/2021
Due date:
% Done: 0%
Estimated time:
Affected Version:
Affected Architecture:
Release Notes: Default

Description

All I did was change the port on peer 0.
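To clarify the trigger: I edited peer 0's endpoint port on an assigned WireGuard interface in the GUI and saved. Expressed with the standard wg(8) tool, the equivalent operation would look roughly like the line below; the interface name, public key, and endpoint are placeholders, and pfSense actually applies the change through its GUI/PHP backend rather than wg(8):

# illustration only; interface, key, and endpoint are placeholders
wg set tun_wg0 peer '<peer0-public-key>' endpoint 203.0.113.10:51821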

info.1 (380 Bytes) Christian McDonald, 03/01/2021 01:59 PM
info.0 (381 Bytes) Christian McDonald, 03/01/2021 01:59 PM
textdump.tar.0 (154 KB) Christian McDonald, 03/01/2021 01:59 PM
textdump.tar.1 (154 KB) Christian McDonald, 03/01/2021 01:59 PM

History

#1 Updated by Christian McDonald about 2 months ago

Also hitting this when changing the port on the local wg interface, though only sometimes. Sometimes changing the port is fine; other times it completely crashes the system.

#2 Updated by Jim Pingle about 2 months ago

  • Subject changed from Kernel panic when changing WG peer port on assigned wireguard interface. to WireGuard kernel panic when changing peer port on assigned WireGuard interface
  • Target version set to CE-Next

That does appear to be one we haven't seen yet:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d7e767
stack pointer           = 0x0:0xfffffe00401d7fb0
frame pointer           = 0x0:0xfffffe00401d8030
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_3)
trap number             = 12
panic: page fault
cpuid = 3
time = 1614628307
KDB: enter: panic
db:0:kdb.enter.default>  bt
Tracing pid 0 tid 100033 td 0xfffff80005399000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00401d7c70
vpanic() at vpanic+0x197/frame 0xfffffe00401d7cc0
panic() at panic+0x43/frame 0xfffffe00401d7d20
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00401d7d80
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00401d7dd0
trap() at trap+0x286/frame 0xfffffe00401d7ee0
calltrap() at calltrap+0x8/frame 0xfffffe00401d7ee0
--- trap 0xc, rip = 0xffffffff80d7e767, rsp = 0xfffffe00401d7fb0, rbp = 0xfffffe00401d8030 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe00401d8030
wg_queue_out() at wg_queue_out+0x21b/frame 0xfffffe00401d8070
wg_transmit() at wg_transmit+0xda/frame 0xfffffe00401d80d0
pf_test() at pf_test+0x22f0/frame 0xfffffe00401d8310
pf_test() at pf_test+0x20f6/frame 0xfffffe00401d8550
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00401d8570
pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe00401d8610
ip_tryforward() at ip_tryforward+0x193/frame 0xfffffe00401d8690
ip_input() at ip_input+0x3fe/frame 0xfffffe00401d8740
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00401d8790
ether_demux() at ether_demux+0x16a/frame 0xfffffe00401d87c0
ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe00401d8820
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00401d8870
ether_input() at ether_input+0x4b/frame 0xfffffe00401d88a0
vlan_input() at vlan_input+0x1f3/frame 0xfffffe00401d88f0
ether_demux() at ether_demux+0x153/frame 0xfffffe00401d8920
ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe00401d8980
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00401d89d0
ether_input() at ether_input+0x4b/frame 0xfffffe00401d8a00
iflib_rxeof() at iflib_rxeof+0xae6/frame 0xfffffe00401d8ae0
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00401d8b20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe00401d8b80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe00401d8bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00401d8bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00401d8bf0
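
A page fault at virtual address 0x410 with the fault inside __mtx_lock_sleep() usually means the mutex was reached through a NULL or already-freed base pointer, with 0x410 being a small member offset into the containing structure. One plausible reading, assuming the port change tears the peer down and rebuilds it while traffic is still flowing, is a use-after-free race between the transmit path (wg_transmit() -> wg_queue_out() in the trace) and reconfiguration. The sketch below is a minimal userland illustration of that pattern only; the structure layout, function names, and teardown sequence are hypothetical, not the actual if_wg code:

/*
 * Hypothetical userland sketch of the suspected race; this is NOT the
 * actual if_wg code.  A transmit thread snapshots a peer pointer and
 * locks the peer's queue mutex while a concurrent reconfiguration
 * destroys and frees that peer.  In the kernel, the same pattern
 * faults inside __mtx_lock_sleep(), as in the trace above.
 */
#include <pthread.h>
#include <stdlib.h>

struct peer {
	pthread_mutex_t q_mtx;		/* stands in for the peer queue mutex */
	/* ... staged packet queue, endpoint, keys ... */
};

static struct peer *g_peer;		/* shared; deliberately unsynchronized */

/* Transmit path, analogous to wg_transmit() -> wg_queue_out(). */
static void *
transmit_path(void *arg)
{
	struct peer *p;

	(void)arg;
	p = g_peer;			/* stale snapshot of the peer pointer */
	pthread_mutex_lock(&p->q_mtx);	/* may dereference freed memory */
	/* ... enqueue the packet on the peer's staged queue ... */
	pthread_mutex_unlock(&p->q_mtx);
	return (NULL);
}

/* Reconfiguration path: the port change tears down and rebuilds the peer. */
static void
reconfigure_peer(void)
{
	struct peer *old = g_peer;
	struct peer *np;

	pthread_mutex_destroy(&old->q_mtx);
	free(old);			/* the transmit thread may still hold 'old' */

	np = calloc(1, sizeof(*np));
	pthread_mutex_init(&np->q_mtx, NULL);
	g_peer = np;
}

int
main(void)
{
	pthread_t tid;

	g_peer = calloc(1, sizeof(*g_peer));
	pthread_mutex_init(&g_peer->q_mtx, NULL);

	pthread_create(&tid, NULL, transmit_path, NULL);
	reconfigure_peer();		/* races with the transmit thread */
	pthread_join(tid, NULL);
	return (0);
}

If that reading is right, the fix direction would be reference counting or epoch-style deferred reclamation around peer teardown, rather than destroying the queue mutex while the transmit path can still reach it. That's speculation from the trace alone, though.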

#3 Updated by Jim Pingle about 2 months ago

  • Status changed from New to Closed

#4 Updated by Jim Pingle about 2 months ago

  • Status changed from Closed to New

#5 Updated by Jim Pingle about 2 months ago

Parts of the backtrace are similar to #11586, but it's not an exact match.

#6 Updated by Christian McDonald about 2 months ago

Interestingly enough, I haven't had any panics on my cloud instances hosted on Vultr, though my instances hosted on-premises on ESXi v7u1 are pretty consistently crashing. FWIW.

#7 Updated by Jim Pingle about 1 month ago

  • Assignee set to Peter Grehan

#8 Updated by Jim Pingle about 1 month ago

  • Target version changed from CE-Next to 2.5.1

#9 Updated by Renato Botelho about 1 month ago

  • Status changed from New to Feedback

Many wg fixes were cherry-picked from upstream; this needs to be tested again.

#10 Updated by Christian McDonald about 1 month ago

I can test whenever this hits the dev snaps. I assume this is incubating in 2.6 devel?

I'm not sure what you can discuss at this point, but can you elaborate on the upstream codebase? Are these fixes cherry-picked from the most recent work by Jason et al., or from before their submission?

#11 Updated by Jim Pingle 28 days ago

  • Target version changed from 2.5.1 to Future
