Bug #15503

open

udp6_bind kernel panic

Added by Steve Wheeler 16 days ago. Updated 16 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Operating System
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version: 2.7.2
Affected Architecture:
Description

We have seen a few reports of kernel panics with services attempting to listen for requests on link-local IPv6 addresses. For example:

db:0:kdb.enter.default>  bt
Tracing pid 54720 tid 100695 td 0xfffffe00da57fe40
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0112beb960
vpanic() at vpanic+0x163/frame 0xfffffe0112beba90
panic() at panic+0x43/frame 0xfffffe0112bebaf0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0112bebb50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0112bebbb0
calltrap() at calltrap+0x8/frame 0xfffffe0112bebbb0
--- trap 0xc, rip = 0xffffffff80f44220, rsp = 0xfffffe0112bebc80, rbp = 0xfffffe0112bebd00 ---
in6_pcbbind() at in6_pcbbind+0x360/frame 0xfffffe0112bebd00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe0112bebd60
sobind() at sobind+0x32/frame 0xfffffe0112bebd80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe0112bebdc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe0112bebe00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe0112bebf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0112bebf30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x82839fcea, rsp = 0x82d9cfaf8, rbp = 0x82d9cfbc0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address    = 0xb8
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80f44220
stack pointer            = 0x28:0xfffffe0112bebc80
frame pointer            = 0x28:0xfffffe0112bebd00
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 54720 (isc-net-0003)
rdi: ffffffff82d62a40 rsi: ffffffff82d62a40 rdx: 0000000000010200
rcx: 0000000000000000  r8: fffff8025bcdb700  r9: 0000000000000000
rax: fffff8000c663540 rbx: fffff8000da57c40 rbp: fffffe0112bebd00
r10: 0000000000000000 r11: fffffe00da580360 r12: fffff8000800cc60
r13: 000000000000ecf6 r14: 000000000000ecf6 r15: fffff8025bcdb700
trap number        = 12
panic: page fault
cpuid = 3
time = 1715038044
KDB: enter: panic

In each case this appears to be a package attempting to listen. So far we have confirmed Bind and Tailscale can hit this.

In Bind it can be worked around by deselecting link-local IPv6 addresses in the listening interfaces. Tailscale, however, always listens on all interfaces.

It happens very rarely, though, so it seems likely that some other setting is required in combination to trigger it.
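
For reference, a minimal user-space sketch of the call that leads into udp6_bind(): a UDP socket bound to an IPv6 address with an interface scope. This is only meant to show the code path in the backtrace (bind(2) on an AF_INET6 / SOCK_DGRAM socket); it is not a reproducer, and there is no indication that it triggers the panic on its own. The helper names make_sin6 and bind_udp6, and the address and port in the usage note, are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Fill in a sockaddr_in6 for a (possibly link-local) IPv6 address.
 * A link-local address is only meaningful together with an interface
 * scope, which is carried in sin6_scope_id.
 * Returns 0 on success, -1 if addr is not a valid IPv6 literal.
 */
int
make_sin6(struct sockaddr_in6 *sin6, const char *addr,
    unsigned int scope_id, in_port_t port)
{
	memset(sin6, 0, sizeof(*sin6));
#ifdef SIN6_LEN
	/* BSD-style sockaddrs carry their own length. */
	sin6->sin6_len = sizeof(*sin6);
#endif
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	sin6->sin6_scope_id = scope_id;
	if (inet_pton(AF_INET6, addr, &sin6->sin6_addr) != 1)
		return (-1);
	return (0);
}

/*
 * Create a UDP socket and bind it to the given address. On FreeBSD this
 * is the bind(2) path seen in the trace above (sys_bind -> sobind ->
 * udp6_bind). Returns the descriptor, or -1 on error.
 */
int
bind_udp6(const struct sockaddr_in6 *sin6)
{
	int s;

	s = socket(AF_INET6, SOCK_DGRAM, 0);
	if (s == -1)
		return (-1);
	if (bind(s, (const struct sockaddr *)sin6, sizeof(*sin6)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}
```

A caller would typically obtain the scope with if_nametoindex(3) for the relevant interface, then call something like make_sin6(&sin6, "fe80::1", scope, 5353) followed by bind_udp6(&sin6).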

Actions #1

Updated by Kristof Provost 16 days ago

I took a very quick look. The faulting code in6_pcbbind+0x360 translates to /var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/FreeBSD-src-RELENG_2_7_2/sys/netinet6/in6_pcb.c:257

That is this code fragment:

 249   │                 t = in6_pcblookup_local(pcbinfo,
 250   │                     &sin6->sin6_addr, lport,
 251   │                     INPLOOKUP_WILDCARD, cred);
 252   │                 if (t != NULL &&
 253   │                     (so->so_type != SOCK_STREAM ||
 254   │                      IN6_IS_ADDR_UNSPECIFIED(&t->in6p_faddr)) &&
 255   │                     (!IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr) ||
 256   │                      !IN6_IS_ADDR_UNSPECIFIED(&t->in6p_laddr) ||
 257   │                      (t->inp_socket->so_options & SO_REUSEPORT) ||
 258   │                      (t->inp_socket->so_options & SO_REUSEPORT_LB) == 0) &&
 259   │                     (inp->inp_cred->cr_uid !=
 260   │                      t->inp_cred->cr_uid))
 261   │                     return (EADDRINUSE);

The faulting address is: fault virtual address = 0xb8, so we're presumably dereferencing a NULL pointer plus a small (184-byte) offset.
Given that offset and the NULL check for 't' I believe that t->inp_socket is NULL here. That should only happen when the inpcb gets freed, after it's already been removed from the list (so can't be found in the lookup any more).
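
To illustrate the arithmetic: the address the CPU touches when reading a structure member is the base pointer plus the member's offset, so a NULL base yields the offset itself. The sketch below assumes a hypothetical layout, struct fake_socket, padded only so that the field lands at 0xb8. It is not the real struct socket definition, and member_address is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel structure; NOT the real
 * struct socket. The padding is chosen only so that so_options
 * lands at offset 0xb8 (184 bytes).
 */
struct fake_socket {
	char	pad[0xb8];	/* placeholder for the preceding fields */
	int	so_options;	/* the field that would be read */
};

/* Compile-time check of the assumed offset. */
_Static_assert(offsetof(struct fake_socket, so_options) == 0xb8,
    "so_options is expected at offset 0xb8 in this sketch");

/* Address that would be accessed when reading base->so_options. */
uintptr_t
member_address(uintptr_t base)
{
	return (base + offsetof(struct fake_socket, so_options));
}
```

With a NULL base, member_address(0) evaluates to 0xb8, matching the shape of the reported fault address.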

So my current theory is that we're removing the inpcb (via sofree()) while running the bind.
