Regression #14164

closed

IPv6 interface configuration race condition can lead to kernel panic

Added by Steve Wheeler about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Interfaces
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 23.05
Release Notes: Default
Affected Version: 2.7.0
Affected Architecture:

Description

While re-configuring an interface that has an IPv6 config, such as when the link bounces, it's possible to hit a race condition triggering a kernel panic:

db:1:pfs> bt
Tracing pid 4585 tid 100445 td 0xfffffe00cd4ba1e0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cd68c790
vpanic() at vpanic+0x182/frame 0xfffffe00cd68c7e0
panic() at panic+0x43/frame 0xfffffe00cd68c840
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cd68c8a0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cd68c900
calltrap() at calltrap+0x8/frame 0xfffffe00cd68c900
--- trap 0xc, rip = 0xffffffff80fd9293, rsp = 0xfffffe00cd68c9d0, rbp = 0xfffffe00cd68ca20 ---
in6_unlink_ifa() at in6_unlink_ifa+0x63/frame 0xfffffe00cd68ca20
in6_purgeaddr() at in6_purgeaddr+0x367/frame 0xfffffe00cd68cb40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00cd68cb60
in6_control() at in6_control+0x532/frame 0xfffffe00cd68cbc0
ifioctl() at ifioctl+0x7bc/frame 0xfffffe00cd68ccc0
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe00cd68cd30
sys_ioctl() at sys_ioctl+0x101/frame 0xfffffe00cd68ce00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00cd68cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cd68cf30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x18f47cd96e4a, rsp = 0x18f478021f28, rbp = 0x18f478021f70 ---

Tested in 23.01 amd64.

Actions #1

Updated by Steve Wheeler about 1 year ago

  • Priority changed from Normal to High
Actions #2

Updated by Christian McDonald about 1 year ago

  • Assignee set to Christian McDonald
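
Here is a simple reproducer:

#!/bin/sh

# Interface and IPv6 address used for the test.
IFNAME=igc0
INET6ADDR=2001:db8:bdbd::123/48

# Add and delete the same address concurrently, forever, to provoke the race.
while true; do
    ifconfig "$IFNAME" inet6 "$INET6ADDR" &
    ifconfig "$IFNAME" inet6 "$INET6ADDR" delete &
done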

Actions #3

Updated by Jim Pingle about 1 year ago

  • Subject changed from IPv6 Interface config race condition to IPv6 interface configuration race condition can lead to kernel panic
Actions #4

Updated by Mateusz Guzik about 1 year ago

Posted a review upstream: https://reviews.freebsd.org/D39317

Actions #5

Updated by Mateusz Guzik about 1 year ago

  • Status changed from New to In Progress
  • Assignee changed from Christian McDonald to Mateusz Guzik
Actions #6

Updated by Mateusz Guzik about 1 year ago

  • Status changed from In Progress to Closed

The fix landed upstream and locally after the merge.

Actions #7

Updated by Jim Pingle about 1 year ago

  • Status changed from Closed to Feedback
  • % Done changed from 0 to 100

Let's keep this in a feedback state for a bit so we can confirm it's fixed in snapshots.

Actions #8

Updated by Jim Pingle 12 months ago

  • Status changed from Feedback to Resolved

No subsequent reports of this that I'm aware of.

Actions #9

Updated by Rob A 11 months ago

The failure condition is still present on the 23.05 release.

Re-configuring an interface, an ISP-induced WAN link down/up, or simply a manual disconnect and reconnect of the WAN link via Status / Interfaces can still trigger this fault. A manual test of 7 WAN down/up cycles from the GUI produced 4 router crashes and 3 successful reconnections. Full fault logs and crash reports are available.

See forum post:

https://forum.netgate.com/topic/178971/netgate-6100-crash-on-interface-change-not-resolved-ipv6-pppoe/19?_=1685208890891

Backtrace of one of the crashes:

db:1:pfs> bt
Tracing pid 93402 tid 103857 td 0xfffffe00cf7cac80
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cf8a0800
vpanic() at vpanic+0x183/frame 0xfffffe00cf8a0850
panic() at panic+0x43/frame 0xfffffe00cf8a08b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cf8a0910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cf8a0970
calltrap() at calltrap+0x8/frame 0xfffffe00cf8a0970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cf8a0a40, rbp = 0xfffffe00cf8a0a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cf8a0c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cf8a0c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cf8a0d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cf8a0d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cf8a0dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cf8a0e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cf8a0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cf8a0f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdf5f8c98, rbp = 0x7fffdf5f8cd0 ---
db:1:pfs>
Actions #10

Updated by Steve Wheeler 11 months ago

  • Status changed from Resolved to Incomplete
  • Affected Version set to 2.7.0

It can also show as:

db:1:pfs> bt
Tracing pid 68614 tid 100330 td 0xfffffe00cf325720
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c7d955f0
vpanic() at vpanic+0x183/frame 0xfffffe00c7d95640
panic() at panic+0x43/frame 0xfffffe00c7d956a0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00c7d95700
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c7d95760
calltrap() at calltrap+0x8/frame 0xfffffe00c7d95760
--- trap 0xc, rip = 0xffffffff80f63aa4, rsp = 0xfffffe00c7d95830, rbp = 0xfffffe00c7d95a50 ---
ip6_output() at ip6_output+0xb74/frame 0xfffffe00c7d95a50
udp6_send() at udp6_send+0x78e/frame 0xfffffe00c7d95c10
sosend_dgram() at sosend_dgram+0x357/frame 0xfffffe00c7d95c70
sousrsend() at sousrsend+0x5f/frame 0xfffffe00c7d95cd0
kern_sendit() at kern_sendit+0x132/frame 0xfffffe00c7d95d60
sendit() at sendit+0xb7/frame 0xfffffe00c7d95db0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00c7d95e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c7d95f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c7d95f30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x823f95f2a, rsp = 0x8202cea88, rbp = 0x8202cead0 ---

Actions #11

Updated by Kristof Provost 11 months ago

I've not yet been able to reproduce this, but it looks like the issue in comments 9 and 10 is that we're trying to send IPv6 traffic on an interface where it's (now) disabled.

In 9 we may be dereferencing ND_IFINFO(ifp), which is done through ifp->if_afdata[AF_INET6], which may be NULL on interfaces without IPv6.

In 10 it appears to be hitting in6_ifstat_inc(), which dereferences ifp->if_afdata[AF_INET6], which may be NULL on interfaces without IPv6.

Note that this is a different bug from the original report (where simultaneous IPv6 configuration changes interfered with each other).

Actions #12

Updated by Rob A 11 months ago

Two more backtraces, should they offer any more insight:

db:1:pfs> bt
Tracing pid 3281 tid 100913 td 0xfffffe00cfe3e3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cfdc4800
vpanic() at vpanic+0x183/frame 0xfffffe00cfdc4850
panic() at panic+0x43/frame 0xfffffe00cfdc48b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cfdc4910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cfdc4970
calltrap() at calltrap+0x8/frame 0xfffffe00cfdc4970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cfdc4a40, rbp = 0xfffffe00cfdc4a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cfdc4a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cfdc4c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cfdc4c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cfdc4d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cfdc4d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cfdc4dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cfdc4e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cfdc4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cfdc4f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdfbfbc98, rbp = 0x7fffdfbfbcd0 ---
db:1:pfs>
db:1:pfs> bt
Tracing pid 2 tid 100041 td 0xfffffe0085264560
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850ad910
vpanic() at vpanic+0x183/frame 0xfffffe00850ad960
panic() at panic+0x43/frame 0xfffffe00850ad9c0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00850ada20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ada80
calltrap() at calltrap+0x8/frame 0xfffffe00850ada80
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00850adb50, rbp = 0xfffffe00850adb80 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850adb80
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00850add70
tcp_timer_rexmt() at tcp_timer_rexmt+0x514/frame 0xfffffe00850addd0
tcp_timer_enter() at tcp_timer_enter+0x102/frame 0xfffffe00850ade10
softclock_call_cc() at softclock_call_cc+0x13c/frame 0xfffffe00850adec0
softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850adef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00850adf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850adf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:1:pfs>

I have complete logs for 4 events, if needed.

Actions #13

Updated by Mateusz Guzik 11 months ago

As Kristof said, this is a different bug in IPv6 handling.

As such, please open a new Redmine issue with the new traces (and assign it to me), then close this one.

Actions #14

Updated by Steve Wheeler 11 months ago

  • Status changed from Incomplete to Resolved