Regression #14164
closed
IPv6 interface configuration race condition can lead to kernel panic
100%
Description
While re-configuring an interface that has an IPv6 config, such as when the link bounces, it's possible to hit a race condition triggering a kernel panic:
db:1:pfs> bt
Tracing pid 4585 tid 100445 td 0xfffffe00cd4ba1e0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cd68c790
vpanic() at vpanic+0x182/frame 0xfffffe00cd68c7e0
panic() at panic+0x43/frame 0xfffffe00cd68c840
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cd68c8a0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cd68c900
calltrap() at calltrap+0x8/frame 0xfffffe00cd68c900
--- trap 0xc, rip = 0xffffffff80fd9293, rsp = 0xfffffe00cd68c9d0, rbp = 0xfffffe00cd68ca20 ---
in6_unlink_ifa() at in6_unlink_ifa+0x63/frame 0xfffffe00cd68ca20
in6_purgeaddr() at in6_purgeaddr+0x367/frame 0xfffffe00cd68cb40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00cd68cb60
in6_control() at in6_control+0x532/frame 0xfffffe00cd68cbc0
ifioctl() at ifioctl+0x7bc/frame 0xfffffe00cd68ccc0
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe00cd68cd30
sys_ioctl() at sys_ioctl+0x101/frame 0xfffffe00cd68ce00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00cd68cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cd68cf30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x18f47cd96e4a, rsp = 0x18f478021f28, rbp = 0x18f478021f70 ---
Tested in 23.01 amd64.
Updated by Christian McDonald over 1 year ago
- Assignee set to Christian McDonald
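Here is a simple reproducer:
#!/bin/sh
# Race an IPv6 address add against its delete on the same interface.
IFNAME=igc0
INET6ADDR=2001:db8:bdbd::123/48
while true; do
	# Both commands run in the background so they can overlap in the kernel.
	ifconfig "$IFNAME" inet6 "$INET6ADDR" &
	ifconfig "$IFNAME" inet6 "$INET6ADDR" delete &
done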
Updated by Jim Pingle over 1 year ago
- Subject changed from IPv6 Interface config race condition to IPv6 interface configuration race condition can lead to kernel panic
Updated by Mateusz Guzik over 1 year ago
Posted a review upstream: https://reviews.freebsd.org/D39317
Updated by Mateusz Guzik over 1 year ago
- Status changed from New to In Progress
- Assignee changed from Christian McDonald to Mateusz Guzik
Updated by Mateusz Guzik over 1 year ago
- Status changed from In Progress to Closed
The fix landed upstream and, after the merge, locally.
Updated by Jim Pingle over 1 year ago
- Status changed from Closed to Feedback
- % Done changed from 0 to 100
Let's keep this in a feedback state for a bit so we can confirm it's fixed in snapshots.
Updated by Jim Pingle over 1 year ago
- Status changed from Feedback to Resolved
No subsequent reports of this that I'm aware of.
Updated by Rob A over 1 year ago
Failure condition is still present on 23.05 Release.
Re-configuring an interface, an ISP-induced WAN link down/up, or simply a manual disconnect and reconnect of the WAN link via Status / Interfaces can still trigger this fault. A manual test of 7 WAN down/up cycles from the GUI produced 4 router crashes and 3 successful reconnections. Full fault logs and crash reports are available.
See forum post:
Backtrace of one of the crashes:
db:1:pfs> bt
Tracing pid 93402 tid 103857 td 0xfffffe00cf7cac80
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cf8a0800
vpanic() at vpanic+0x183/frame 0xfffffe00cf8a0850
panic() at panic+0x43/frame 0xfffffe00cf8a08b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cf8a0910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cf8a0970
calltrap() at calltrap+0x8/frame 0xfffffe00cf8a0970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cf8a0a40, rbp = 0xfffffe00cf8a0a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cf8a0c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cf8a0c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cf8a0d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cf8a0d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cf8a0dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cf8a0e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cf8a0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cf8a0f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdf5f8c98, rbp = 0x7fffdf5f8cd0 ---
db:1:pfs>
Updated by Steve Wheeler over 1 year ago
- Status changed from Resolved to Incomplete
- Affected Version set to 2.7.0
It can also show as:
db:1:pfs> bt
Tracing pid 68614 tid 100330 td 0xfffffe00cf325720
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c7d955f0
vpanic() at vpanic+0x183/frame 0xfffffe00c7d95640
panic() at panic+0x43/frame 0xfffffe00c7d956a0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00c7d95700
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c7d95760
calltrap() at calltrap+0x8/frame 0xfffffe00c7d95760
--- trap 0xc, rip = 0xffffffff80f63aa4, rsp = 0xfffffe00c7d95830, rbp = 0xfffffe00c7d95a50 ---
ip6_output() at ip6_output+0xb74/frame 0xfffffe00c7d95a50
udp6_send() at udp6_send+0x78e/frame 0xfffffe00c7d95c10
sosend_dgram() at sosend_dgram+0x357/frame 0xfffffe00c7d95c70
sousrsend() at sousrsend+0x5f/frame 0xfffffe00c7d95cd0
kern_sendit() at kern_sendit+0x132/frame 0xfffffe00c7d95d60
sendit() at sendit+0xb7/frame 0xfffffe00c7d95db0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00c7d95e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c7d95f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c7d95f30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x823f95f2a, rsp = 0x8202cea88, rbp = 0x8202cead0 ---
Updated by Kristof Provost over 1 year ago
I've not yet been able to reproduce this, but it looks like the issue in comments 9 and 10 is that we're trying to send IPv6 traffic on an interface where it's (now) disabled.
In 9 we may be dereferencing ND_IFINFO(ifp), which is done through ifp->if_afdata[AF_INET6], which may be NULL on interfaces without IPv6.
In 10 it appears to be hitting in6_ifstat_inc(), which dereferences ifp->if_afdata[AF_INET6], which may be NULL on interfaces without IPv6.
Note that this is a different bug from the original report (where simultaneous IPv6 configuration changes interfered with each other).
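A rough triage aid for telling these bugs apart in crash reports: in each backtrace in this issue, the frame immediately after the first "--- trap" line names the function that faulted (in6_unlink_ifa in the original report, in6_selecthlim or ip6_output in the later ones). The sketch below pulls that name out of a saved backtrace. The helper name faulting_frame is hypothetical, and it assumes the backtrace is stored with one frame per line, as ddb prints it.

```sh
#!/bin/sh
# faulting_frame (hypothetical helper): read a ddb backtrace on stdin and
# print the function named in the first frame after the "--- trap" marker,
# i.e. the function that was executing when the fault was taken.
# Assumes one frame per line.
faulting_frame() {
    awk '
        # The line right after the trap marker is the faulting frame.
        found { sub(/\(\).*/, "", $1); print $1; exit }
        /^--- trap / { found = 1 }
    '
}

# Example use (the file name is an assumption):
#   faulting_frame < backtrace.txt
```

Grouping crash reports by this name is only a first pass; different faulting functions can share a root cause, as with the in6_selecthlim and ip6_output traces here.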
Updated by Rob A over 1 year ago
Two more backtraces, should they offer any more insight:
db:1:pfs> bt
Tracing pid 3281 tid 100913 td 0xfffffe00cfe3e3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cfdc4800
vpanic() at vpanic+0x183/frame 0xfffffe00cfdc4850
panic() at panic+0x43/frame 0xfffffe00cfdc48b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cfdc4910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cfdc4970
calltrap() at calltrap+0x8/frame 0xfffffe00cfdc4970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cfdc4a40, rbp = 0xfffffe00cfdc4a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cfdc4a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cfdc4c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cfdc4c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cfdc4d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cfdc4d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cfdc4dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cfdc4e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cfdc4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cfdc4f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdfbfbc98, rbp = 0x7fffdfbfbcd0 ---
db:1:pfs>
db:1:pfs> bt
Tracing pid 2 tid 100041 td 0xfffffe0085264560
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850ad910
vpanic() at vpanic+0x183/frame 0xfffffe00850ad960
panic() at panic+0x43/frame 0xfffffe00850ad9c0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00850ada20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ada80
calltrap() at calltrap+0x8/frame 0xfffffe00850ada80
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00850adb50, rbp = 0xfffffe00850adb80 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850adb80
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00850add70
tcp_timer_rexmt() at tcp_timer_rexmt+0x514/frame 0xfffffe00850addd0
tcp_timer_enter() at tcp_timer_enter+0x102/frame 0xfffffe00850ade10
softclock_call_cc() at softclock_call_cc+0x13c/frame 0xfffffe00850adec0
softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850adef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00850adf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850adf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:1:pfs>
I have complete logs for 4 events, if needed.
Updated by Mateusz Guzik over 1 year ago
As Kristof said, this is a different bug in IPv6 handling.
As such, please open a new Redmine issue with the new traces (and assign it to me) and close this one.
Updated by Steve Wheeler over 1 year ago
- Status changed from Incomplete to Resolved
Split to: https://redmine.pfsense.org/issues/14431