Bug #8499
IPv6 fragment logging causes panic in some circumstances
Status: Closed
% Done: 100%
Description
From customer ticket #4934.
The system crashes repeatedly with near-identical backtraces:
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100074 td 0xfffff800036f55c0
strlen() at strlen+0x1f/frame 0xfffffe0059be7c90
kvprintf() at kvprintf+0x9dc/frame 0xfffffe0059be7d90
vlog() at vlog+0x9b/frame 0xfffffe0059be7e70
log() at log+0x3f/frame 0xfffffe0059be7ed0
ip6_forward() at ip6_forward+0xc3/frame 0xfffffe0059be8020
pf_refragment6() at pf_refragment6+0x17a/frame 0xfffffe0059be80e0
pf_test6() at pf_test6+0x14ba/frame 0xfffffe0059be8360
pf_test6() at pf_test6+0x1edb/frame 0xfffffe0059be85e0
pf_check6_in() at pf_check6_in+0x30/frame 0xfffffe0059be8600
pfil_run_hooks() at pfil_run_hooks+0x83/frame 0xfffffe0059be8690
ip6_input() at ip6_input+0xc75/frame 0xfffffe0059be8770
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe0059be87c0
ether_demux() at ether_demux+0x16d/frame 0xfffffe0059be87f0
ether_nh_input() at ether_nh_input+0x337/frame 0xfffffe0059be8850
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe0059be88a0
ether_input() at ether_input+0x26/frame 0xfffffe0059be88c0
vlan_input() at vlan_input+0x1f0/frame 0xfffffe0059be8940
ether_demux() at ether_demux+0x156/frame 0xfffffe0059be8970
ether_nh_input() at ether_nh_input+0x337/frame 0xfffffe0059be89d0
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe0059be8a20
ether_input() at ether_input+0x26/frame 0xfffffe0059be8a40
igb_rxeof() at igb_rxeof+0x6ac/frame 0xfffffe0059be8ad0
igb_msix_que() at igb_msix_que+0x114/frame 0xfffffe0059be8b20
intr_event_execute_handlers() at intr_event_execute_handlers+0xec/frame 0xfffffe0059be8b60
ithread_loop() at ithread_loop+0xd6/frame 0xfffffe0059be8bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe0059be8bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0059be8bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> ps
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x60
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ccc93f
stack pointer = 0x28:0xfffffe0059be7c90
frame pointer = 0x28:0xfffffe0059be7c90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq256: igb0:que 0)
This looks very similar to https://redmine.pfsense.org/issues/5428 except here no bridges are configured.
A PPPoE interface is configured with IPv6:
<opt10>
	<descr><![CDATA[xxxxx]]></descr>
	<if>pppoe0</if>
	<spoofmac></spoofmac>
	<enable></enable>
	<ipaddr>pppoe</ipaddr>
	<ipaddrv6>dhcp6</ipaddrv6>
	<dhcp6-duid></dhcp6-duid>
	<dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
	<dhcp6usev4iface></dhcp6usev4iface>
	<dhcp6cvpt>bk</dhcp6cvpt>
	<adv_dhcp6_prefix_selected_interface>wan</adv_dhcp6_prefix_selected_interface>
</opt10>
IPv6 is currently not functional on that interface.
The PPPoE interface runs on igb2, though.
The crash reports have all been on igb0, which here is configured with a number of VLANs, some of which have IPv6 configured on them.
Also see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220611#c4
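As a reading aid for the trace and trap output above: strlen() faulting on address 0x60 while called from kvprintf() is consistent with a "%s" argument that was computed from a field of a NULL structure pointer. The sketch below illustrates only that pattern and one defensive way to format such a log line. Everything in it is hypothetical: struct fake_ifnet, its padding, the helper names, and the log text are made up for illustration and are not the kernel's real structures, code, or the fix that was committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an interface structure. The 0x60 bytes of
 * padding exist only so the name field lands at offset 0x60, mirroring
 * the fault address in the trap message. This is NOT the real layout.
 */
struct fake_ifnet {
	unsigned char other_fields[0x60];
	char xname[16];
};

/* With a NULL base pointer, &ifp->xname would evaluate to address 0x60. */
static_assert(offsetof(struct fake_ifnet, xname) == 0x60,
    "name field is expected at offset 0x60 in this illustration");

/* Fill in the hypothetical structure with a NUL-terminated name. */
void
fake_ifnet_init(struct fake_ifnet *ifp, const char *name)
{
	memset(ifp, 0, sizeof(*ifp));
	strncpy(ifp->xname, name, sizeof(ifp->xname) - 1);
}

/* Guarded accessor: never hands a pointer derived from NULL to "%s". */
static const char *
ifname_guarded(const struct fake_ifnet *ifp)
{
	return (ifp != NULL ? ifp->xname : "(none)");
}

/* Format an illustrative log line; rcvif may legitimately be NULL. */
int
format_forward_log(char *buf, size_t len, const struct fake_ifnet *rcvif)
{
	return (snprintf(buf, len, "cannot forward, received on %s",
	    ifname_guarded(rcvif)));
}
```

Calling format_forward_log() with a NULL rcvif yields a line ending in "(none)" instead of dereferencing address 0x60. Whether the receive interface was actually NULL on the affected systems is not confirmed anywhere in this ticket.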
Updated by Luiz Souza over 6 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 100
I committed our old fix for now. Once the kp@ fix on the PR is tested, I'll apply his fix.
Please check with the next snapshot.
Updated by Steve Wheeler about 6 years ago
I've never been able to replicate that locally. It's going to be very difficult to test.
Updated by Constantine Kormashev about 6 years ago
Looks like this is a PPPoE-related issue. I do not see a problem with fragmented IPv6 and logging on Ethernet IPv6 forwarding.
But we need more tests with PPPoE, because the issue is hard to reproduce.
Updated by Jim Pingle about 6 years ago
- Target version changed from 2.4.4 to 2.4.4-GS
Still waiting on someone that can reproduce it to confirm if it still happens. May be fixed, but we won't know for certain until we get confirmation from a user that hit the original bug.
Updated by Anonymous about 6 years ago
- Target version changed from 2.4.4-GS to 2.4.4-p1
Updated by Anonymous about 6 years ago
- Target version changed from 2.4.4-p1 to 2.4.4-GS
Updated by Luiz Souza about 6 years ago
- Target version changed from 2.4.4-GS to 2.4.4-p1
Updated by Renato Botelho almost 6 years ago
- Status changed from Feedback to Resolved
It should be resolved now, but it's hard to reproduce. We can revisit if the bug shows up again.