Bug #5428
Frequent IPv6 panic on 2.3 - May be log-related
Status: Closed
Description
I've hit a panic on 2.3 twice now; ping me for the full trace.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ba8a8b
stack pointer           = 0x28:0xfffffe003f3d3f40
frame pointer           = 0x28:0xfffffe003f3d3f50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq256: igb0:que 0)
[ thread pid 12 tid 100028 ]
db:0:kdb.enter.default> show pcpu
cpuid        = 0
dynamic pcpu = 0x502400
curthread    = 0xfffff800034c9940: pid 12 "irq256: igb0:que 0"
curpcb       = 0xfffffe003f3d4cc0
fpcurthread  = none
idlethread   = 0xfffff80003290000: tid 100003 "idle: cpu0"
curpmap      = 0xffffffff822b4928
tssp         = 0xffffffff822cf790
commontssp   = 0xffffffff822cf790
rsp0         = 0xfffffe003f3d4cc0
gs32p        = 0xffffffff822d11e8
ldt          = 0xffffffff822d1228
tss          = 0xffffffff822d1218
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100028 td 0xfffff800034c9940
strlen() at strlen+0xb/frame 0xfffffe003f3d3f50
kvprintf() at kvprintf+0xf9c/frame 0xfffffe003f3d4060
_vprintf() at _vprintf+0x8d/frame 0xfffffe003f3d4140
log() at log+0x5c/frame 0xfffffe003f3d41a0
ip6_forward() at ip6_forward+0x119/frame 0xfffffe003f3d42f0
pf_refragment6() at pf_refragment6+0x16e/frame 0xfffffe003f3d43b0
pf_test6() at pf_test6+0x1448/frame 0xfffffe003f3d46e0
pf_check6_out() at pf_check6_out+0x1d/frame 0xfffffe003f3d4700
pfil_run_hooks() at pfil_run_hooks+0x8d/frame 0xfffffe003f3d4790
bridge_pfil() at bridge_pfil+0x25b/frame 0xfffffe003f3d4810
bridge_broadcast() at bridge_broadcast+0x22c/frame 0xfffffe003f3d4880
bridge_forward() at bridge_forward+0x245/frame 0xfffffe003f3d48e0
bridge_input() at bridge_input+0x2a0/frame 0xfffffe003f3d4950
ether_nh_input() at ether_nh_input+0x2a5/frame 0xfffffe003f3d49b0
netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe003f3d4a20
igb_rxeof() at igb_rxeof+0x63b/frame 0xfffffe003f3d4ad0
igb_msix_que() at igb_msix_que+0x16d/frame 0xfffffe003f3d4b20
intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe003f3d4b60
ithread_loop() at ithread_loop+0x96/frame 0xfffffe003f3d4bb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe003f3d4bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003f3d4bf0
--- trap 0, rip = 0, rsp = 0xfffffe003f3d4cb0, rbp = 0 ---
I may have a means to reproduce it, but as it is my prod edge firewall I'm not too keen on testing that theory frequently.
Luiz said: "I'll take a look; it seems that something is wrong with log() (possibly a NULL string is passed to strlen())."
Updated by Jim Thompson almost 9 years ago
There are only two calls to log() in ip6_forward():
if (V_ip6_log_time + V_ip6_log_interval < time_uptime) {
    V_ip6_log_time = time_uptime;
    log(LOG_DEBUG,
        "cannot forward "
        "src %s, dst %s, nxt %d, rcvif %s, outif %s\n",
        ip6_sprintf(ip6bufs, &ip6->ip6_src),
        ip6_sprintf(ip6bufd, &ip6->ip6_dst),
        ip6->ip6_nxt,
        if_name(m->m_pkthdr.rcvif), if_name(rt->rt_ifp));
}
This occurs when a packet can't be forwarded because its destination is beyond the scope of the source address.
and
if (V_ip6_log_time + V_ip6_log_interval < time_uptime) {
    V_ip6_log_time = time_uptime;
    log(LOG_DEBUG,
        "cannot forward "
        "from %s to %s nxt %d received on %s\n",
        ip6_sprintf(ip6bufs, &ip6->ip6_src),
        ip6_sprintf(ip6bufd, &ip6->ip6_dst),
        ip6->ip6_nxt,
        if_name(m->m_pkthdr.rcvif));
}
This occurs if the packet is broadcast or multicast, or if the source address is "unspecified" (::, the IPv6 counterpart of INADDR_ANY).
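The second drop condition and the rate limit that guards both log() calls can be sketched in user space. This is a minimal illustration under stated assumptions, not the kernel code: the link-level broadcast/multicast mbuf flag is modelled as a plain bool, and log_time, log_interval, cannot_forward() and maybe_log() are hypothetical stand-ins for V_ip6_log_time, V_ip6_log_interval and the inline logic in ip6_forward().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical stand-ins for the per-VNET kernel globals
 * V_ip6_log_time and V_ip6_log_interval (values are illustrative). */
static time_t log_time = 0;
static time_t log_interval = 5;

/* Drop condition behind the second log() call: the packet was flagged as
 * broadcast/multicast at the link layer (modelled as a bool), the
 * destination is multicast, or the source is unspecified (::). */
static bool cannot_forward(bool link_bcast_or_mcast,
                           const struct in6_addr *src,
                           const struct in6_addr *dst)
{
    assert(src != NULL && dst != NULL);
    return link_bcast_or_mcast ||
           IN6_IS_ADDR_MULTICAST(dst) ||
           IN6_IS_ADDR_UNSPECIFIED(src);
}

/* Same rate-limit shape as the kernel code: emit at most one message per
 * log_interval seconds of uptime. Returns true if a message was printed. */
static bool maybe_log(time_t now, const char *msg)
{
    if (log_time + log_interval < now) {
        log_time = now;
        printf("cannot forward: %s\n", msg);
        return true;
    }
    return false;
}
```

Because of the guard in maybe_log(), the logging path runs at most once per interval; packets that qualify in between are dropped without reaching log().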
Neither should cause a crash, but... are you sure you don't have something in pf that is rewriting packets to crap?
The presence of 'bridge' in the backtrace makes me itch.
A better stack trace would help,
as would your config, or at least your pf.conf.
Updated by Luiz Souza almost 9 years ago
I think m->m_pkthdr.rcvif is invalid here. The interface name pointer can't simply be NULL, because kvprintf() in sys/kern/subr_prf.c handles that case correctly; it is probably pointing to a random address:
case 's':
    p = va_arg(ap, char *);
    if (p == NULL)
        p = "(null)";
    if (!dot)
        n = strlen (p);
    else
        for (n = 0; n < dwidth && p[n]; n++)
            continue;
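The point of this excerpt is that only a NULL pointer is special-cased; any other pointer goes straight to strlen() or the bounded loop. A user-space copy of that logic makes this visible. It is a sketch for illustration only: percent_s_len() is a hypothetical name, not a kernel function, and dot/dwidth mirror the precision handling in the excerpt. A garbage non-NULL pointer passes the check and is dereferenced, which is consistent with strlen() being the faulting frame in the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space copy of the %s length logic quoted above.
 * 'dot' is nonzero when a precision was given; 'dwidth' is that precision.
 * Returns the number of characters that would be printed and stores the
 * string actually used (NULL is replaced by "(null)") in *out. */
static int percent_s_len(const char *p, int dot, int dwidth, const char **out)
{
    int n;

    assert(out != NULL);
    if (p == NULL)
        p = "(null)";              /* the only bad pointer that is caught */
    if (!dot)
        n = (int)strlen(p);        /* any other bad pointer is read here */
    else
        for (n = 0; n < dwidth && p[n]; n++)
            continue;              /* stop at precision or NUL */
    *out = p;
    return n;
}
```

For example, percent_s_len(NULL, 0, 0, &s) returns 6 and sets s to "(null)", while a wild pointer such as one computed from a bogus ifnet would be read without any check.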
https://git.pfmechanics.com/luiz/freebsd-src/commit/85293302a79ecc6eeb30adeb236a8678f9287e16
This is my first attempt at fixing the problem, but I'm not sure (yet) whether it is correct or complete. I'll arrange for JimP to get a snapshot with this change.
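Luiz's actual change is in the linked commit and is not reproduced here. As a hedged illustration of one defensive pattern, the sketch below checks for a missing (NULL) receive interface before formatting the "cannot forward" message, so the name prints as "(null)" instead of being derived from a NULL ifnet. struct fake_ifnet, set_if_name(), safe_if_name() and format_cannot_forward() are all hypothetical; the real struct ifnet is far larger. Note that this only covers the NULL case, not a rcvif pointing at a random address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct ifnet. */
struct fake_ifnet {
    char if_xname[16];
};

/* Fill in the mock interface name, always NUL-terminated. */
static void set_if_name(struct fake_ifnet *ifp, const char *name)
{
    strncpy(ifp->if_xname, name, sizeof(ifp->if_xname) - 1);
    ifp->if_xname[sizeof(ifp->if_xname) - 1] = '\0';
}

/* Return the interface name, or NULL when there is no receive interface,
 * instead of computing a member address from a NULL pointer. */
static const char *safe_if_name(const struct fake_ifnet *ifp)
{
    return ifp != NULL ? ifp->if_xname : NULL;
}

/* Format the second "cannot forward" message (without the trailing
 * newline). "(null)" is substituted explicitly because user-space
 * printf() is not guaranteed to accept a NULL %s argument. */
static int format_cannot_forward(char *buf, size_t len, const char *src,
                                 const char *dst, int nxt,
                                 const struct fake_ifnet *rcvif)
{
    const char *name = safe_if_name(rcvif);

    assert(buf != NULL && src != NULL && dst != NULL);
    return snprintf(buf, len,
                    "cannot forward from %s to %s nxt %d received on %s",
                    src, dst, nxt, name != NULL ? name : "(null)");
}
```

With rcvif set to NULL this yields "... received on (null)", the same form of message Renato later reported with Luiz's patch applied.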
Updated by Jim Pingle almost 9 years ago
I'm not sure why yet, but it appears to be tied to Bonjour traffic on the local network. Disabling or enabling the Bonjour plugin in Pidgin crashes the firewall instantly, both for me and for Renato, though we have not yet replicated that in a test setup. More info to come as we dig deeper.
Updated by Jim Pingle almost 9 years ago
I checked a packet capture containing traffic that crashes the firewall into the ESFprojects repo, under redmine-5428/.
Updated by Jim Pingle almost 9 years ago
A few more details, getting closer to the heart of it:
- The crash appears to require the presence of a bridge on the receiving interface (e.g. LAN and LAN2 bridged, packet comes in LAN)
- The WAN type does not matter (dhcp6, HE.net, etc)
- Replaying the capture file to a system configured as above will crash it, so there is no need to fiddle with clients to reproduce the problem (e.g. from a Linux client, run "sudo tcpreplay --intf1=eth0 ipv6-crash-redmine-#5428.pcapng")
- Judging by the contents of the packet capture, the trigger is a fragmented IPv6 multicast packet
- The traffic must be passed by the IPv6 firewall rules (make sure the rule allows all from any to any); if the traffic is blocked, it does not crash.
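To spot the trigger in a capture, a check for "fragmented IPv6 multicast" on a raw IPv6 header can be sketched as follows. It assumes the buffer starts at the IPv6 header (link-layer framing already stripped) and, as a simplification, looks only at the first Next Header value rather than walking a chain of extension headers. is_fragmented_v6_multicast(), v6_multicast_scope() and NEXTHDR_FRAGMENT are hypothetical names. Next Header 44 is the IPv6 Fragment header, and ff02::fb is the link-local mDNS group that Bonjour uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN     40   /* fixed IPv6 header size in bytes */
#define NEXTHDR_FRAGMENT 44   /* IPv6 Fragment extension header */

/* True if buf holds an IPv6 header whose destination is multicast (first
 * address byte 0xff) and whose Next Header is the Fragment header.
 * Simplification: extension headers before the Fragment header are not
 * walked, so such packets are not detected. */
static bool is_fragmented_v6_multicast(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < IPV6_HDR_LEN || (buf[0] >> 4) != 6)
        return false;                   /* too short, or not version 6 */
    const uint8_t next_header = buf[6];
    const uint8_t *dst = buf + 24;      /* 16-byte destination address */
    return next_header == NEXTHDR_FRAGMENT && dst[0] == 0xff;
}

/* Multicast scope is the low nibble of the second destination byte
 * (2 = link-local, as in ff02::fb). Caller must pass a full header. */
static int v6_multicast_scope(const uint8_t *buf)
{
    return buf[25] & 0x0f;
}
```

Applied to the packet in the capture (source fe80::..., destination ff02::fb, next header 44), the first helper would return true and the scope would be 2.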
Renato applied and tested a patch from Luiz and got the following log message (and no crash):
Nov 18 13:02:05 pfgarga kernel: cannot forward from fe80::3e15:c2ff:fec8:1414 to ff02::fb nxt 44 received on (null)
Updated by Renato Botelho almost 9 years ago
- Status changed from New to Feedback
It should be fixed in the next snapshots after https://github.com/pfsense/FreeBSD-src/commit/4204c9f01d2ab439f6e0b9454ab22d4ffcca8cc4
Updated by Jim Pingle almost 9 years ago
- Status changed from Feedback to Resolved
This is working OK now. I updated my edge firewall that originally showed the problem and replayed the traffic against it: no crash. The system log showed the expected message with the interface filled in. Looks like we can close it.