crash in l2tp retransmit
I'm using a Multilink L2TP WAN. After a fresh reinstall of 2.4.4 with a completely new config (no import), it crashes regularly. The crash dump always has the same stack trace, showing a page fault while copying a packet for retransmission after a timeout.
Fatal trap 12: page fault while in kernel mode
...
current process = 13 (ng_queue1)
...
Tracing pid 13 tid 100022 td 0xfffff80003551620
m_copypacket() at m_copypacket+0x16/frame 0xfffffe00797d3ab0
ng_l2tp_seq_rack_timeout() at ng_l2tp_seq_rack_timeout+0x164/frame 0xfffffe00797d3af0
ng_apply_item() at ng_apply_item+0x8f/frame 0xfffffe00797d3b70
ngthread() at ngthread+0x10a/frame 0xfffffe00797d3bb0
fork_exit() at fork_exit+0x83/frame 0xfffffe00797d3bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00797d3bf0
#2 Updated by Jim Pingle over 2 years ago
I saw a crash with a backtrace like that once on a test VM with an L2TP WAN but only one time, not repeatedly, so I chalked it up to a random power or other VM-specific event.
Approximately how often is "regularly"? Have you noticed any pattern to when it happens? Is it after a certain amount of traffic, certain amount of time, or randomly?
#6 Updated by Bianco Veigel over 2 years ago
- Time (20:00 & 03:10)
- Uptime (56h & 31h)
- Packout count (10/9M & 7/5M)
- Byte count (4.1/2.5GB & 3.1/1.1GB)
The byte/s and packet/s also have much higher peaks between the crashes.
#8 Updated by Bianco Veigel over 2 years ago
Yes, it's always the same (except for the hex addresses):
Tracing pid 13 tid 100022 td 0xfffff800039b4000
m_copypacket() at m_copypacket+0x16/frame 0xfffffe01195f7ab0
ng_l2tp_seq_rack_timeout() at ng_l2tp_seq_rack_timeout+0x164/frame 0xfffffe01195f7af0
ng_apply_item() at ng_apply_item+0x8f/frame 0xfffffe01195f7b70
ngthread() at ngthread+0x10a/frame 0xfffffe01195f7bb0
fork_exit() at fork_exit+0x83/frame 0xfffffe01195f7bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01195f7bf0
#9 Updated by Bianco Veigel over 2 years ago
This seems to be an upstream bug in FreeBSD mpd5 - today I got the same crash on my L2TP Server (FreeBSD 11.2-RELEASE-p2 / mpd5 5.8):
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x1c
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80b79f06
stack pointer = 0x28:0xfffffe0094ab8a60
frame pointer = 0x28:0xfffffe0094ab8aa0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 619 (ng_queue2)
trap number = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80b3d567 at kdb_backtrace+0x67
#1 0xffffffff80af6b07 at vpanic+0x177
#2 0xffffffff80af6983 at panic+0x43
#3 0xffffffff80f77fcf at trap_fatal+0x35f
#4 0xffffffff80f78029 at trap_pfault+0x49
#5 0xffffffff80f777f7 at trap+0x2c7
#6 0xffffffff80f57dac at calltrap+0x8
#7 0xffffffff82835bf4 at ng_l2tp_seq_rack_timeout+0x164
#8 0xffffffff8282868d at ng_apply_item+0xcd
#9 0xffffffff8282b084 at ngthread+0x1a4
#10 0xffffffff80aba083 at fork_exit+0x83
#11 0xffffffff80f58cce at fork_trampoline+0xe
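One plausible reading of the fault virtual address 0x1c in the trap above is a field read through a NULL pointer: the faulting address is then simply the field's offset within the structure. A minimal sketch in C, assuming a simplified header layout on a 64-bit (LP64) system. The struct pkt_hdr below, its field names, and the helper flags_access_address() are hypothetical stand-ins for illustration, not the real FreeBSD struct mbuf or any kernel API:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: simplified stand-in for the start of a packet buffer header.
 * This is NOT the real FreeBSD struct mbuf; field names and order are
 * illustrative only. Offsets in the comments hold for 8-byte pointers.
 */
struct pkt_hdr {
    struct pkt_hdr *next;     /* offset 0x00 */
    struct pkt_hdr *nextpkt;  /* offset 0x08 */
    char           *data;     /* offset 0x10 */
    int32_t         len;      /* offset 0x18 */
    uint32_t        flags;    /* offset 0x1c */
};

/* Address the CPU would touch when reading p->flags. */
static uintptr_t
flags_access_address(const struct pkt_hdr *p)
{
    return (uintptr_t)p + offsetof(struct pkt_hdr, flags);
}
```

With 8-byte pointers, flags_access_address(NULL) evaluates to 0x1c, which is how a NULL packet pointer can show up as a small non-zero fault address rather than 0x0. Whether that is what happened here would need to be confirmed against the actual kernel sources and the crash dump.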
#10 Updated by Jim Pingle almost 2 years ago
- Target version set to 2.5.0
- Affected Version changed from 2.4.4 to All
This is still happening on FreeBSD 12/pfSense 2.5.0. Same backtrace:
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe007db41c80
curthread = 0xfffff80004166000: pid 13 tid 100023 "ng_queue1"
curpcb = 0xfffffe000c798cc0
fpcurthread = none
idlethread = 0xfffff8000411d580: tid 100004 "idle: cpu1"
curpmap = 0xffffffff82c884c8
tssp = 0xffffffff82db3988
commontssp = 0xffffffff82db3988
rsp0 = 0xfffffe000c798cc0
gs32p = 0xffffffff82dba5c0
ldt = 0xffffffff82dba600
tss = 0xffffffff82dba5f0
curvnet = 0xfffff8000406d540
db:0:kdb.enter.default> bt
Tracing pid 13 tid 100023 td 0xfffff80004166000
m_copypacket() at m_copypacket+0x16/frame 0xfffffe000c798aa0
ng_l2tp_seq_rack_timeout() at ng_l2tp_seq_rack_timeout+0x1aa/frame 0xfffffe000c798ae0
ng_apply_item() at ng_apply_item+0x8f/frame 0xfffffe000c798b60
ngthread() at ngthread+0x12a/frame 0xfffffe000c798bb0
fork_exit() at fork_exit+0x83/frame 0xfffffe000c798bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000c798bf0
I have a few saved crash reports if needed. Seems to happen once every couple weeks for me, so it's difficult to know if it's solved.
#11 Updated by Bianco Veigel over 1 year ago
I've opened a bug at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241133
#14 Updated by Bianco Veigel 8 months ago
As far as I can tell, this has been accepted upstream (https://svnweb.freebsd.org/changeset/base/366167). Can someone point me to a howto or tutorial that explains what I need to do to apply this fix to pfSense? I've hit this bug at least twice in the last three days.
I'm currently running pfSense 2.4.5-RELEASE-p1 (amd64).
#19 Updated by Renato Botelho 8 months ago
Bianco Veigel wrote:
I've updated to 2.5.0.a.20200928.1250 and got the same crash as before. I've attached the crashdump.
Is there anything else I can do?
I've added INVARIANTS to the kernel in order to collect more useful data from the crash. Please try it again on the 20200929-1250 image.
#26 Updated by Mark Johnston 8 months ago
Bianco Veigel wrote:
It crashed again with 2.5.0.a.20200930.0050
Thanks for your patience so far, it's much appreciated. I have another debugging patch that fixes a few other possible races and prints some extra info. I'll try to get it into a snapshot soon.
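For readers following along, the kind of race discussed in this thread can be sketched in user-space C: a retransmit timer fires after an acknowledgement has already emptied the send window, so the handler must check for an empty slot before copying the oldest packet. This is only an illustration under stated assumptions. The types and functions below (struct pkt, struct seq_state, on_rack_timeout(), ack_all(), WIN_MAX) are hypothetical, and the sketch is neither the change in the upstream changeset nor the debugging patch mentioned above:

```c
#include <stddef.h>

#define WIN_MAX 8                  /* hypothetical window size */

struct pkt {                       /* hypothetical packet */
    int id;
};

struct seq_state {                 /* hypothetical retransmit state */
    struct pkt *xwin[WIN_MAX];     /* unacknowledged packets, oldest first */
    int         retransmits;       /* number of retransmissions performed */
};

/* Acknowledge everything: the send window becomes empty. */
static void
ack_all(struct seq_state *seq)
{
    for (size_t i = 0; i < WIN_MAX; i++)
        seq->xwin[i] = NULL;
}

/*
 * Retransmit timeout handler. Returns 1 if the oldest packet was copied
 * into *out for retransmission, 0 if there was nothing left to resend.
 */
static int
on_rack_timeout(struct seq_state *seq, struct pkt *out)
{
    struct pkt *oldest = seq->xwin[0];

    /* The window may have been drained before the timer ran. */
    if (oldest == NULL)
        return 0;

    *out = *oldest;                /* stand-in for copying the packet */
    seq->retransmits++;
    return 1;
}
```

In a real kernel the check alone is not sufficient, because the window can change between the check and the copy; the timer, the ack path, and teardown would also need to be serialized (for example with a lock), which the sketch leaves out.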