Bug #13210
PPPoE server panics with multiple client connections
Status: Closed
Description
When using the PPPoE server, it is possible to trigger a kernel panic if enough clients attempt to connect. The panic appears to require a number of simultaneous connections.
The logs/message buffer show numerous interface renaming events leading up to the panic:
<6>ng23: changing name to 'poes1-23'
<6>ng24: changing name to 'poes1-24'
<6>ng25: changing name to 'poes1-25'
<6>ng26: changing name to 'poes1-26'
<6>ng27: changing name to 'poes1-27'
<6>ng28: changing name to 'poes1-28'
<6>ng14: changing name to 'poes1-14'
<6>ng7: changing name to 'poes1-7'
<6>ng10: changing name to 'poes1-10'
<6>ng11: changing name to 'poes1-11'
<6>ng13: changing name to 'poes1-13'
<6>ng4: changing name to 'poes1-4'
<6>ng15: changing name to 'poes1-15'
<6>ng18: changing name to 'poes1-18'
<6>ng19: changing name to 'poes1-19'
<6>ng20: changing name to 'poes1-20'
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 0c
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8109e252
stack pointer = 0x28:0xfffffe00253e0800
frame pointer = 0x28:0xfffffe00253e0840
pf_test6: kif == NULL, if_xname poes1-20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
trap number = 12
panic: page fault
cpuid = 0
time = 1652916946
KDB: enter: panic
The backtrace produced is:
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100022 td 0xfffff80005238740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00253e04c0
vpanic() at vpanic+0x197/frame 0xfffffe00253e0510
panic() at panic+0x43/frame 0xfffffe00253e0570
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00253e05d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00253e0620
trap() at trap+0x286/frame 0xfffffe00253e0730
calltrap() at calltrap+0x8/frame 0xfffffe00253e0730
--- trap 0xc, rip = 0xffffffff8109e252, rsp = 0xfffffe00253e0800, rbp = 0xfffffe00253e0840 ---
pfi_kkif_match() at pfi_kkif_match+0x62/frame 0xfffffe00253e0840
pf_match_translation() at pf_match_translation+0x120/frame 0xfffffe00253e08d0
pf_get_translation() at pf_get_translation+0xb8/frame 0xfffffe00253e0990
pf_test_rule() at pf_test_rule+0x27b/frame 0xfffffe00253e0e60
pf_test() at pf_test+0x15ce/frame 0xfffffe00253e10e0
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00253e1100
pfil_run_hooks() at pfil_run_hooks+0x87/frame 0xfffffe00253e1190
ip_input() at ip_input+0x475/frame 0xfffffe00253e1240
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00253e1290
ng_iface_rcvdata() at ng_iface_rcvdata+0x131/frame 0xfffffe00253e12d0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00253e1370
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00253e13b0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00253e1450
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00253e1490
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00253e1530
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00253e1570
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00253e1610
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00253e1650
ng_pppoe_rcvdata_ether() at ng_pppoe_rcvdata_ether+0x193/frame 0xfffffe00253e16e0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00253e1780
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00253e17c0
ether_demux() at ether_demux+0x230/frame 0xfffffe00253e17f0
ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe00253e1850
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00253e18a0
ether_input() at ether_input+0x89/frame 0xfffffe00253e1900
iflib_rxeof() at iflib_rxeof+0xad6/frame 0xfffffe00253e19e0
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00253e1a20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe00253e1a80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe00253e1ab0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00253e1af0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00253e1af0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
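The fault virtual address of 0x18 in the panic message is consistent with reading a structure member through a NULL pointer, which fits the `pf_test6: kif == NULL` line logged just before the trap. The following is a minimal sketch of that arithmetic only. The `example_kif` layout and the `member_address` helper are hypothetical and are not the real `struct pfi_kkif`; they merely show how a member at offset 0x18 produces that fault address when the base pointer is NULL.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only; this is NOT the real
 * struct pfi_kkif. Three 8-byte members precede "flags", so "flags"
 * sits at byte offset 24 (0x18).
 */
struct example_kif {
    uint64_t a;
    uint64_t b;
    uint64_t c;
    uint64_t flags;
};

/*
 * Address the CPU would access when reading base->flags.
 * With base == 0 (a NULL pointer) the result is just the member offset.
 */
uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct example_kif, flags);
}
```

Calling `member_address(0)` with this layout yields 0x18, the same small address reported as the fault virtual address. A near-zero fault address of this kind usually points at a NULL structure pointer rather than at random memory corruption.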
Tested in 22.01
Updated by Mateusz Guzik over 2 years ago
Pushed fixes:
Author: Mateusz Guzik <mjg@netgate.com>
Date: Tue May 31 22:43:37 2022 +0000

pf: fix a race against kif destruction in pf_test{,6}

ifp kif was dereferenced prior to taking the lock and
could have been nullified later.

Upstream 6c92016aa685486f1445f632ac3f1af1385186af
Ticket 13210
Updated by Marcos M over 2 years ago
- Plus Target Version changed from 22.09 to 22.05
Updated by Marcos M over 2 years ago
- Status changed from Feedback to Resolved
A customer who was previously hitting this issue frequently reports that it has been resolved after updating to the RC.
Updated by Jens Groh over 2 years ago
Sorry, I wanted to add this here for documentation purposes but forgot to do it yesterday:
Customer feedback (translated from German): Hello Jens, I installed a new update (the latest release candidate). Everything seems to be running, and there are no noticeable problems anymore. Before the update (that is, on the early RC snapshot he was first on) I still got a few emails from the pfSense box about packet loss. Those are gone with the update (the latest RC snapshot) as well. No freezes or kernel panics so far!
So the original problem with the PPPoE server seems to be fixed now!
Thanks a lot to all involved for the speedy resolution!
Cheers
\jens