Bug #13938
closed
Kernel panic accessing the GUI over IPsec in certain environments when using nginx ``sendfile`` with unmapped mbufs
100%
Description
Under certain conditions which have not yet been identified, it is possible to encounter a kernel panic on FreeBSD main/14.0-CURRENT builds (e.g. Plus 23.01) when attempting to access the GUI over an IPsec tunnel. Thus far we have received only a small number of reports (2), and we have not been able to reproduce the panic in lab conditions.
A community member tracked it down to the use of sendfile in nginx when used in combination with unmapped mbufs (kern.ipc.mb_use_ext_pgs=1), both of which are enabled by default.
Users encountering this crash can take either one of two actions:
1. Disable unmapped mbufs by adding a tunable to set kern.ipc.mb_use_ext_pgs=0
OR
2. Disable sendfile in nginx as described in https://forum.netgate.com/post/1084590
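As a rough illustration of the two conditions above, the following Python sketch reports whether a system has both sendfile enabled in nginx and unmapped mbufs enabled. It is not part of the fix or of pfSense; the function names are hypothetical, and it assumes the caller supplies the nginx configuration text and the output of `sysctl -n kern.ipc.mb_use_ext_pgs`. Since the exact trigger has not been identified, having both settings enabled only means the system matches the reported configuration, not that it will panic.

```python
import re

# Matches a "sendfile on;" or "sendfile off;" directive at the start of a
# line or directly after another directive or block delimiter.
_SENDFILE_RE = re.compile(r"(?:^|[;{}])\s*sendfile\s+(on|off)\s*;", re.MULTILINE)


def sendfile_enabled(nginx_conf: str) -> bool:
    """Return True if any uncommented 'sendfile on;' directive is present.

    Simplification for this sketch: '#' always starts a comment, and
    include files are not followed.
    """
    without_comments = re.sub(r"#.*", "", nginx_conf)
    return any(value == "on" for value in _SENDFILE_RE.findall(without_comments))


def unmapped_mbufs_enabled(sysctl_value: str) -> bool:
    """Interpret the text printed by `sysctl -n kern.ipc.mb_use_ext_pgs`."""
    return sysctl_value.strip() == "1"


def matches_reported_configuration(nginx_conf: str, sysctl_value: str) -> bool:
    """True only when both settings named in this report are enabled."""
    return sendfile_enabled(nginx_conf) and unmapped_mbufs_enabled(sysctl_value)


if __name__ == "__main__":
    example_conf = "http {\n    sendfile on;\n}\n"
    print(matches_reported_configuration(example_conf, "1\n"))
```

Applying either workaround makes the check return False: setting kern.ipc.mb_use_ext_pgs=0 changes the second input, and replacing the directive with `sendfile off;` changes the first.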
Full details on the forum thread, including backtraces and textdump archives:
https://forum.netgate.com/topic/176974/web-gui-crashes-after-upgrade-from-22-05-to-23-01
Updated by Steve Wheeler over 1 year ago
To make searching easier the backtrace this generates is:
Tracing pid 3765 tid 100406 td 0xfffffe00c65a4900
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c3d6f320
vpanic() at vpanic+0x182/frame 0xfffffe00c3d6f370
panic() at panic+0x43/frame 0xfffffe00c3d6f3d0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00c3d6f430
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c3d6f490
calltrap() at calltrap+0x8/frame 0xfffffe00c3d6f490
--- trap 0xc, rip = 0xffffffff813187ba, rsp = 0xfffffe00c3d6f560, rbp = 0xfffffe00c3d6f560 ---
memcpy_erms() at memcpy_erms+0x10a/frame 0xfffffe00c3d6f560
m_unshare() at m_unshare+0x3de/frame 0xfffffe00c3d6f5e0
esp_output() at esp_output+0x186/frame 0xfffffe00c3d6f6d0
ipsec4_perform_request() at ipsec4_perform_request+0x1d2/frame 0xfffffe00c3d6f760
ipsec4_common_output() at ipsec4_common_output+0xa2/frame 0xfffffe00c3d6f7a0
ip_output() at ip_output+0x99d/frame 0xfffffe00c3d6f8a0
tcp_default_output() at tcp_default_output+0x1d2b/frame 0xfffffe00c3d6fa70
tcp_usr_ready() at tcp_usr_ready+0x1a1/frame 0xfffffe00c3d6fad0
sendfile_iodone() at sendfile_iodone+0x11c/frame 0xfffffe00c3d6fb10
vn_sendfile() at vn_sendfile+0x1663/frame 0xfffffe00c3d6fd70
sys_sendfile() at sys_sendfile+0xf7/frame 0xfffffe00c3d6fe00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00c3d6ff30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c3d6ff30
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x8254b84ba, rsp = 0x8209bed68, rbp = 0x8209bf640 ---
Updated by Steve Wheeler over 1 year ago
The workarounds used here also seem to apply at least partially to connections over OpenVPN tunnels.
See: https://forum.netgate.com/post/1090118
In that case there is no kernel panic, but it seems to crash nginx, requiring a reboot.
Updated by Danilo Zrenjanin over 1 year ago
I can confirm that applying the patch from the forum fixed the issues with connections over IPsec.
https://forum.netgate.com/topic/176974/web-gui-crashes-after-upgrade-from-22-05-to-23-01/62
Updated by Christian McDonald over 1 year ago
- Status changed from New to Feedback
- % Done changed from 0 to 100
Applied in changeset 37c29e4de148a14480c01c8fa179e9b630bb0fb4.
Updated by Christian McDonald over 1 year ago
- Assignee changed from Mateusz Guzik to Christian McDonald
We will now disable sendfile mode. Sendfile has little to no benefit for us on pfSense.
This feature of nginx has been problematic upstream for a while, with it being broken and fixed several times.
Sendfile is really only useful when serving static files from a UFS filesystem.
Updated by Jim Pingle over 1 year ago
- Status changed from Feedback to Resolved
sendfile is off in all nginx configurations now, for the GUI and Captive Portal.
Updated by Mateusz Guzik over 1 year ago
Seeing as this is a bug in mbuf handling, I would argue the thing to do is to flip the unmapped mbuf support off. There may be other programs out there using sendfile, and there is no point in them crashing the system.
Updated by Christian McDonald over 1 year ago
That is a good point.
I've addressed this case too:
https://gitlab.netgate.com/pfSense/pfSense/-/commit/3706158fe69c3de0c122f87d25215799a735f842
Updated by Christian McDonald over 1 year ago
- Status changed from Resolved to Feedback
Updated by Christian McDonald over 1 year ago
- Status changed from Feedback to Resolved
kern.ipc.mb_use_ext_pgs has been disabled for 2 weeks now.
Marking as resolved.