Bug #15752: Monthly kernel panic - pfSense - pfSense bugtracker

Bug #15752

closed

Monthly kernel panic

Added by Sebastian Wagner 8 months ago. Updated 7 months ago.

Status: Duplicate
Priority: Normal
Assignee: Reid Linnemann
Category: Operating System
Release Notes: Default
Affected Architecture: 7100

Description

At a regular interval, roughly once a month, we experience a kernel panic. Because the appliance is connected via a USB console cable, we are luckily able to resolve it remotely.

The console prints the following repeatedly, nearly flooding the screen:

Tracing command kernel pid 0 tid 309435 td 0xfffff8006e140740
sched_switch() at sched_switch+0x88a/frame 0xfffffe00bc10ee20
mi_switch() at mi_switch+0xba/frame 0xfffffe00bc10ee40
_sleep() at _sleep+0x1be/frame 0xfffffe00bc10eec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xb1/frame 0xfffffe00bc10eef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00bc10ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00bc10ef30
--- trap 0x6e6177, rip = 0x6766637063, rsp = 0x7, rbp = 0x1600000001 ---

Sending Ctrl+C stops the output and triggers a reboot. Luckily, we found that workaround :)

Netgate 7100
24.03-RELEASE (amd64)

Attached are the console output captured with screen, along with the info and textdump files offered by the web interface. Let us know if any other logs are relevant before they are rotated.

Files

info.0 (547 Bytes), Sebastian Wagner, 09/29/2024 10:24 AM
2024-09-29-screen.txt (28.7 KB), output from console, Sebastian Wagner, 09/29/2024 10:24 AM
textdump.tar.0 (546 KB), Sebastian Wagner, 09/29/2024 10:24 AM

Updated by Kris Phillips 8 months ago

Status changed from New to Incomplete

Have you tested the RAM on your appliance to rule out a memory issue? Page faults are typically a RAM issue, and if it's happening this regularly, it could be intermittently failing hardware.

Updated by Sebastian Wagner 8 months ago

Thank you for the response. There doesn't seem to be a memtest included, so the best option is to use the bootable media with USB from https://memtest.org/, I guess?

Updated by Jordan G 7 months ago

Sebastian Wagner wrote in #note-2:

Thank you for the response. There doesn't seem to be a memtest included, so the best option is to use the bootable media with USB from https://memtest.org/, I guess?

Yes, that would work, as would any other bootable distribution that includes a diagnostic memory test.

Updated by Sebastian Wagner 7 months ago

We have now been able to run a first test:

      Memtest86+ v7.00      | Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
CLK/Temp: 2200MHz   58/62*C | Pass 30% ############
L1 Cache:   24KB  41.6 GB/s | Test 49% ###################
L2 Cache:    2MB  23.2 GB/s | Test #6  [Moving inversions, 64 bit pattern]
L3 Cache:   N/A             | Testing: 4GB - 5GB [1GB of 7.99GB]
Memory  : 7.99GB  4.28 GB/s | Pattern: 0xefffffffffffffff
--------------------------------------------------------------------------------
CPU: 4 Cores 4 Threads    SMP: 4T (PAR)   | Time:  0:43:20  Status: Pass     /
RAM: 1200MHz (DDR4-2400) CAS 17-17-17-39  | Pass:  1        Errors: 0
--------------------------------------------------------------------------------

The test completed one pass with no errors.

Meanwhile, the panic keeps recurring at varying intervals; the shortest gap so far was 8 days.

Updated by Reid Linnemann 7 months ago

Status changed from Incomplete to Duplicate
Assignee set to Reid Linnemann
Parent task set to #15684

This is a known issue in both CE and 24.03. I've reclassified this as a duplicate and linked the parent task, https://redmine.pfsense.org/issues/15684. You can work around it by disabling TCP selective acknowledgment (SACK) in the system tunables:

net.inet.tcp.sack.enable=0

This will also be fixed in 24.11, which is currently in BETA.
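The following is a minimal sketch of checking and applying the same tunable from a root shell on the firewall, such as the console shell or Diagnostics > Command Prompt. It assumes standard FreeBSD `sysctl(8)` behavior: a value set this way takes effect immediately but is lost on reboot. To keep it across reboots, add the tunable under System > Advanced > System Tunables.

```shell
# Show the current value (1 = SACK enabled, 0 = disabled)
sysctl net.inet.tcp.sack.enable

# Disable TCP selective acknowledgment at runtime (not persistent across reboots)
sysctl net.inet.tcp.sack.enable=0
```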
