Regression #11839: Panic on 21.05/2.6.0 snapshots when memory usage is high - pfSense - pfSense bugtracker

Regression #11839

closed

Panic on 21.05/2.6.0 snapshots when memory usage is high

Added by Jim Pingle about 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: Very High
Assignee: Kristof Provost
Category: Operating System
Target version: 2.5.2
Start date: 04/22/2021
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 21.05
Release Notes: Force Exclusion
Affected Version: 2.6.0
Affected Architecture:

Description

On several systems (hardware and VMs) running Plus 21.05 and CE 2.6.0 snapshots, I am seeing panics when the systems are experiencing high memory usage. Though high memory usage alone is not always sufficient to induce a panic, the less memory the system has, the easier the condition appears to be to trigger.

On the system where I can reproduce it most reliably, the panic is easily triggered by an apparent bug in ospf6d that causes it to consume all available RAM after an interface event (see #11838 for details). On these systems, all I need to do is save/apply an assigned VTI interface taking part in OSPF6, or stop/start IPsec (not restart); when IPsec reconnects, the system panics every time.

Another way to induce a panic on a system that is already in a susceptible state is to run tail /dev/zero from an SSH or console shell prompt. That does not induce a panic every time, however, even with multiple instances running in parallel, so I suspect there is some other compounding factor besides memory pressure that we haven't yet identified.

Textdumps from the most easily reproducible system are attached. Most, but not all, of the panic backtraces are in pf, though that may just be what the system happened to be busy doing at the time.

Files

textdump-7551-21.05-3.tar (73.5 KB), Jim Pingle, 04/22/2021 08:45 AM
textdump-7551-21.05-4.tar (111 KB), Jim Pingle, 04/22/2021 08:45 AM
textdump-7551-21.05-1.tar (73.5 KB), Jim Pingle, 04/22/2021 08:45 AM
textdump-7551-21.05-0.tar (90 KB), Jim Pingle, 04/22/2021 08:45 AM
textdump-7551-21.05-2.tar (95.5 KB), Jim Pingle, 04/22/2021 08:45 AM
textdump-7100-21.05-0.tar (138 KB), Jim Pingle, 04/22/2021 10:11 AM
textdump-ESX-2.6.0-1.tar (95 KB), Jim Pingle, 04/27/2021 09:27 AM
textdump-ESX-2.6.0-0.tar (77.5 KB), Jim Pingle, 04/27/2021 09:27 AM
textdump-ESX-2.6.0-2.tar (142 KB), Jim Pingle, 05/07/2021 09:38 AM
textdump-KVM-2.6.0-3.tar (101 KB), Jim Pingle, 05/07/2021 09:38 AM
config-pfSense.home.arpa-20210518194823.xml (21.1 KB), Jim Pingle, 05/18/2021 02:57 PM
textdump-KVM-21.05-4.tar (142 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-21.05-3.tar (128 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-21.05-1.tar (100 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-21.05-0.tar (72 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-21.05-2.tar (114 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-2.6.0-8.tar (154 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-ESX-2.6.0-3.tar (154 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-2.6.0-6.tar (90.5 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-KVM-2.6.0-7.tar (126 KB), Jim Pingle, 05/19/2021 10:03 AM
textdump-APU-2.6.0-0.tar (84 KB), Jim Pingle, 05/19/2021 10:03 AM

Updated by Jim Pingle about 4 years ago

Subject changed from Panic on 21.05/2.6.0 snapshots when VM memory usage is high to Panic on 21.05/2.6.0 snapshots when memory usage is high

Updated by Jim Pingle about 4 years ago

File textdump-7100-21.05-0.tar added

Attaching another crash with a potentially more interesting backtrace.

Updated by Jim Pingle about 4 years ago

File textdump-ESX-2.6.0-1.tar added
File textdump-ESX-2.6.0-0.tar added

This continues to be simple to hit and quite annoying. Installs that worked fine for years all of a sudden can't run much beyond the base OS and remain stable.

Updated by Jim Pingle about 4 years ago

File textdump-ESX-2.6.0-2.tar added
File textdump-KVM-2.6.0-3.tar added

A couple more. I have additional ones I haven't posted as well... Not sure how helpful they might be at this point since they all seem fairly random.

Updated by Jim Pingle about 4 years ago

Target version changed from 21.05 to 2.6.0
Plus Target Version set to 21.05

Updated by Jim Pingle about 4 years ago

File config-pfSense.home.arpa-20210518194823.xml added

When loaded on a VM with 512MB of RAM, the attached configuration reproduces the panic reliably, though with some variation in behavior. It leverages the OSPF6 bug to run the system out of RAM quickly. On some attempts ospf6d dies on its own (which is what should happen), but on other attempts it triggers a panic (no bueno).

Load the config on a fresh install and make sure FRR is installed and running (the config has it included). I would load the same config on a second unit as well so it will have at least one active OSPF6 neighbor. If you do that, make sure to adjust any system-specific parameters like the router ID in FRR OSPF6.

Once it's up and running:

* Navigate to Interfaces > WAN, click save, and then click apply changes.
* Wait about 20-30 seconds after applying.
* If it doesn't panic, check Status > Services and see if ospf6d is running. If not, restart it, then try again.

In most of my trials it panics on the second attempt. Occasionally I have to restart ospf6d after applying and then test again, so it takes 3-4 attempts at most.

The process used to create the config was:

* Create VM with 512MB RAM
* Install pfSense Plus 21.05 RC (latest snap) or CE 2.6.0
* pfSsh.php playback enableallowallwan
* Enable SSH
* Update to current build (if available)
* Interfaces > Assignments, GIF tab, create a new GIF on WAN, doesn't need to work, just exist (e.g. WAN, 198.51.100.101, 10.103.111.1, 10.103.111.2, 30), save
* Interfaces > Assignments, assign the GIF, should be OPT1
* Interfaces > OPT1, Enable, Save/Apply
* Install FRR
* Services > FRR > OSPF6, Interfaces tab. Add WAN interface w/Area 0.0.0.0, save
* Services > FRR > OSPF6, Interfaces tab. Add LAN, save
* Services > FRR > OSPF6, Interfaces tab. Add OPT1, save
* OSPF6 tab, enable, set router ID to something sane, set Area to 0.0.0.0, save
* FRR Global/Zebra tab, enable, set a master password (e.g. "abc123"), save

Updated by Peter Grehan about 4 years ago

There are 3 signatures in the panics: I'd be interested in seeing more.

The KVM one is possibly fixed in FreeBSD-current (with 4174e45fb4320dc2), but it's more a symptom of low memory resulting in a rare allocation failure in pmap code.

Two of the ESX ones are the same: it seems to be a race in VM code between two threads. That code path was removed from FreeBSD-current long ago, so this is perhaps another side effect of low memory. The 7100 crash has the same signature.

Thanks for the repro case: I'll give that a try.

Updated by Jim Pingle about 4 years ago

File textdump-KVM-21.05-4.tar added
File textdump-KVM-21.05-3.tar added
File textdump-KVM-21.05-2.tar added
File textdump-KVM-21.05-1.tar added
File textdump-KVM-21.05-0.tar added
File textdump-KVM-2.6.0-8.tar added
File textdump-ESX-2.6.0-3.tar added
File textdump-KVM-2.6.0-7.tar added
File textdump-KVM-2.6.0-6.tar added
File textdump-APU-2.6.0-0.tar added

Adding a few more I collected from miscellaneous installs during testing (some were deliberate crashes, others happened "naturally").

Updated by Peter Grehan about 4 years ago

Thanks. The majority of these are associated with the pf counter_u64 issue (anything with pf in the traceback).

However, some others may not be: the pmap backtraces are possibly associated with the fix in FreeBSD (4174e45fb4320dc2), and the uma_reclaim() ones are still unexplained.

#10

Updated by Kristof Provost about 4 years ago

I believe these crashes all share the same root cause: in certain places we misuse the rule/state counters, incrementing them directly rather than using the counter_u64 functions. Fixes have been pushed and are being tested.

#11

Updated by Jim Pingle about 4 years ago

Status changed from New to Closed
Assignee set to Kristof Provost
% Done changed from 0 to 100

I've been aggressively attempting to crash the latest builds of 21.05 and 2.6.0, which include the fixes for this problem, and thus far have had no success in triggering a panic. This is looking good to me. Before, I could trigger it at will a couple of different ways; now none of those methods lead to failures on any hardware or VM I try.

I'm willing to call this solved for the time being. If anything comes up I can reopen it.

#12

Updated by Jim Pingle about 4 years ago

Release Notes changed from Default to Force Exclusion

Excluding from release notes since it was a problem introduced by changes after the last release.

#13

Updated by Jim Pingle about 4 years ago

Target version changed from 2.6.0 to 2.5.2
