Bug #10254

closed

pf error "too many elements" when attempting to load large tables

Added by Jim Pingle almost 5 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Category: Operating System
Target version:
Start date: 02/11/2020
Due date:
% Done: 100%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.4.5
Affected Architecture: All

Description

On at least pfSense-base-2.4.5.r.20200210.0912 and later, pf fails to load large tables no matter what the limits are in pf:

: pfctl -f /tmp/rules.debug
/tmp/rules.debug:23: cannot define table bogonsv6: too many elements.
Consider increasing net.pf.request_maxcount.
pfctl: Syntax error in config file: pf rules not loaded

However, that OID is not present on 2.4.5:

: sysctl net.pf
net.pf.source_nodes_hashsize: 8192
net.pf.states_hashsize: 32768
: sysctl -a | grep request_maxcount
0
:

There is plenty of room in the table hard limit:

: wc -l /etc/bogonsv6 
  108611 /etc/bogonsv6
: pfctl -sm | grep table
table-entries hard limit  2000000

Similar to #9356 on 2.5.0, but in that case we set a higher default for that OID. That does not appear to be possible on 2.4.5.

Tried on amd64 and SG-3100, same result on both.
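
The report above involves two separate limits: the table-entries hard limit shown by pfctl -sm, and the per-request cap named in the error message (net.pf.request_maxcount). As a minimal sketch, a table file can be checked against a request cap before loading. The function name check_table_size is hypothetical, and the comparison assumes pf rejects a request only when the count is strictly greater than the cap; the exact boundary may differ.

```shell
#!/bin/sh
# Hypothetical helper: report whether a table file has more lines than a
# given per-request limit. Every line is counted (as `wc -l` does), so
# comment lines make the result slightly pessimistic.
check_table_size() {
    file=$1
    limit=$2
    lines=$(wc -l < "$file" | tr -d ' ')
    if [ "$lines" -gt "$limit" ]; then
        echo "$file: $lines lines exceeds limit $limit"
        return 1
    fi
    echo "$file: $lines lines fits within limit $limit"
    return 0
}

# Example, on a system where the OID exists:
#   check_table_size /etc/bogonsv6 "$(sysctl -n net.pf.request_maxcount)"
```

On a build where the OID is missing, as on the 2.4.5 snapshot above, the limit would have to be passed in by hand.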

Actions #1

Updated by Jim Pingle almost 5 years ago

  • Target version set to 2.4.5
Actions #2

Updated by Jim Pingle almost 5 years ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by Jim Pingle almost 5 years ago

The easiest way to reproduce the problem is to enable blocking of Bogons on any interface with IPv6 configured.

Actions #4

Updated by Jim Pingle almost 5 years ago

Looking in the FreeBSD source, it appears that the code which produces the error (r343520) is present on the branch used for 2.4.5 and in FreeBSD stable/11 but the code which handles the sysctl OID and backend is not ( https://reviews.freebsd.org/D15018, 205176451d5ad5f9fc9540f650e9d7efd1f728f5, rS332404 ).

It would probably be safer to revert the code producing the error than to pull in the larger change. Without that, the error appears to be cosmetic, and if we pull in the other change then we also have to worry about resolving #9356 for 2.4.5.

Actions #5

Updated by Jim Pingle almost 5 years ago

Current snapshots have that change reverted but are still not behaving properly. Even though there appears to be sufficient room in the table space, pf is yielding a memory allocation error:

: wc -l /etc/bogonsv6 
  108654 /etc/bogonsv6
: pfctl -sm
states        hard limit   202000
src-nodes     hard limit   202000
frags         hard limit     5000
table-entries hard limit   400000
: pfctl -f /tmp/rules.debug
/tmp/rules.debug:20: cannot define table bogonsv6: Cannot allocate memory
pfctl: Syntax error in config file: pf rules not loaded

Similar behavior on amd64 and ARM, but on amd64 it prints an error once and then works the next time, while ARM never works. Rebooting amd64 in this state yields one instance of this allocation error recorded but the table is loaded after boot. Rebooting ARM in this state yields two instances of this error at boot but the ruleset still fails to reload even manually.

Similar configurations work on 2.5.0 with both amd64 and ARM. Tables are loaded, no errors.

Actions #6

Updated by Jim Pingle almost 5 years ago

It looks to be failing at around 65k entries, which was the default limit for net.pf.request_maxcount:

: pfctl -T flush -t bogonsv6
: head -n 65535 /etc/bogonsv6-stock > /etc/bogonsv6
: pfctl -f /tmp/rules.debug
/tmp/rules.debug:20: cannot define table bogonsv6: Cannot allocate memory
pfctl: Syntax error in config file: pf rules not loaded
: pfctl -T flush -t bogonsv6
: head -n 65534 /etc/bogonsv6-stock > /etc/bogonsv6
: pfctl -f /tmp/rules.debug
: pfctl -T show -t bogonsv6 | wc -l
   65533
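
The manual narrowing above (65535 lines fail, 65534 load) could be automated with a binary search. This is only a sketch: find_max_lines and try_load are hypothetical names, and it assumes loading is monotonic, meaning that if N lines load then any smaller count loads too.

```shell
#!/bin/sh
# Binary search for the largest line count N in [lo, hi] for which
# `try_load N` succeeds. Prints 0 if no count in the range succeeds.
# The caller must define try_load before calling this function.
find_max_lines() {
    lo=$1
    hi=$2
    best=0
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if try_load "$mid"; then
            best=$mid
            lo=$(( mid + 1 ))
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$best"
}

# A possible try_load for this report, using the paths from the session
# above. It is left commented out because it rewrites /etc/bogonsv6 and
# reloads the ruleset:
# try_load() {
#     pfctl -T flush -t bogonsv6
#     head -n "$1" /etc/bogonsv6-stock > /etc/bogonsv6
#     pfctl -f /tmp/rules.debug 2>/dev/null
# }
```

With try_load defined as above, find_max_lines 1 "$(wc -l < /etc/bogonsv6-stock)" would print the largest line count that still loads.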
Actions #7

Updated by Jim Pingle almost 5 years ago

https://github.com/pfsense/FreeBSD-src/commit/8f7d14d3049de4fb6f82c7e97153c4372674a1e7 might need to be reverted, or we should just sync up with what 12.x has for net.pf.request_maxcount which is probably safer at this point.

Actions #8

Updated by Jim Pingle almost 5 years ago

Current snapshots have the code which allows us to set the request limit via net.pf.request_maxcount. However, it isn't being set until late in the upgrade process so the first full post-upgrade boot doesn't have a high enough value to allow bogonsv6 to load without errors.

amd64 first post-upgrade boot:

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="2000000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 500000

After reboot:

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="2000000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 2000000

SG-3100 first post-upgrade boot (loading bogonsv6 failed):

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="400000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 65535

SG-3100 after one more reboot (loading bogonsv6 worked):

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="400000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 400000

Looks like we might need to copy or move the code which sets that value to a place that runs earlier, like when the kernel itself gets upgraded or just after the upgrade starts before it reboots the first time.
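
The check done by hand above (value in loader.conf versus the running sysctl value) can be sketched as two small helpers. Both function names are hypothetical, and the parser assumes simple key="value" or key=value lines, with the last assignment of a key taken as the effective one.

```shell
#!/bin/sh
# Print the value assigned to a key in a loader.conf-style file, without
# the surrounding double quotes or trailing whitespace. If the key is
# assigned more than once, the last assignment is printed. Exits non-zero
# if the key is not present.
loader_conf_value() {
    awk -F= -v key="$1" '
        $1 == key { value = $2; found = 1 }
        END {
            if (!found) exit 1
            gsub(/"/, "", value)          # drop the quotes
            gsub(/[ \t]+$/, "", value)    # drop trailing whitespace
            print value
        }
    ' "$2"
}

# Compare a configured value with a running value and describe the result.
compare_tunable() {
    configured=$1
    running=$2
    if [ "$configured" = "$running" ]; then
        echo "match"
    else
        echo "mismatch: loader.conf=$configured running=$running"
    fi
}

# Example:
#   compare_tunable \
#       "$(loader_conf_value net.pf.request_maxcount /boot/loader.conf)" \
#       "$(sysctl -n net.pf.request_maxcount)"
```

Run on the first post-upgrade boot, a check like this would flag the cases above where loader.conf holds the intended value but the kernel is still running with a lower one.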

Actions #9

Updated by Renato Botelho almost 5 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100
Actions #10

Updated by Jim Pingle almost 5 years ago

  • Status changed from Feedback to In Progress
  • Assignee set to Renato Botelho

Something is still not quite right with this value post-upgrade. The first boot after any firmware upgrade (like one snapshot to the next) fails to use the correct value. Later reboots are fine.

: grep request_max /boot/loader.conf 
net.pf.request_maxcount="400000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 65535
: pfctl -f /tmp/rules.debug
/tmp/rules.debug:20: cannot define table bogonsv6: too many elements.
Consider increasing net.pf.request_maxcount.
pfctl: Syntax error in config file: pf rules not loaded

That was set in loader.conf before the upgrade, so somehow it is either being ignored or cleared/reset during the upgrade.

This is on ARM (SG-3100) but I see a similar issue on amd64 as well.

Actions #11

Updated by Renato Botelho over 4 years ago

  • Status changed from In Progress to Feedback

pfSense-upgrade 0.74 (on 2.5.0 and 2.4.5) and 0.63 (on 2.4.4) will fix it.

Actions #12

Updated by Jim Pingle over 4 years ago

  • Status changed from Feedback to In Progress

There is still a problem here that we're investigating.

Actions #13

Updated by Renato Botelho over 4 years ago

  • Status changed from In Progress to Feedback

- pfSense-upgrade was copying loader.conf to a temporary file before upgrading the kernel and rc packages, then copying it back afterward. This worked around a past bug in which the kernel package installed a static version of loader.conf.
- We reverted that, but even afterward we noticed pieces missing from loader.conf during the upgrade.
- We found that the SG-3100 kernel package still contains a static version of loader.conf. That means the pfSense-upgrade workaround is still needed, so I reverted the revert and added it back.
- Removed loader.conf from the kernel packages for non-amd64 architectures.
- Reworked pfSense-upgrade to update the rc package before backing up loader.conf.

We will run more tests when new snapshots are available. pfSense-upgrade 0.76 must be used.
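
The backup-and-restore workaround described in the list above can be illustrated roughly as follows. This is not the actual pfSense-upgrade code; run_preserving is a hypothetical name, and the sketch only shows the general pattern of copying a file aside, running a step that may overwrite it, and copying it back.

```shell
#!/bin/sh
# Run a command while protecting one file from being replaced by it.
# Usage: run_preserving FILE COMMAND [ARGS...]
# The file is copied to a temporary backup, the command runs, and the
# backup is copied back over the file regardless of the command's result.
# Returns the command's exit status.
run_preserving() {
    file=$1
    shift
    backup=$(mktemp) || return 1
    cp -p "$file" "$backup" || { rm -f "$backup"; return 1; }
    "$@"
    status=$?
    cp -p "$backup" "$file"
    rm -f "$backup"
    return "$status"
}

# Example (the package step is a placeholder, not a real command):
#   run_preserving /boot/loader.conf some_package_upgrade_step
```

The ordering matters for the same reason as in the last list item: anything that should legitimately change the file has to run before the backup is taken, or the restore will undo it.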

Actions #14

Updated by Jim Pingle over 4 years ago

Systems where this problem was due to loader.conf issues appear to be OK on current snapshots. I've upgraded a system which saw the problem on every upgrade in the past and it is OK now.

There is another situation which appears similar but isn't the same issue. That has been moved to #10310.

Actions #15

Updated by Jim Pingle over 4 years ago

  • Status changed from Feedback to Resolved
Actions #16

Updated by Dmitry Fill about 4 years ago

I just upgraded today to 2.5.0.a.20200901.2100 and am hitting the exact same issue. This seems like a regression.

After every reboot I have to run:

sysctl -w net.pf.request_maxcount=262144
net.pf.request_maxcount: 65535 -> 262144

even though /boot/loader.conf has the value set from the UI:

cat /boot/loader.conf
kern.cam.boot_delay=10000
kern.ipc.nmbclusters="1000000" 
kern.ipc.nmbjumbop="524288" 
kern.ipc.nmbjumbo9="524288" 
autoboot_delay="3" 
net.pf.request_maxcount="500000" 
hw.hn.vf_transparent="0" 
hw.hn.use_if_start="1" 
Actions #17

Updated by Jim Pingle about 4 years ago

This issue is quite old and resolved in a previous version. I created a new issue for the regression after confirming it: https://redmine.pfsense.org/issues/10861
