Bug #10254

pf error "too many elements" when attempting to load large tables

Added by Jim Pingle 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: Urgent
Category: Operating System
Target version:
Start date: 02/11/2020
Due date:
% Done: 100%
Estimated time:
Affected Version: 2.4.5
Affected Architecture: All

Description

On at least pfSense-base-2.4.5.r.20200210.0912 and later, pf fails to load large tables no matter what the limits are in pf:

: pfctl -f /tmp/rules.debug
/tmp/rules.debug:23: cannot define table bogonsv6: too many elements.
Consider increasing net.pf.request_maxcount.
pfctl: Syntax error in config file: pf rules not loaded

However, that OID is not present on 2.4.5:

: sysctl net.pf
net.pf.source_nodes_hashsize: 8192
net.pf.states_hashsize: 32768
: sysctl -a | grep request_maxcount
0
:

There is plenty of room in the table hard limit:

: wc -l /etc/bogonsv6 
  108611 /etc/bogonsv6
: pfctl -sm | grep table
table-entries hard limit  2000000

Similar to #9356 on 2.5.0, but in that case we set a higher default for that OID. That does not appear to be possible on 2.4.5.

Tried on amd64 and SG-3100, same result on both.
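
The checks above can be folded into a small script. This is only a sketch: the function names are hypothetical, and it assumes that a table with more entries than the request limit is rejected (the exact boundary is not established at this point in the report).

```shell
#!/bin/sh
# Hypothetical helpers; only the commands named in the comments come from the report.

# Count the non-empty, non-comment lines in a table file such as /etc/bogonsv6.
count_entries() {
    grep -c -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Read "name: value" sysctl output on stdin and print the numeric value of OID $1.
# Prints nothing if the OID is absent, as on the affected 2.4.5 builds.
oid_value() {
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^${pattern}: *\([0-9][0-9]*\)\$/\1/p"
}

# Classify whether a table of $1 entries fits under request limit $2.
# An empty $2 means the OID is missing, so the limit cannot be raised.
check_fit() {
    entries=$1
    limit=$2
    if [ -z "$limit" ]; then
        echo "oid-missing"
    elif [ "$entries" -gt "$limit" ]; then
        echo "too-many"
    else
        echo "fits"
    fi
}
```

On a firewall this might be used as `check_fit "$(count_entries /etc/bogonsv6)" "$(sysctl net.pf | oid_value net.pf.request_maxcount)"`. Note that the table-entries hard limit reported by `pfctl -sm` is a separate limit and is not what this check compares against.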

Associated revisions

Revision da569f45 (diff)
Added by Renato Botelho 5 months ago

Ticket #10254: Set net.pf.request_maxcount on upgrade

Add pre-install script to pfSense-rc to set default value to
net.pf.request_maxcount before reboot

Revision 9bdf3477 (diff)
Added by Renato Botelho 5 months ago

Ticket #10254: Set net.pf.request_maxcount on upgrade

Add pre-install script to pfSense-rc to set default value to
net.pf.request_maxcount before reboot

Revision 3b6ad495 (diff)
Added by Renato Botelho 5 months ago

Fix #10254: Default value is minimumtableentries_bogonsv6 from globals.inc

Revision ce164bb8 (diff)
Added by Renato Botelho 5 months ago

Fix #10254: Default value is minimumtableentries_bogonsv6 from globals.inc

History

#1 Updated by Jim Pingle 5 months ago

  • Target version set to 2.4.5

#2 Updated by Jim Pingle 5 months ago

  • Priority changed from Normal to Urgent

#3 Updated by Jim Pingle 5 months ago

The easiest way to reproduce the problem is to enable blocking of Bogons on any interface with IPv6 configured.

#4 Updated by Jim Pingle 5 months ago

Looking in the FreeBSD source, it appears that the code which produces the error (r343520) is present on the branch used for 2.4.5 and in FreeBSD stable/11, but the code which handles the sysctl OID and backend is not (https://reviews.freebsd.org/D15018, 205176451d5ad5f9fc9540f650e9d7efd1f728f5, rS332404).

It would probably be safer to revert the code producing the error than to pull in the larger change. Without that, the error appears to be cosmetic, and if we pull in the other change then we also have to worry about resolving #9356 for 2.4.5.

#5 Updated by Jim Pingle 5 months ago

Current snapshots have that change reverted but are still not behaving properly. Even though there appears to be sufficient room in the table space, pf is yielding a memory allocation error:

: wc -l /etc/bogonsv6 
  108654 /etc/bogonsv6
: pfctl -sm
states        hard limit   202000
src-nodes     hard limit   202000
frags         hard limit     5000
table-entries hard limit   400000
: pfctl -f /tmp/rules.debug
/tmp/rules.debug:20: cannot define table bogonsv6: Cannot allocate memory
pfctl: Syntax error in config file: pf rules not loaded

Similar behavior on amd64 and ARM, but on amd64 it prints an error once and then works the next time, while ARM never works. Rebooting amd64 in this state yields one instance of this allocation error recorded but the table is loaded after boot. Rebooting ARM in this state yields two instances of this error at boot but the ruleset still fails to reload even manually.

Similar configurations work on 2.5.0 with both amd64 and ARM. Tables are loaded, no errors.

#6 Updated by Jim Pingle 5 months ago

Looks to be failing around 65k, which was the default limit on net.pf.request_maxcount. A 65535-line file fails, while a 65534-line file loads:

: pfctl -T flush -t bogonsv6
: head -n 65535 /etc/bogonsv6-stock > /etc/bogonsv6
: pfctl -f /tmp/rules.debug
/tmp/rules.debug:20: cannot define table bogonsv6: Cannot allocate memory
pfctl: Syntax error in config file: pf rules not loaded
: pfctl -T flush -t bogonsv6
: head -n 65534 /etc/bogonsv6-stock > /etc/bogonsv6
: pfctl -f /tmp/rules.debug
: pfctl -T show -t bogonsv6 | wc -l
   65533
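
The manual narrowing above can be automated with a binary search. This is a sketch rather than a tested tool: `largest_passing` is a hypothetical helper, and it assumes the outcome is monotonic (if N lines load, every smaller count loads too). `probe_pf` repeats the commands from the transcript and, like them, overwrites /etc/bogonsv6.

```shell
#!/bin/sh
# Find the largest count in [lo, hi] for which "probe COUNT" succeeds.
# Assumes lo itself passes and that passing is monotonic in COUNT.
# The probe must not modify the variables lo, hi, or mid.
largest_passing() {
    probe=$1
    lo=$2
    hi=$3
    while [ "$lo" -lt "$hi" ]; do
        # Round up so the loop always makes progress when lo + 1 = hi.
        mid=$(( (lo + hi + 1) / 2 ))
        if "$probe" "$mid"; then
            lo=$mid
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$lo"
}

# Probe built from the commands in the transcript above.
# WARNING: overwrites /etc/bogonsv6 with a truncated copy of the stock file.
probe_pf() {
    pfctl -T flush -t bogonsv6 &&
        head -n "$1" /etc/bogonsv6-stock > /etc/bogonsv6 &&
        pfctl -f /tmp/rules.debug
}
```

Running `largest_passing probe_pf 1 108654` would then report the largest line count that still loads, in roughly 17 ruleset loads instead of one per candidate.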

#7 Updated by Jim Pingle 5 months ago

https://github.com/pfsense/FreeBSD-src/commit/8f7d14d3049de4fb6f82c7e97153c4372674a1e7 might need to be reverted, or we should just sync up with what 12.x has for net.pf.request_maxcount, which is probably safer at this point.

#8 Updated by Jim Pingle 5 months ago

Current snapshots have the code which allows us to set the request limit via net.pf.request_maxcount. However, it isn't being set until late in the upgrade process so the first full post-upgrade boot doesn't have a high enough value to allow bogonsv6 to load without errors.

amd64 first post-upgrade boot:

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="2000000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 500000

After reboot:

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="2000000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 2000000

SG-3100 first post-upgrade boot (loading bogonsv6 failed):

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="400000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 65535

SG-3100 after one more reboot (loading bogonsv6 worked):

: grep net.pf.request_maxcount /boot/loader.conf
net.pf.request_maxcount="400000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 400000

Looks like we might need to copy or move the code which sets that value to a place that runs earlier, like when the kernel itself gets upgraded or just after the upgrade starts before it reboots the first time.
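
The comparison made by hand above (value in loader.conf versus value in the running kernel) could be scripted along these lines. The helper names are hypothetical, and the parser only handles simple `name="value"` lines as they appear in the transcripts.

```shell
#!/bin/sh
# Print the value assigned to tunable $2 in loader.conf-style file $1.
# If the tunable appears more than once, the last assignment is printed.
loader_value() {
    name=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^${name}=\"\{0,1\}\([^\"]*\)\"\{0,1\}[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# Compare the configured value ($1) with the running value ($2).
compare_tunable() {
    configured=$1
    running=$2
    if [ -z "$configured" ]; then
        echo "not-configured"
    elif [ "$configured" = "$running" ]; then
        echo "applied"
    else
        echo "mismatch (configured $configured, running $running)"
    fi
}
```

For example, `compare_tunable "$(loader_value /boot/loader.conf net.pf.request_maxcount)" "$(sysctl -n net.pf.request_maxcount)"` would report a mismatch on the first post-upgrade boot shown above and "applied" after the following reboot.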

#9 Updated by Renato Botelho 5 months ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

#10 Updated by Jim Pingle 5 months ago

  • Status changed from Feedback to In Progress
  • Assignee set to Renato Botelho

Something is still not quite right with this value post-upgrade. The first boot after any firmware upgrade (like one snapshot to the next) fails to use the correct value. Later reboots are fine.

: grep request_max /boot/loader.conf 
net.pf.request_maxcount="400000" 
: sysctl net.pf.request_maxcount
net.pf.request_maxcount: 65535
: pfctl -f /tmp/rules.debug
/tmp/rules.debug:20: cannot define table bogonsv6: too many elements.
Consider increasing net.pf.request_maxcount.
pfctl: Syntax error in config file: pf rules not loaded

That was set in loader.conf before the upgrade, so somehow it is either being ignored or cleared/reset during the upgrade.

This is on ARM (SG-3100) but I see a similar issue on amd64 as well.

#11 Updated by Renato Botelho 5 months ago

  • Status changed from In Progress to Feedback

pfSense-upgrade 0.74 (on 2.5.0 and 2.4.5) and 0.63 (on 2.4.4) will fix it.

#12 Updated by Jim Pingle 5 months ago

  • Status changed from Feedback to In Progress

There is still a problem here that we're investigating.

#13 Updated by Renato Botelho 5 months ago

  • Status changed from In Progress to Feedback

- pfSense-upgrade was copying loader.conf to a temporary file before upgrading the kernel and rc packages, then copying it back afterward. This worked around a past bug where the kernel package installed a static version of loader.conf
- Reverted that, but even afterward we noticed pieces missing from loader.conf during the upgrade
- Noticed that the SG-3100 kernel package still contains a static version of loader.conf. That means we need the pfSense-upgrade hack back, so I reverted the revert and added it back
- Removed loader.conf from the kernel packages for non-amd64 architectures
- Reworked pfSense-upgrade to update the rc package before backing up loader.conf

We are going to run more tests when new snapshots are available. pfSense-upgrade 0.76 must be used.
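
Why the ordering matters can be shown with a toy model. Everything below is a hypothetical stand-in, not the actual pfSense-upgrade or pfSense-rc code: `set_tunable` plays the role of the rc package's pre-install step, and `with_preserved_file` plays the role of the copy-aside, copy-back hack. The sketch assumes the backup is meant to already contain the tunable, which would explain updating rc first.

```shell
#!/bin/sh
# Set NAME="VALUE" in FILE, replacing any existing assignment of NAME.
set_tunable() {
    st_file=$1
    st_name=$2
    st_value=$3
    st_pattern=$(printf '%s' "$st_name" | sed 's/\./\\./g')
    st_tmp=$(mktemp) || return 1
    if [ -f "$st_file" ]; then
        # grep exits 1 when every line is filtered out; only >1 is an error.
        grep -v "^${st_pattern}=" "$st_file" > "$st_tmp" || [ $? -eq 1 ] || return 1
    fi
    printf '%s="%s"\n' "$st_name" "$st_value" >> "$st_tmp"
    # Write through the existing file so its ownership and mode are kept.
    cat "$st_tmp" > "$st_file" && rm -f "$st_tmp"
}

# Run a command while preserving FILE across it: copy aside, run, copy back.
with_preserved_file() {
    wp_file=$1
    shift
    wp_backup=$(mktemp) || return 1
    cp -p "$wp_file" "$wp_backup" || return 1
    wp_status=0
    "$@" || wp_status=$?
    cp -p "$wp_backup" "$wp_file" && rm -f "$wp_backup"
    return "$wp_status"
}
```

If `set_tunable` runs before `with_preserved_file`, the restored file keeps the tunable even when the wrapped step overwrites loader.conf. If it runs inside the wrapped step, the restore discards it, which would match the symptom of the value going missing during an upgrade.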

#14 Updated by Jim Pingle 4 months ago

Systems where this problem was due to loader.conf issues appear to be OK on current snapshots. I've upgraded a system which saw the problem on every upgrade in the past and it is OK now.

There is another situation which appears similar but isn't the same issue. It has been moved to #10310.

#15 Updated by Jim Pingle 4 months ago

  • Status changed from Feedback to Resolved
