Bug #14693
Filter reload with NAT reflection rules is extremely slow
Status: open, 0% done
Description
We're running a pfSense cluster with the following number of rules:
- 60x Outbound NAT rule
- 120x NAT rule (port forward)
- 80x 1:1 NAT rule
- 850x Firewall rule
When reloading the filter (or applying changes to rules or NAT), the full reload takes 10 minutes to finish.
When I check the log on the "Filter Reload" page, each "NAT Reflection" rule takes about 5 seconds. The firewall rules are quite fast: all of them load together in about 5 seconds.
Initializing
Creating aliases
Creating gateway group item...
Generating Limiter rules
Generating NAT rules
Creating 1:1 rules...
Creating reflection NAT rule for VIP for xxxxxx... -> takes about 5 seconds
Creating reflection NAT rule for VIP for xxxxxx... -> takes about 5 seconds
Creating reflection NAT rule for VIP for xxxxxx... -> takes about 5 seconds
Creating reflection NAT rule for VIP for xxxxxx... -> takes about 5 seconds
Creating reflection NAT rule for VIP for xxxxxx... -> takes about 5 seconds
(and so on, for all reflection rules)
So the delay seems to be limited to the "NAT reflection" rules.
Each server runs on bare metal with the following specs:
- 2x Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
- 64GB RAM
- SSD storage
How can we speed up our reloads?
Updated by Vincent Caron 14 days ago
This problem has been bugging me a lot too. I have lots of interfaces (250 VLANs) and about 200 NAT rules; reloading took 25 minutes, and I got it down to 17s.
Looking at the filter reload log, I was obviously hit by the "Creating reflection NAT rule" slowness; the cost was about 12s per item in my case.
I traced the problem to 2 places, and reflection rule generation went from 12s to 25ms.
The analysis is as follows:
- get_interface_ip() is invoked 2x per NAT reflection rule, and each invocation takes ~6.2s in my case
- in get_interface_ip(), most of the (wall-clock) time is spent in the path get_failover_interface($if) -> return_gateway_groups_array(true)
- in return_gateway_groups_array(), the time was distributed like this:
$gateways_status = return_gateways_status(true); # 3.8s
...
$gateways_arr = return_gateways_array(); # 0.8s
...
$gw4 = lookup_gateway_or_group_by_name($config['gateways']['defaultgw4'], $gways); # 0.8s
$gw6 = lookup_gateway_or_group_by_name($config['gateways']['defaultgw6'], $gways); # 0.8s
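For anyone who wants to reproduce this kind of per-call timing on their own box, here is a minimal sketch of one way to do it. It is not necessarily how the figures above were obtained: time_call() and slow_stub() are hypothetical names that do not exist in pfSense, and hrtime() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper (not part of pfSense): run a callable and report
// how long it took, using the monotonic high-resolution clock.
function time_call(callable $fn, ...$args): array {
    $start = hrtime(true);                          // nanoseconds since an arbitrary point
    $result = $fn(...$args);
    $elapsed_ms = (hrtime(true) - $start) / 1e6;    // convert ns to ms
    return [$result, $elapsed_ms];
}

// Stand-in for a slow function such as return_gateways_status().
function slow_stub(int $x): int {
    usleep(20000);   // sleep 20 ms to simulate expensive work
    return $x * 2;
}

[$value, $ms] = time_call('slow_stub', 21);
printf("slow_stub returned %d in %.1f ms\n", $value, $ms);
```

Wrapping each suspect call in turn (for example return_gateways_status(true)) gives a wall-clock breakdown like the one listed above.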
For the return_gateways_status() path: it is invoked 2x per reflection rule and does a ping test on the gateways via the dpinger service. AFAIK pfSense should make up its mind once about whether a gateway is working, before building the filter rules. I therefore used a simple memoization pattern:
--- /etc/inc/gwlb.inc.orig 2021-02-08 13:44:12.000000000 +0100
+++ /etc/inc/gwlb.inc 2025-05-20 17:51:02.586402000 +0200
@@ -437,8 +437,13 @@
 /* return the status of the dpinger targets as an array */
 function return_gateways_status($byname = false, $gways = false) {
-	global $config, $g;
+	global $config, $g, $gateways_status_memoized;
+	$memo_key = ($byname ? "byname" : "").":".($gways ? "gways" : "");
+	if (isset($gateways_status_memoized[$memo_key])) {
+		return ($gateways_status_memoized[$memo_key]);
+	}
+
 	$dpinger_gws = running_dpinger_processes();
 	$status = array();
@@ -530,6 +535,7 @@
 		$status[$target]['monitor_disable'] = true;
 	}
+	$gateways_status_memoized[$memo_key] = $status;
 	return($status);
 }
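The same memoization idea, taken out of pfSense, can be sketched as a standalone script. This is an illustration under assumptions, not a replacement for the patch above: expensive_status(), memoized_status() and the $reset parameter are hypothetical names, and the sketch uses a function-local static cache instead of a global.

```php
<?php
// Counts how often the expensive function really runs.
$GLOBALS['expensive_calls'] = 0;

// Hypothetical stand-in for an expensive lookup such as a dpinger status query.
function expensive_status(bool $byname): array {
    $GLOBALS['expensive_calls']++;
    return ['key' => $byname ? 'by-name' : 'by-index'];
}

// Memoized wrapper: one cache entry per distinct argument value.
// Passing $reset = true drops the cache before the lookup.
function memoized_status(bool $byname = false, bool $reset = false): array {
    static $cache = [];
    if ($reset) {
        $cache = [];
    }
    $key = $byname ? 'byname' : 'default';
    // array_key_exists() also treats a cached null as a hit, unlike isset().
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = expensive_status($byname);
    }
    return $cache[$key];
}

memoized_status(true);    // miss: runs expensive_status()
memoized_status(true);    // hit: served from the cache
memoized_status(false);   // miss: different key
echo $GLOBALS['expensive_calls'], "\n";   // prints 2
```

The cache lives as long as the PHP process does, so this only makes sense when the cached value cannot usefully change during that lifetime; the reset flag is there for callers that need a fresh reading.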
It then turns out that lookup_gateway_or_group_by_name() calls return_gateways_array(), so we're left with only 3 calls to return_gateways_array() to optimize. Each follows the chain: 2x calls to route_get_default() -> route_get() -> route_table():
function route_table() {
	$_gb = exec("/usr/bin/netstat --libxo json -nWr", $rawdata, $rc);
On my system this command takes 0.4s and is invoked 6 times per NAT reflection rule, which accounts for the 2.4s left to optimize. Same remark as above: I don't think it makes sense for the OS routing table to change while we're reloading the filter, so I memoized it:
--- /etc/inc/util.inc.orig 2021-02-08 13:44:12.000000000 +0100
+++ /etc/inc/util.inc 2025-05-20 17:37:17.941698000 +0200
@@ -2623,6 +2623,9 @@
 /* Return system's route table */
 function route_table() {
+	global $route_table_memoized;
+	if (isset($route_table_memoized)) return $route_table_memoized;
+
 	$_gb = exec("/usr/bin/netstat --libxo json -nWr", $rawdata, $rc);
 	if ($rc != 0) {
@@ -2645,6 +2648,7 @@
 	}
 	unset($netstatarr);
+	$route_table_memoized = $result;
 	return $result;
 }