Regression #12660


High CPU usage due to incorrect gateway on some policy routed states

Added by Jim Pingle 5 months ago. Updated 4 months ago.

Rules / NAT

It's not clear why, but sometimes pf puts the interface address on a state as the gateway instead of the actual next-hop gateway. When this happens, the packet gets stuck in a loop inside pf: the packet counters increase rapidly and CPU usage on the netisr queues hits 100%. The more states this happens to, the more queues end up at 100% and the larger the load on the firewall. Eventually the firewall locks up, though that could be due to heat from most or all of the cores running at 100%.

   12 root        -72    -     0B   576K CPU5     5  18:49  99.69% [intr{swi1: netisr 2}]
   12 root        -72    -     0B   576K CPU6     6  38:01  99.66% [intr{swi1: netisr 5}]
   12 root        -72    -     0B   576K CPU3     3  37:51  99.65% [intr{swi1: netisr 0}]
   12 root        -72    -     0B   576K CPU4     4  32:17  99.65% [intr{swi1: netisr 1}]
   12 root        -72    -     0B   576K CPU2     2  38:03  99.59% [intr{swi1: netisr 6}]
   11 root        155 ki31     0B   128K RUN      7  14:14  99.20% [idle{idle: cpu7}]
   12 root        -72    -     0B   576K CPU0     0  34:25  98.88% [intr{swi1: netisr 4}]
   11 root        155 ki31     0B   128K CPU1     1  18:11  92.38% [idle{idle: cpu1}]

In most cases the states are fine, but every once in a while a state ends up with the wrong gateway without any apparent cause.

The following sets of states were all present in the state table at the same time:

all tcp <- a.a.a.58:39980       CLOSED:SYN_SENT
   [0 + 4294901760]  [2831073638 + 16777216]
   age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 268
   id: d9cbd66100000000 creatorid: 6a679c92 gateway: w.w.w.2
   origif: igb0
all tcp a.a.a.58:39980 ->       SYN_SENT:CLOSED
   [2831073638 + 16777216]  [0 + 4294901760]
   age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 164
   id: dacbd66100000000 creatorid: 6a679c92 gateway:
   origif: igb5

As the above state pair shows, only the first TCP SYN of the connection was seen, with no response. The packet and byte counters are exceptionally large. The gateway is w.w.w.2, which is actually the interface address of the WAN the packet should exit, igb5.

Packet captures on the client and on the firewall interfaces show that these packets never actually hit the wire. Thus, they appear to be looping internally in pf.
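A rough way to spot these runaway states is to scan `pfctl -ss -vv` output for states whose packet counter is implausibly large. This is a minimal sketch, not anything from the report itself: the one-million-packet threshold is arbitrary, and the sample input is pasted from the states above so the pipeline can be run anywhere.

```shell
# Sample `pfctl -ss -vv` excerpt taken from the states in this report.
# On a live system, replace the printf with: pfctl -ss -vv | awk ...
pfctl_output='all tcp <- a.a.a.58:39980       CLOSED:SYN_SENT
   age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 268
   id: d9cbd66100000000 creatorid: 6a679c92 gateway: w.w.w.2
all tcp <- a.a.a.58:39978       FIN_WAIT_2:FIN_WAIT_2
   age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 268
   id: 1bf3d66100000001 creatorid: 6a679c92 gateway: w.w.w.1'

# Flag any state with more than 1,000,000 packets in one direction
# (an arbitrary illustrative threshold), then print its id and gateway.
printf '%s\n' "$pfctl_output" | awk '
  /pkts/ { split($6, c, ":"); looping = (c[1]+0 > 1000000) }
  /id:/  { if (looping) print "suspect state id:", $2, "gateway:", $6 }
'
```

Against the sample above this flags only the stuck SYN state (id d9cbd66100000000, gateway w.w.w.2) and skips the healthy connection.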

This next state pair is for the exact same source and destination, but the connection completed successfully. Note that the first state has the correct gateway, w.w.w.1.

all tcp <- a.a.a.58:39978       FIN_WAIT_2:FIN_WAIT_2
   [1673491087 + 4287496704] wscale 8  [3482141425 + 327936] wscale 8
   age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 268
   id: 1bf3d66100000001 creatorid: 6a679c92 gateway: w.w.w.1
   origif: igb0
all tcp a.a.a.58:39978 ->       FIN_WAIT_2:FIN_WAIT_2
   [3482141425 + 327936] wscale 8  [1673491087 + 4287496704] wscale 8
   age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 164
   id: 1cf3d66100000001 creatorid: 6a679c92 gateway:
   origif: igb5

There was no filter reload between when those states were created; as the output shows, they were created at almost exactly the same time.

The rule which created those states is set to policy route that connection out via w.w.w.1, which should be the only gateway for states from the rule:

@268(0) pass in quick on igb0 route-to { (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1) } round-robin inet from a.a.a.0/24 to any flags S/SA keep state label "USER_RULE: Default LAN to any" ridentifier 1418235528
  [ Evaluations: 8399      Packets: 1718677448  Bytes: 2975065697  States: 57    ]
  [ Inserted: pid 53819 State Creations: 232   ]

The gateway was repeated so many times due to the gateway weight setting (it was set to 10).
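To illustrate (this check is my own, not part of the report): a weight of 10 expands into 10 identical entries in the compiled route-to pool, which can be confirmed by counting them in the rule text pfctl prints.

```shell
# The route-to pool from rule @268 above; a weight of 10 yields 10 copies
# of the same (interface gateway) pair in the round-robin pool.
rule='route-to { (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1) } round-robin'
printf '%s\n' "$rule" | grep -o '(igb5 w.w.w.1)' | wc -l
```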

There is only one IP address on the igb5 interface, w.w.w.2:

igb5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:xx:xx:xx:xx:b4
    inet6 fe80::208:a2ff:fe09:95b4%igb5 prefixlen 64 scopeid 0x6
    inet w.w.w.2 netmask 0xffffff00 broadcast w.w.w.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active

The upstream gateway on the igb5 interface is w.w.w.1 and it is also the default route when this happens:

default            w.w.w.1        UGS        igb5
w.w.w.2        link#6             UHS         lo0

This exact same configuration was running fine on the 22.01.a.20211128.0600 snapshot, but failed after upgrading to 22.01.b.20220103.0600.

I cannot reproduce this reliably, unfortunately. It has occurred at boot time on three separate occasions so far, and if I reset the states it stabilizes. After that, it happens randomly at unpredictable times on different interfaces. If I monitor the system closely I can catch single states doing it (as above) and kill just the specific problem states before they get too out of control.
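Killing an individual problem state can be done with pfctl's `-k id` option, which accepts a state ID optionally qualified with the creator ID. The wrapper below is a hypothetical helper of my own, defaulting to a dry run that only prints the command, since running it for real requires root and a live pf.

```shell
# Hypothetical helper: kill a single pf state by the id/creatorid pair
# shown in `pfctl -ss -vv` output. With DRY_RUN=1 (the default) it only
# prints the pfctl command instead of executing it.
kill_state() {
  id="$1"
  creator="$2"
  cmd="pfctl -k id -k ${id}/${creator}"
  if [ "${DRY_RUN:-1}" -eq 1 ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

# The looping state from this report:
DRY_RUN=1 kill_state d9cbd66100000000 6a679c92
```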

Also worth noting: in this setup that interface does not need (and does not perform) outbound NAT, since NAT happens at the next hop. If I disable the gateways on that WAN and let another WAN (which does NAT) take over, the problem has not occurred on the other WAN so far. It's possible I haven't waited long enough (~2 hrs), but I'll keep experimenting.


