Regression #12660
Updated by Jim Pingle almost 3 years ago
It's not clear why, but sometimes pf puts the interface address as the gateway on a state instead of the actual next hop gateway. When this happens, the packet gets stuck in a loop in pf. The packet counters rapidly increase, and CPU usage on netisr queues hits 100%. The more states this happens to, the more queues end up at 100% and the greater the load on the firewall. Eventually the firewall locks up, though that could be due to heat from most or all of the cores running at 100%.
<pre>
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -72 - 0B 576K CPU5 5 18:49 99.69% [intr{swi1: netisr 2}]
12 root -72 - 0B 576K CPU6 6 38:01 99.66% [intr{swi1: netisr 5}]
12 root -72 - 0B 576K CPU3 3 37:51 99.65% [intr{swi1: netisr 0}]
12 root -72 - 0B 576K CPU4 4 32:17 99.65% [intr{swi1: netisr 1}]
12 root -72 - 0B 576K CPU2 2 38:03 99.59% [intr{swi1: netisr 6}]
11 root 155 ki31 0B 128K RUN 7 14:14 99.20% [idle{idle: cpu7}]
12 root -72 - 0B 576K CPU0 0 34:25 98.88% [intr{swi1: netisr 4}]
11 root 155 ki31 0B 128K CPU1 1 18:11 92.38% [idle{idle: cpu1}]
</pre>
In most cases the states are fine, but every once in a while, a state ends up with the wrong gateway without any apparent cause.
The following sets of states were all present in the state table at the same time:
<pre>
all tcp 142.250.190.14:443 <- a.a.a.58:39980 CLOSED:SYN_SENT
[0 + 4294901760] [2831073638 + 16777216]
age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 268
id: d9cbd66100000000 creatorid: 6a679c92 gateway: w.w.w.2
origif: igb0
all tcp a.a.a.58:39980 -> 142.250.190.14:443 SYN_SENT:CLOSED
[2831073638 + 16777216] [0 + 4294901760]
age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 164
id: dacbd66100000000 creatorid: 6a679c92 gateway: 0.0.0.0
origif: igb5
</pre>
As the above state shows, the connection consists of only the first TCP SYN packet with no response, yet the packet and byte counters are exceptionally large. The gateway is @w.w.w.2@, which is actually the interface address of the WAN the packet should exit, @igb5@.
Packet captures on the client and on the firewall interfaces show that these packets *are not* actually on the wire. Thus, they appear to be looping internally in pf.
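The suspected looping behavior can be sketched with a toy model (this is only an illustration of the symptom, not pf's actual forwarding code; the addresses are taken from the states above): when a state's gateway resolves to an address the firewall itself owns, forwarding never hands the packet to the NIC.

```python
# Toy model (not pf internals): why a state gateway equal to the firewall's
# own interface address could make a packet loop instead of leaving the box.
LOCAL_ADDRS = {"w.w.w.2"}          # addresses owned by the firewall (igb5)

def forward(gateway, max_hops=100):
    """Return ('wire', hops) if the packet leaves, ('loop', hops) if not."""
    hops = 0
    while hops < max_hops:
        hops += 1
        if gateway in LOCAL_ADDRS:
            # Next hop is the firewall itself: the packet is delivered back
            # to the local stack and re-enters forwarding, never on the wire.
            continue
        return ("wire", hops)      # handed to the NIC toward the next hop
    return ("loop", hops)          # counters climb, CPU burns, nothing sent

print(forward("w.w.w.1"))   # the healthy state pair: leaves on the first hop
print(forward("w.w.w.2"))   # the broken state: spins until the hop cap
```

This matches the symptom: enormous packet/byte counters on the state, saturated netisr queues, and nothing visible in a capture.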
This next state pair is for the exact same source and destination, but here the connection completed successfully. Note that the first state has the correct gateway, @w.w.w.1@.
<pre>
all tcp 142.250.190.14:443 <- a.a.a.58:39978 FIN_WAIT_2:FIN_WAIT_2
[1673491087 + 4287496704] wscale 8 [3482141425 + 327936] wscale 8
age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 268
id: 1bf3d66100000001 creatorid: 6a679c92 gateway: w.w.w.1
origif: igb0
all tcp a.a.a.58:39978 -> 142.250.190.14:443 FIN_WAIT_2:FIN_WAIT_2
[3482141425 + 327936] wscale 8 [1673491087 + 4287496704] wscale 8
age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 164
id: 1cf3d66100000001 creatorid: 6a679c92 gateway: 0.0.0.0
origif: igb5
</pre>
There was no filter reload between the creation of those states; as the output shows, they were created at almost exactly the same time.
The rule which created those states is set to policy route that traffic out via @w.w.w.1@, which should be the only gateway for states created by the rule:
<pre>
@268(0) pass in quick on igb0 route-to { (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1) } round-robin inet from a.a.a.0/24 to any flags S/SA keep state label "USER_RULE: Default LAN to any" ridentifier 1418235528
[ Evaluations: 8399 Packets: 1718677448 Bytes: 2975065697 States: 57 ]
[ Inserted: pid 53819 State Creations: 232 ]
</pre>
The gateway is repeated ten times in the @route-to@ pool because of the gateway weight setting (it was set to 10).
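How the weight appears to translate into the pool can be sketched as follows (an assumption about how the ruleset generator behaves, not actual pfSense source): a weight of N lists the gateway N times, so round-robin selection over the pool still honors the weights, and with a single gateway every pick should be @w.w.w.1@, never the interface address @w.w.w.2@.

```python
from itertools import cycle

# Sketch of weight-as-repetition: a gateway with weight N appears N times
# in the route-to pool, matching the ten entries in rule 268 above.
def expand_pool(gateways_with_weights):
    """[('w.w.w.1', 10)] -> ['w.w.w.1'] * 10."""
    pool = []
    for gw, weight in gateways_with_weights:
        pool.extend([gw] * weight)
    return pool

pool = expand_pool([("w.w.w.1", 10)])
rr = cycle(pool)                        # round-robin over the pool
picks = {next(rr) for _ in range(50)}   # simulate 50 state creations
# Every pick is w.w.w.1; the interface address never appears as a gateway,
# which is why the w.w.w.2 gateway on the broken state is so surprising.
```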
There is only one IP address on the @igb5@ interface, @w.w.w.2@:
<pre>
igb5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8120b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER>
ether 00:xx:xx:xx:xx:b4
inet6 fe80::208:a2ff:fe09:95b4%igb5 prefixlen 64 scopeid 0x6
inet w.w.w.2 netmask 0xffffff00 broadcast w.w.w.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
</pre>
The upstream gateway on the @igb5@ interface is @w.w.w.1@ and it is also the default route when this happens:
<pre>
default w.w.w.1 UGS igb5
[...]
w.w.w.2 link#6 UHS lo0
</pre>
This exact same configuration was running fine on the @22.01.a.20211128.0600@ snapshot, but it failed after upgrading to @22.01.b.20220103.0600@.
I cannot reproduce this reliably, unfortunately. It has occurred at boot time on three separate occasions so far, and if I reset the states it stabilizes. After that, it happens randomly at unpredictable times on different interfaces. If I monitor the system closely I can catch single states doing it (as above) and kill just the specific problem states before they get too out of control.
Also worth noting: in this setup, that interface does not need outbound NAT, so it does not perform outbound NAT; NAT happens at the next hop. If I disable the gateways on that WAN and let another WAN take over (one that does NAT), the problem does not occur on the other WAN that I've seen so far. It's possible I didn't wait long enough (~2 hrs), but I'll keep experimenting.
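To catch the bad states before they get out of control, a small watchdog along these lines could scan the verbose state listing for the bad-gateway signature (a state whose gateway is one of the firewall's own addresses). This is a hypothetical helper I sketched, assuming the state listing format shown above; @find_bad_states@ and @LOCAL_ADDRS@ are made-up names, and the sample dump reuses the state ids from this report.

```python
import re

# Hypothetical watchdog sketch: scan a verbose state dump (the format shown
# above) and flag states whose route-to gateway is a local interface address.
LOCAL_ADDRS = {"w.w.w.2"}

STATE_RE = re.compile(r"id: (\S+) creatorid: (\S+) gateway: (\S+)")

def find_bad_states(state_dump):
    """Yield (state_id, creatorid) for states with a local-address gateway."""
    for state_id, creatorid, gateway in STATE_RE.findall(state_dump):
        if gateway in LOCAL_ADDRS:
            yield state_id, creatorid

dump = """\
id: d9cbd66100000000 creatorid: 6a679c92 gateway: w.w.w.2
id: 1bf3d66100000001 creatorid: 6a679c92 gateway: w.w.w.1
id: 1cf3d66100000001 creatorid: 6a679c92 gateway: 0.0.0.0
"""
print(list(find_bad_states(dump)))  # only the w.w.w.2 state is flagged
```

Each hit could then be killed individually by its state ID (e.g. with @pfctl -k id@) instead of flushing the whole state table.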