Regression #12660

Updated by Jim Pingle over 2 years ago

It's not clear why, but sometimes pf is putting the interface address as the gateway on a state instead of the actual next hop gateway. When this happens, the packet gets stuck in a loop in pf. The packet counters rapidly increase, and CPU usage on netisr queues hits 100%. The more states this happens to, the more queues end up at 100% and the larger the load on the firewall. Eventually, the firewall locks up, but that could be due to heat from most/all the cores running at 100%. 

 <pre> 
   PID USERNAME      PRI NICE     SIZE      RES STATE      C     TIME      WCPU COMMAND 
    12 root          -72      -       0B     576K CPU5       5    18:49    99.69% [intr{swi1: netisr 2}] 
    12 root          -72      -       0B     576K CPU6       6    38:01    99.66% [intr{swi1: netisr 5}] 
    12 root          -72      -       0B     576K CPU3       3    37:51    99.65% [intr{swi1: netisr 0}] 
    12 root          -72      -       0B     576K CPU4       4    32:17    99.65% [intr{swi1: netisr 1}] 
    12 root          -72      -       0B     576K CPU2       2    38:03    99.59% [intr{swi1: netisr 6}] 
    11 root          155 ki31       0B     128K RUN        7    14:14    99.20% [idle{idle: cpu7}] 
    12 root          -72      -       0B     576K CPU0       0    34:25    98.88% [intr{swi1: netisr 4}] 
    11 root          155 ki31       0B     128K CPU1       1    18:11    92.38% [idle{idle: cpu1}] 
 </pre> 
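
 For reference, the thread view above is from @top@; a rough sketch of the commands for watching this, assuming FreeBSD's @top@ thread/system flags and the @netstat -Q@ netisr statistics view (output formats may vary by release):

 <pre>
 # watch per-thread CPU usage, including the swi1: netisr kernel threads
 top -HS

 # dump netisr queue statistics to see which queues are backed up
 netstat -Q
 </pre>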

 In most cases the states are fine, but every once in a while a state ends up with the wrong gateway with no apparent cause. 

 The following sets of states were all present in the state table at the same time: 

 <pre> 
 all tcp 142.250.190.14:443 <- a.a.a.58:39980         CLOSED:SYN_SENT 
    [0 + 4294901760]    [2831073638 + 16777216] 
    age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 268 
    id: d9cbd66100000000 creatorid: 6a679c92 gateway: w.w.w.2 
    origif: igb0 
 all tcp a.a.a.58:39980 -> 142.250.190.14:443         SYN_SENT:CLOSED 
    [2831073638 + 16777216]    [0 + 4294901760] 
    age 00:01:20, expires in 00:00:30, 11979109:0 pkts, 718746540:0 bytes, rule 164 
    id: dacbd66100000000 creatorid: 6a679c92 gateway: 0.0.0.0 
    origif: igb5 
 </pre> 

 As the above state pair shows, only the first TCP SYN packet of the connection was seen, with no response, yet the packet and byte counters are exceptionally large. The gateway is @w.w.w.2@, which is actually the interface address of the WAN the packet should exit, @igb5@. 

 Packet captures on the client and on the firewall interfaces show these packets *are not* actually on the wire. Thus, they appear to be looping internally in pf. 
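
 A rough way to catch these states as they appear (assuming the verbose state listing on this build includes the @gateway:@ and @origif:@ lines as shown above; the address is the anonymized one from this report):

 <pre>
 # list states verbosely and pull out any whose recorded gateway is the
 # interface address w.w.w.2 rather than the next hop w.w.w.1
 pfctl -vvss | grep -B 3 'gateway: w.w.w.2'
 </pre>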

 This next state pair is for the exact same source and destination but the connection completed successfully. Note that it has the correct gateway on the first state, @w.w.w.1@. 

 <pre> 
 all tcp 142.250.190.14:443 <- a.a.a.58:39978         FIN_WAIT_2:FIN_WAIT_2 
    [1673491087 + 4287496704] wscale 8    [3482141425 + 327936] wscale 8 
    age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 268 
    id: 1bf3d66100000001 creatorid: 6a679c92 gateway: w.w.w.1 
    origif: igb0 
 all tcp a.a.a.58:39978 -> 142.250.190.14:443         FIN_WAIT_2:FIN_WAIT_2 
    [3482141425 + 327936] wscale 8    [1673491087 + 4287496704] wscale 8 
    age 00:01:20, expires in 00:00:11, 12:10 pkts, 1237:7154 bytes, rule 164 
    id: 1cf3d66100000001 creatorid: 6a679c92 gateway: 0.0.0.0 
    origif: igb5 
 </pre> 

 There was no filter reload between when those states were created; as the output shows, they were created at almost exactly the same time. 

 The rule which created those states is set to policy route that connection out via @w.w.w.1@, which should be the only gateway for states from the rule: 

 <pre> 
 @268(0) pass in quick on igb0 route-to { (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1), (igb5 w.w.w.1) } round-robin inet from a.a.a.0/24 to any flags S/SA keep state label "USER_RULE: Default LAN to any" ridentifier 1418235528 
   [ Evaluations: 8399        Packets: 1718677448    Bytes: 2975065697    States: 57      ] 
   [ Inserted: pid 53819 State Creations: 232     ] 
 </pre> 

 The gateway was repeated so many times due to the gateway weight setting (it was set to 10). 
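
 For completeness, the rule and its counters above can be pulled from the loaded ruleset; a sketch, assuming the @ridentifier@ value from this configuration:

 <pre>
 # dump the loaded rules with counters and show the policy-routing rule
 # plus the two counter lines that follow it
 pfctl -vvsr | grep -A 2 'ridentifier 1418235528'
 </pre>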

 There is only one IP address on the @igb5@ interface, @w.w.w.2@: 

 <pre> 
 igb5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 
	 options=8120b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER> 
	 ether 00:xx:xx:xx:xx:b4 
	 inet6 fe80::208:a2ff:fe09:95b4%igb5 prefixlen 64 scopeid 0x6 
	 inet w.w.w.2 netmask 0xffffff00 broadcast w.w.w.255 
	 media: Ethernet autoselect (1000baseT <full-duplex>) 
	 status: active 
	 nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> 
 </pre> 

 The upstream gateway on the @igb5@ interface is @w.w.w.1@ and it is also the default route when this happens: 
 <pre> 
 default              w.w.w.1          UGS          igb5 
 [...] 
 w.w.w.2          link#6               UHS           lo0 
 </pre> 
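
 To double-check that the kernel routing table agrees that the next hop should be @w.w.w.1@ rather than the interface address, roughly (the destination is taken from the example states above; addresses are anonymized as in this report):

 <pre>
 # full routing table, filtered to the default route and the WAN subnet
 netstat -rn | grep -E 'default|w\.w\.w'

 # ask the kernel which gateway it would pick for that destination
 route -n get 142.250.190.14
 </pre>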

 This exact same configuration was running fine on the @22.01.a.20211128.0600@ snapshot, but failed after upgrading to @22.01.b.20220103.0600@. 

 I cannot reproduce this reliably, unfortunately. It has occurred at boot time on three separate occasions so far, and if I reset the states it stabilizes. After that, it happens randomly at unpredictable times on different interfaces. If I monitor the system closely, I can catch individual states doing it (as above) and kill just the specific problem states before they get out of control. 
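
 That cleanup is roughly the following (the state ID is the bad state from earlier and is illustrative only; the @-k id@ syntax is per pfctl(8)):

 <pre>
 # kill only the specific problem state by its ID (from pfctl -vvss)
 pfctl -k id -k d9cbd66100000000

 # or, as the bigger hammer, reset the entire state table
 pfctl -F states
 </pre>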

 Also worth noting: in this setup, that interface does not need outbound NAT and so does not perform it, since NAT happens at the next hop. If I disable the gateways on that WAN and let another WAN (which does NAT) take over, the problem does not occur on the other WAN, as far as I've seen. It's possible I didn't wait long enough (~2 hrs), but I'll keep experimenting. 
