State issues with policy routing and HA failover
We are seeing some odd behavior on HA pairs that have multiple WANs and use policy routing. In some cases, the states for a client disappear on failover. In others, the state is present but the traffic egresses the wrong interface.
Consider this scenario:
WAN1 is the default gateway and some clients are policy routed out WAN2. In this example, the policy-routed client is 10.11.0.12.
Start a TCP connection from 10.11.0.12 to an Internet host. The states on both nodes and a packet capture on the primary show the traffic entering LAN and exiting WAN2 (OK).
Put the primary node into CARP maintenance mode. State is OK on primary. The state, which was there moments ago, is no longer in the state table on the secondary. Traffic from the client stops entirely.
Take the primary node out of CARP maintenance mode. States and packet capture on primary still show the traffic entering LAN, exiting WAN2 (OK).
Wait a bit and the state eventually re-syncs to the secondary node.
Now put the primary node into CARP maintenance mode again. States on the secondary still show the traffic entering LAN, exiting WAN2 (OK), but a packet capture shows the packets actually leaving WAN1 while carrying the address of WAN2.
Note that if this is tested with ICMP, the second step will be different, because ICMP results in a new state being created to replace the missing state. That case appears to show the problem on the first fail back instead of taking a second turn.
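For anyone trying to confirm the wrong-interface symptom from captures, a rough sketch of the check follows. It flags any packet that leaves one WAN while carrying an address belonging to another WAN. The interface names, the WAN subnets (documentation ranges), and the `CapturedPacket` record are all hypothetical, and it assumes the WAN2 address appears as the packet source after outbound NAT, which the report does not state explicitly.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical WAN addressing for illustration only; the report does not
# give interface names or WAN subnets.
WAN_NETWORKS = {
    "wan1": ip_network("198.51.100.0/24"),
    "wan2": ip_network("203.0.113.0/24"),
}


@dataclass(frozen=True)
class CapturedPacket:
    egress_if: str  # interface the capture was taken on
    src: str        # source address seen on the wire (assumed post-NAT)


def owning_wan(src):
    """Return the name of the WAN whose subnet contains src, or None."""
    addr = ip_address(src)
    for name, net in WAN_NETWORKS.items():
        if addr in net:
            return name
    return None


def misrouted(packets):
    """Return packets whose source belongs to a different WAN than the one they left on."""
    return [p for p in packets if owning_wan(p.src) not in (None, p.egress_if)]


if __name__ == "__main__":
    observed = [
        CapturedPacket("wan2", "203.0.113.10"),  # expected: WAN2 address leaving WAN2
        CapturedPacket("wan1", "203.0.113.10"),  # symptom: WAN2 address leaving WAN1
    ]
    for pkt in misrouted(observed):
        print(f"{pkt.src} left via {pkt.egress_if}, expected {owning_wan(pkt.src)}")
```

Feeding it records taken from captures on each WAN after the second failover should flag only the packets that match the behavior described above.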
Tested on 2.5.0.a.20200430.0741 (12.1-STABLE), but we also have a report from a customer who is seeing this happen on 2.4.5-RELEASE.