Bug #10513


State issues with policy routing and HA failover

Added by Jim Pingle about 4 years ago. Updated over 2 years ago.

Rules / NAT


Seeing some odd behavior on HA pairs which have multiple WANs and use policy routing. In some cases, the states for a client disappear when failing over. In others, the state is present but the traffic may be egressing the wrong interface.

Consider this scenario:

WAN1 is the default gateway; some clients are policy routed out WAN2. In this example,

Start a TCP connection from a policy-routed client to an Internet host. States on both nodes and a packet capture on the primary show the traffic entering LAN and exiting WAN2 (OK).

Put the primary node into CARP maintenance mode. State is OK on primary. The state, which was there moments ago, is no longer in the state table on the secondary. Traffic from the client stops entirely.
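The symptom in this step (a state that existed on the secondary is gone once the secondary takes over) amounts to a set difference between the two nodes' state tables. A minimal Python sketch, assuming the state tables have already been collected and reduced to (protocol, source, destination) tuples by some other means; `StateKey`, `missing_on_peer`, and the example addresses are hypothetical and not part of pfSense or pfctl:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateKey:
    """Hypothetical identity of one firewall state (not a pfSense type)."""
    proto: str  # e.g. "tcp" or "icmp"
    src: str    # client side, "address:port"
    dst: str    # remote side, "address:port"


def missing_on_peer(primary: set[StateKey], secondary: set[StateKey]) -> set[StateKey]:
    """Return states the primary holds but the secondary does not.

    With working state synchronization this is expected to be empty after a
    failover; in the report, the policy-routed TCP state ends up here.
    """
    return primary - secondary


# Illustrative addresses only (private client, RFC 5737 documentation host).
flow = StateKey("tcp", "10.0.0.50:51000", "198.51.100.10:443")
print(missing_on_peer({flow}, set()))  # the flow is reported as missing
```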

Take the primary node out of CARP maintenance mode. States and packet capture on primary still show the traffic entering LAN, exiting WAN2 (OK).

Wait a bit and the state eventually re-syncs to the secondary node.
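When reproducing this, the "wait a bit" step can be made explicit by polling the secondary until the state reappears or a timeout expires. A sketch assuming a caller-supplied `fetch_states` function that returns the secondary's current states as a set (how they are collected is left open); `wait_for_resync` and its default timeout and interval are hypothetical, arbitrary choices:

```python
import time
from typing import Callable, Hashable, Set


def wait_for_resync(
    key: Hashable,
    fetch_states: Callable[[], Set[Hashable]],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll fetch_states() until key appears; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if key in fetch_states():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for the secondary: the state shows up on the third poll.
polls = iter([set(), set(), {"flow-1"}])
print(wait_for_resync("flow-1", lambda: next(polls), timeout=5, interval=0))  # True
```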

Now put the primary node into CARP maintenance mode again. States on the secondary still show the traffic entering LAN and exiting WAN2 (OK), but the packet capture shows the packets actually leaving WAN1 while carrying the WAN2 address.
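The second failure mode is a mismatch between what the state records and what the capture shows: the state says WAN2, the packets leave WAN1, yet they still carry WAN2's address. A small check of that condition, assuming the state's egress interface, the capture interface, and the observed source address have been gathered by hand; `EgressObservation`, `is_wrong_egress`, and the example addresses are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EgressObservation:
    state_if: str    # egress interface according to the state table
    capture_if: str  # interface where the packet capture saw the packet leave
    packet_src: str  # source address on the captured packet


def is_wrong_egress(obs: EgressObservation, if_addrs: Dict[str, str]) -> bool:
    """True if the packet left a different interface than the state records
    while still carrying the address of the interface the state records."""
    return obs.capture_if != obs.state_if and obs.packet_src == if_addrs.get(obs.state_if)


# Illustrative RFC 5737 addresses for the two WANs.
wan_addrs = {"WAN1": "203.0.113.2", "WAN2": "198.51.100.2"}
print(is_wrong_egress(EgressObservation("WAN2", "WAN1", "198.51.100.2"), wan_addrs))  # True
print(is_wrong_egress(EgressObservation("WAN2", "WAN2", "198.51.100.2"), wan_addrs))  # False
```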

Note that if this is tested with ICMP, the second step will be different: ICMP causes a new state to be created to replace the missing one. In that case, the problem appears to show up on the first failback instead of requiring a second round.

Tested on 2.5.0.a.20200430.0741 (12.1-STABLE), but we have a report from a customer who is seeing this happen on 2.4.5-RELEASE.

#1

Updated by Anonymous almost 4 years ago

  • Assignee set to Renato Botelho
#2

Updated by Anonymous over 3 years ago

  • Target version changed from 2.5.0 to CE-Next
#3

Updated by Jose Duarte over 2 years ago

Tested in 2.5.2. This seems to still be a big issue.
pfsync is basically useless on a multi-WAN setup: all states on WANs that are not the default gateway are killed on failover.

I'm happy to help with testing if you have any suggestions on how to fix it.

#4

Updated by Viktor Gurov over 2 years ago

#8100 - maybe related

#5

Updated by Christian Ullrich over 2 years ago

> Tested in 2.5.2. This seems to still be a big issue.

In 2.6.0, too. I'm not sure about the lost states, but the traffic going out the wrong WAN is definitely still there. See also, but that is two pages of what fits in one sentence above.

#6

Updated by Renato Botelho over 2 years ago

  • Assignee deleted (Renato Botelho)
