Bug #8555

closed

Selectively killing states on WAN failure

Added by Steven Brown over 6 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: Multi-WAN
Target version: -
Start date: 06/06/2018
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version: All
Affected Architecture: All

Description

On a WAN failure, the only current options are to kill all states or to kill none at all. In a scenario where a wireless link is installed as a backup, this means either that every connection is dropped whenever the wireless backup link goes offline, or that no states are killed and devices do not fail over to the backup link properly when the main link goes offline. With something like VoIP, this can result in dropped calls when the backup connection fails, or in phones going dead and not failing over when the main link fails.

Killing states was looked at in Bug #3181, which contains the comment "Wiping the entire state table is overkill, but will have to suffice for 2.1", but the code does not appear to have changed since then.

There is code in /etc/rc.kill_states that attempts to selectively kill states based on the states found on a failed interface. I have taken this code, modified it, and added it to /etc/inc/filter.inc to try to handle these situations, so that connections fail over to a backup gateway without the need to kill all active states on non-failed gateways.

I have attached two patches. One takes the code from /etc/rc.kill_states and only kills the connections based on IPs which match associated NAT states, along with all connections on the interface. The other expands this code and finds and kills all connections based on IPs which match any connection state on that interface, NAT or not, IPv4 and IPv6.
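The selection logic of the two patches can be sketched roughly as follows. This is a minimal illustration in Python, not the patch code itself (which is PHP in /etc/inc/filter.inc): the `State` record, its field names, and the `nat_only` switch are hypothetical, and the sample addresses are documentation ranges. On a live system each selected pair would presumably be handed to something like `pfctl -k <source> -k <destination>`; that mapping is an assumption here, not taken from the patches.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical, simplified view of one firewall state entry."""
    iface: str                     # interface the state is bound to
    src: str                       # source IP as seen on that interface
    dst: str                       # destination IP
    nat_src: Optional[str] = None  # pre-NAT source IP, if the state is translated


def pairs_on_failed_iface(states: List[State], failed_iface: str,
                          nat_only: bool) -> Set[Tuple[str, str]]:
    """Collect (source, destination) IP pairs seen on the failed interface."""
    pairs = set()
    for s in states:
        if s.iface != failed_iface:
            continue
        if s.nat_src is not None:
            # NAT state: match other interfaces on the original source address.
            pairs.add((s.nat_src, s.dst))
        elif not nat_only:
            # Non-NAT state: only considered by the "all connections" variant.
            pairs.add((s.src, s.dst))
    return pairs


def states_to_kill(states: List[State], failed_iface: str,
                   nat_only: bool = False) -> List[State]:
    """Everything on the failed interface, plus matching pairs elsewhere."""
    pairs = pairs_on_failed_iface(states, failed_iface, nat_only)
    return [s for s in states
            if s.iface == failed_iface or (s.src, s.dst) in pairs]


# Illustrative state table: wan1 fails, wan2 stays up.
EXAMPLE_STATES = [
    State("wan1", "203.0.113.5", "198.51.100.7", nat_src="192.168.1.10"),
    State("lan", "192.168.1.10", "198.51.100.7"),
    State("lan", "192.168.1.11", "198.51.100.9"),
    State("wan2", "203.0.113.99", "198.51.100.9", nat_src="192.168.1.11"),
    State("wan1", "192.0.2.10", "198.51.100.30"),   # routed without NAT
    State("dmz", "192.0.2.10", "198.51.100.30"),
]

if __name__ == "__main__":
    for state in states_to_kill(EXAMPLE_STATES, "wan1", nat_only=True):
        print(state)
```

With `nat_only=True` the `dmz` state in the sample table is left alone, because it has no associated NAT state on `wan1`; with `nat_only=False` it is selected as well. States that only involve `wan2` are untouched in both cases.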

There is one limitation: if certain IP pairs have connections out of two different gateways (for example, if different connections from the same source to the same destination were routed out of two different gateways), the connections going through the non-failed gateway will be dropped as well. This still has less impact than killing all states in the state table.
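The collateral effect of matching on IP pairs alone can be shown with a small Python sketch. The `Conn` record and its `gateway` field are hypothetical and exist only for illustration; a real state entry does not necessarily carry the gateway in this form.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    """Hypothetical connection record, for illustration only."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    gateway: str  # gateway this connection was routed through


def killed_by_ip_pair(conns: List[Conn], failed_gateway: str) -> List[Conn]:
    """Kill every connection whose (src_ip, dst_ip) pair was seen on the failed gateway."""
    pairs = {(c.src_ip, c.dst_ip) for c in conns if c.gateway == failed_gateway}
    return [c for c in conns if (c.src_ip, c.dst_ip) in pairs]


def collateral(conns: List[Conn], failed_gateway: str) -> List[Conn]:
    """Killed connections that were not actually using the failed gateway."""
    return [c for c in killed_by_ip_pair(conns, failed_gateway)
            if c.gateway != failed_gateway]


CONNS = [
    Conn("192.168.1.10", 5060, "198.51.100.7", 5060, "wan1"),
    Conn("192.168.1.10", 40000, "198.51.100.7", 443, "wan2"),  # same pair, other gateway
    Conn("192.168.1.11", 5060, "198.51.100.7", 5060, "wan2"),  # different source
]

if __name__ == "__main__":
    print(collateral(CONNS, "wan1"))
```

Here the second connection shares its source and destination addresses with a connection on the failed gateway, so it is dropped even though it was routed through `wan2`. The third connection, from a different source, survives.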

Possible improvements to these patches:
  • Moving this code into its own function if the logic can be shared by these two areas.
  • Fix the code path so that routing fails over to a backup gateway before the states are killed. The code to kill states seems to be called multiple times (sometimes in different threads) on gateway failover. I've noted that after the first call to kill states, connection attempts made directly afterwards may still try to go out the failed gateway. Further calls to kill states happen subsequently and the connections will eventually fail over, but this seems to take longer than necessary.
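
One possible shape for the second improvement is sketched below in Python. It is purely illustrative and not how the existing code is organised: the `FailoverHandler` class and the `switch_route` and `kill_states` callbacks are hypothetical names. The idea is to serialise failover handling, switch routing first, kill states second, and ignore repeated notifications for a gateway that has already been handled.

```python
import threading
from typing import Callable, Set


class FailoverHandler:
    """Hypothetical coordinator: route switch first, state kill second, once per failure."""

    def __init__(self, switch_route: Callable[[str], None],
                 kill_states: Callable[[str], None]) -> None:
        self._switch_route = switch_route
        self._kill_states = kill_states
        self._handled: Set[str] = set()
        self._lock = threading.Lock()

    def gateway_down(self, gateway: str) -> bool:
        """Handle a failure; returns False if this gateway was already handled."""
        with self._lock:  # serialise callers arriving from different threads
            if gateway in self._handled:
                return False
            self._handled.add(gateway)
            self._switch_route(gateway)  # move traffic to the backup gateway first
            self._kill_states(gateway)   # only then remove the stale states
            return True

    def gateway_up(self, gateway: str) -> None:
        """Forget a gateway once it recovers, so a later failure is handled again."""
        with self._lock:
            self._handled.discard(gateway)


if __name__ == "__main__":
    log = []
    handler = FailoverHandler(lambda gw: log.append("route:" + gw),
                              lambda gw: log.append("kill:" + gw))
    handler.gateway_down("wan1")
    handler.gateway_down("wan1")  # duplicate notification, ignored
    print(log)
```

Holding the lock across both callbacks is a deliberate simplification: it guarantees that no second caller kills states before the route switch has completed, at the cost of blocking concurrent notifications while a failover is in progress.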

On a side note, I also discovered that the original code in /etc/rc.kill_states has a bug preventing it from working as expected - Bug #8554


Files

patch.nat_only (1.32 KB): Kills only NAT connections on other interfaces associated with the failed interface (Steven Brown, 06/06/2018 08:41 PM)
patch.all_connections (2.26 KB): Kills all connections on other interfaces associated with the failed interface (Steven Brown, 06/06/2018 08:41 PM)
FreeBSD-src.patch (12.2 KB) (Steven Brown, 07/15/2018 09:44 PM)
pfsense.patch (1.46 KB) (Steven Brown, 07/15/2018 09:44 PM)

Related issues

Related to Feature #12092: Utilize new ``pfctl`` abilities to kill states (Closed, Jim Pingle, 06/29/2021)
