Feature #855

closed

Ability to selectively kill states on gateway recovery

Added by Chris Buechler over 14 years ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Multi-WAN
Target version:
Start date: 08/27/2010
Due date:
% Done: 90%
Estimated time:
Plus Target Version: 24.03
Release Notes: Default

Description

The current practice of killing all states on a connection when that connection goes down is fine for the majority of scenarios, but some would like to see additional options. First, the ability to optionally kill states so that traffic fails back once the original connection recovers. I suspect there may be other desired scenarios as well, which can be added here as they're encountered.


Files

gateway_recover.png (128 KB) Henniee Walterson, 01/17/2024 05:00 PM

Related issues

Related to Feature #12807: Clear Active Secondary WAN Connections (Duplicate)

Related to Feature #12092: Utilize new ``pfctl`` abilities to kill states (Closed, Jim Pingle, 06/29/2021)

Related to Todo #15220: Handle ``route-to`` and ``reply-to`` states when using the ``if-bound`` state policy (Resolved, Kristof Provost)

Has duplicate Feature #14533: Kil UDP states on gateway recovery (Duplicate)

Actions #1

Updated by xavier Lemaire over 8 years ago

Chris Buechler wrote:

The current practice of killing all states on a connection when that connection goes down is fine for the majority of scenarios, but some would like to see additional options. First, the ability to optionally kill states so that traffic fails back once the original connection recovers. I suspect there may be other desired scenarios as well, which can be added here as they're encountered.

Hi Chris,

As I am crazy, I am testing this change in /etc/rc.gateway_alarm:
It's not very clean, but I hope it's going to do the job.

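# First argument: the gateway this alarm is for.
GW="$1"

if [ -z "$GW" ]; then
    exit 1
fi

# When $3 is 0 (treated here as the gateway having recovered),
# kill states for the addresses of the other monitored gateways.
if [ "$3" = 0 ]; then
    # Field 18 of each dpinger line in the ps output is taken as the address;
    # lines mentioning this gateway ($1) are skipped.
    for i in $(ps -aux | grep dpinger | grep -v grep | grep -v "$1" | awk '{print $18}'); do
        /sbin/pfctl -k "${i}"
    done
fi

# Reload the dependent services and the filter.
/usr/local/sbin/pfSctl \
    -c "service reload dyndns ${GW}" \
    -c "service reload ipsecdns" \
    -c "service reload openvpn ${GW}" \
    -c "filter reload" >/dev/null 2>&1

exit $?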

Actions #2

Updated by Julien REVERT over 8 years ago

Is it still planned to have "states killing" on gateway failback?

I have the issue that UDP connections of IP phones or OpenVPN clients remain on the backup WAN when the master WAN is back.

The main issue is at pfSense startup: if the master WAN comes up after the backup WAN, all IP phones and OpenVPN clients are registered on the backup WAN and keep this configuration until I manually flush the states.

How can I fix this issue until there is an option like "flush states on gateway back"?

Thanks.

Actions #3

Updated by James M over 8 years ago

Julien REVERT wrote:

Is it still planned to have "states killing" on gateway failback?

I have the issue that UDP connections of IP phones or OpenVPN clients remain on the backup WAN when the master WAN is back.

The main issue is at pfSense startup: if the master WAN comes up after the backup WAN, all IP phones and OpenVPN clients are registered on the backup WAN and keep this configuration until I manually flush the states.

How can I fix this issue until there is an option like "flush states on gateway back"?

Thanks.

I agree with Julien, something like this is needed for state failback after a connection is down.

Actions #4

Updated by → luckman212 over 8 years ago

This would be especially useful for VOIP, where there are often frequent registrations or other SIP traffic that keeps the states locked to the failover WAN even after the primary has come back online. This results in excess usage charges and also poor quality calls where e.g. the failover line is a 4G metered connection. So I would love to see this as well.

I just noticed that this feature request is 6 years old. :/

Actions #5

Updated by Travis McMurry over 7 years ago

As echoed by others, I'm seeing the same thing for VOIP and other devices which auto-negotiate VPN tunnels that maintain constant connectivity - Femtocells/Microcells, Meraki branded equipment...

It's also a cost concern, as the failover options I use tend to be OOB/4G/LTE. If devices stay connected to a metered connection after a failover, that incurs extra cost for unnecessarily consumed bandwidth.

As of 8/3/2017, it's now a 7 year old feature request. Nudge. :-)

Actions #6

Updated by Jim Pingle about 6 years ago

  • Target version changed from Future to 48

See also: #7605

Actions #7

Updated by Andrew Bucklin almost 6 years ago

+1 I'm surprised this isn't already a feature. I noticed this today when our primary connection came back online, but our off-site data backups (which traverse an OpenVPN client connection) were still going over the secondary WAN link, which is 500x slower than the primary WAN. Thank you!

Actions #8

Updated by Jim Pingle almost 6 years ago

  • Target version changed from 48 to 2.5.0

Actions #9

Updated by Marc H over 4 years ago

+1 - this is a badly needed feature with multi WAN where one connection is truly a "backup only" connection. High cost metered LTE, etc... We need an option to force states to fail back to the primary WAN when it is available. Thanks.

Actions #10

Updated by Raffi T over 4 years ago

+1 I haven't really been hurt by this until recently while performing a big backup job to the cloud. Failover occurred briefly but there was still a significant amount of data usage on the metered 4G backup connection well after the event. I had to disable the gateway monitoring action while performing this backup. It says this was requested 10 years ago? Ouch, not enough people requesting it?

Actions #11

Updated by Anonymous about 4 years ago

  • Assignee set to Renato Botelho

Actions #12

Updated by Anonymous about 4 years ago

  • Target version changed from 2.5.0 to CE-Next

Actions #13

Updated by aptalca aptalca almost 4 years ago

I just hit this issue with a failover LTE connection (metered).

I have almost everything go out over a WireGuard tunnel on 2.5.0.

After a main WAN connection loss, everything successfully switched over to the failover LTE gateway. However, after the main WAN came back online and once again became the default gateway, the WireGuard tunnel kept going over the backup LTE gateway indefinitely (until I manually intervened).

Hopefully 10 years is a charm? :-)

Thanks

Actions #14

Updated by Viktor Gurov almost 3 years ago

  • Related to Feature #12807: Clear Active Secondary WAN Connections added

Actions #15

Updated by Jim Pingle almost 3 years ago

  • Related to Feature #12092: Utilize new ``pfctl`` abilities to kill states added

Actions #16

Updated by Jim Pingle almost 3 years ago

  • Subject changed from More flexible options for state killing based on WAN status to Ability to selectively kill states on lower tier gateways when higher tier recovers
  • Assignee deleted (Renato Botelho)
  • Plus Target Version set to Plus-Next

Updating subject. Many scenarios are now possible with #12092 and also some more will be covered by #12931 so this can be reduced in scope to the single scenario of killing states for lower tier gateways.

Thanks to recent changes in pfctl this is closer to reality. There is now an ability to kill by the gateway information in a state (pfctl -k gateway -k x.x.x.x). This can be leveraged here to make it much easier to clear these states.
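
As a rough sketch of how the `pfctl -k gateway -k x.x.x.x` form quoted above might be applied to several lower tier gateways, the helper below issues one kill per gateway address. The function name `kill_states_via_gateways` and the `PFCTL` variable are hypothetical and not part of pfSense; `PFCTL` defaults to a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Hypothetical helper, not part of pfSense: kill states that carry one of the
# given gateway addresses, using one pfctl call per address.
# PFCTL defaults to a dry run (the command is printed, not executed);
# set PFCTL=/sbin/pfctl on a real firewall to actually kill states.
PFCTL="${PFCTL:-echo /sbin/pfctl}"

kill_states_via_gateways() {
    for gw in "$@"; do
        # Same form as quoted in the note: -k gateway -k <address>
        ${PFCTL} -k gateway -k "${gw}"
    done
}
```

Calling `kill_states_via_gateways 203.0.113.1 198.51.100.1` in dry-run mode prints the two `pfctl` commands; the addresses are placeholders from the documentation ranges.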

Now there are only a few items to take care of:

  • Needs a new option to selectively activate this feature somewhere, either globally or on a gateway group.
  • This only makes logical sense to activate on a single gateway group. There isn't a way to only kill states from a gateway address while also restricting that to states created by a specific gateway group.
    • If the option is global, it should include a way to select the failover group for which it applies.
    • If the option is on a gateway group, it should only be possible to activate it on a single group, similar to changing the default gateway. Either deactivate the other or throw an input error saying it can't be enabled on more than one per address family.
  • Must note that it only works for states matching policy routing rules, as it will not work for traffic matching rules that rely on default gateway switching directly. Those states have 0.0.0.0/:: in their data.
  • The actual action will have to be careful to ONLY activate when a gateway recovers from down to up state, not on every filter reload or page load that notices a gateway is up. The gateway alarm script may be the optimal place to handle this, but needs testing to ensure it doesn't activate too often.
  • Similar to #12092 it might be nice to have a new per-gateway option to override this behavior
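
The requirement in the list above that the action fire only when a gateway goes from down to up, and not on every check that finds it up, can be sketched as a small state tracker. This is an illustration under assumptions, not the pfSense implementation: the function name `gateway_recovered`, the `STATE_DIR` location, and the "up"/"down" status strings are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: remember the last seen status of each gateway in a
# small file and report success only on a down -> up transition.
STATE_DIR="${STATE_DIR:-/tmp/gw_status}"

# gateway_recovered NAME STATUS   (STATUS is "up" or "down")
# Returns 0 only if NAME was last recorded as "down" and is now "up".
# NAME is used as a file name, so it must not contain slashes.
gateway_recovered() {
    name="$1"
    now="$2"
    mkdir -p "${STATE_DIR}" || return 2
    file="${STATE_DIR}/${name}.status"
    prev="unknown"
    if [ -f "${file}" ]; then
        prev="$(cat "${file}")"
    fi
    # Record the new status before deciding, so repeated "up" reports
    # after a recovery do not trigger again.
    printf '%s\n' "${now}" > "${file}"
    [ "${prev}" = "down" ] && [ "${now}" = "up" ]
}
```

A caller would run the state-killing step only when `gateway_recovered "$name" up` succeeds. A first sighting of a gateway as up (no recorded status) is deliberately not treated as a recovery.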
Actions #17

Updated by Jim Pingle almost 3 years ago

  • Subject changed from Ability to selectively kill states on lower tier gateways when higher tier recovers to Ability to selectively kill states on gateways recovery

Actions #18

Updated by MICHAEL MAST over 2 years ago

Wanted to add more support for this feature. I have 11 Netgate appliances deployed and enterprise support on a few, with more on the way with business expansion, and we have a problem with Mitel phones and one-way audio during gateway recovery.

On the old Ubiquiti EdgeRouters we used the option to clear the state tables on any event, failover and recovery included. Cisco routers have never given us a problem.

Thank you for working on this feature, I look forward to seeing it deployed.

Actions #19

Updated by Chris B almost 2 years ago

6 month addition!

As above, using LTE/5G for backup means the data is very expensive. A brief failover will mean things like Netflix and my work PC's VPN will fail over and never fail back, potentially being very costly.

Actions #20

Updated by Alex Viper_Rus over 1 year ago

A very necessary feature for those who use the second WAN exclusively as a backup channel, and especially if it has very expensive traffic.
It is very disappointing that this function has a low priority and has not yet been implemented.

Actions #21

Updated by Jim Pingle over 1 year ago

  • Has duplicate Feature #14533: Kil UDP states on gateway recovery added

Actions #22

Updated by Craig Sharp 12 months ago

This is a very frustrating issue. I do not understand where the difficulty lies, since on a failure the states are killed and the secondary WAN picks up. Why is it so difficult to do the same on failback? We should not have to run a command line job to kill the states.

Commercial firewalls do this all the time. 13 years to fix an issue is not promising. Please advise on the status of the fix.

Thanks.

Actions #23

Updated by Henniee Walterson 11 months ago

+1!
Same problem with multi-path routing and multi-WAN!
Seems easy to solve with pfctl.
Please do it. Soon.

Actions #24

Updated by Marcos M 11 months ago

  • Status changed from New to Assigned
  • Assignee set to Marcos M

Actions #25

Updated by Henniee Walterson 11 months ago

would be a charm like this...

love my paint :-)

Actions #26

Updated by Henniee Walterson 11 months ago

It might be useful to implement recovery state killing in the gateway section too.
(Next to "State Killing on Gateway Failure", add "State Killing on Gateway Recovery" in System / Routing / Gateways / Edit.)

Actions #27

Updated by Alex Viper_Rus 11 months ago

It would be useful if VPN connections were also reconnected via the restored gateway.

Actions #28

Updated by Marcos M 11 months ago

  • Status changed from Assigned to In Progress
  • % Done changed from 0 to 50

Actions #30

Updated by Marcos M 11 months ago

  • Subject changed from Ability to selectively kill states on gateways recovery to Ability to selectively kill states on gateway recovery

Actions #31

Updated by Marcos M 11 months ago

  • Target version changed from CE-Next to 2.8.0
  • Plus Target Version changed from Plus-Next to 24.03

Actions #32

Updated by Alex Viper_Rus 11 months ago

Henniee Walterson wrote in #note-26:

It might be useful to implement recovery state killing in the gateway section too.
(Next to "State Killing on Gateway Failure", add "State Killing on Gateway Recovery" in System / Routing / Gateways / Edit.)

As far as I understand, this should be implemented not in the settings of the gateways themselves, but in gateway groups. After all, pfSense must know which gateways' states to kill.

Actions #33

Updated by Marcos M 11 months ago

  • Status changed from Pull Request Review to Feedback
  • % Done changed from 50 to 100

Actions #34

Updated by Marcos M 11 months ago

  • Status changed from Feedback to Needs Patch
  • % Done changed from 100 to 90

Actions #35

Updated by Marcos M 11 months ago

  • Related to Todo #15220: Handle ``route-to`` and ``reply-to`` states when using the ``if-bound`` state policy added

Actions #36

Updated by Marcos M 11 months ago

  • Status changed from Needs Patch to Resolved

This has been working well in 24.03 snapshots. Documentation is available at:
https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#state-killing-on-gateway-recovery

Actions #37

Updated by Asher Oto 5 months ago

Henniee Walterson wrote in #note-26:

It might be useful to implement recovery state killing in the gateway section too.
(Next to "State Killing on Gateway Failure", add "State Killing on Gateway Recovery" in System / Routing / Gateways / Edit.)

This is an excellent idea and pfSense definitely needs it.

If you're using a cellular connection for failover and pfSense fails to clear the states upon recovery, you may incur significant charges for unnecessary traffic that should have switched back to the main gateway, or your account could be throttled.

Someone on Reddit posted about this very thing...
https://www.reddit.com/r/PFSENSE/comments/i3w636/unreliable_wan_failover_switching_notifications/
