Feature #855: Ability to selectively kill states on gateway recovery - pfSense - pfSense bugtracker

Feature #855

closed

Ability to selectively kill states on gateway recovery

Added by Chris Buechler over 15 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: Marcos M
Category: Multi-WAN
Target version: 2.8.0
Start date: 08/27/2010
Due date:
% Done: 90%
Estimated time:
Plus Target Version: 24.03
Release Notes: Default

Description

The current practice of killing all states on a connection when that connection goes down is fine for the majority of scenarios, but some would like to see additional options. The first is the ability to optionally kill states so that traffic fails back once the original connection recovers. I suspect there may be other desired scenarios as well, which can be added here as they're encountered.

Files

gateway_recover.png (128 KB)

Henniee Walterson, 01/17/2024 05:00 PM

Updated by xavier Lemaire almost 10 years ago

Chris Buechler wrote:

The current practice of killing all states on a connection when that connection goes down is fine for the majority of scenarios, but some would like to see additional options. The first is the ability to optionally kill states so that traffic fails back once the original connection recovers. I suspect there may be other desired scenarios as well, which can be added here as they're encountered.

Hi Chris,

As I am crazy, I am testing this change in /etc/rc.gateway_alarm:
It's not very clean, but I hope it's going to do the job.

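GW="$1"

# Bail out if no gateway name was passed.
if [ -z "$GW" ]; then
    exit 1
fi

# $3 is taken to be the alarm flag; 0 is assumed to mean the gateway has recovered.
if [ "$3" = 0 ]; then
    # For every dpinger process that is not monitoring this gateway, kill the
    # states for the address found in column 18 of the ps output. The column
    # position depends on how dpinger was started, so check it before relying on it.
    for i in $(ps -aux | grep dpinger | grep -v grep | grep -v "$GW" | awk '{print $18}'); do
        /sbin/pfctl -k "${i}"
    done
fi

# Reload the services that depend on the gateway, then reload the filter.
/usr/local/sbin/pfSctl \
    -c "service reload dyndns ${GW}" \
    -c "service reload ipsecdns" \
    -c "service reload openvpn ${GW}" \
    -c "filter reload" >/dev/null 2>&1

exit $?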

Updated by Julien REVERT over 9 years ago

Is it still planned to have "states killing" on gateway failback?

I have the issue that UDP connections of IP phones or OpenVPN clients remain on the backup WAN when the master WAN is back.

The main issue is at pfSense startup: if the master WAN comes up after the backup WAN, all IP phones and OpenVPN clients are registered on the backup WAN and keep this configuration until I manually flush the states.

How can this issue be worked around until there is an option like "flush states on gateway back"?

Thanks.

Updated by James M over 9 years ago

Julien REVERT wrote:

Is it still planned to have "states killing" on gateway failback?

I have the issue that UDP connections of IP phones or OpenVPN clients remain on the backup WAN when the master WAN is back.

The main issue is at pfSense startup: if the master WAN comes up after the backup WAN, all IP phones and OpenVPN clients are registered on the backup WAN and keep this configuration until I manually flush the states.

How can this issue be worked around until there is an option like "flush states on gateway back"?

Thanks.

I agree with Julien; something like this is needed for state failback after a connection has been down.

Updated by → luckman212 over 9 years ago

This would be especially useful for VOIP, where there are often frequent registrations or other SIP traffic that keeps the states locked to the failover WAN even after the primary has come back online. This results in excess usage charges and also poor quality calls where e.g. the failover line is a 4G metered connection. So I would love to see this as well.

I just noticed that this feature request is 6 years old. :/

Updated by Travis McMurry over 8 years ago

As echoed by others, I'm seeing the same thing for VOIP and other devices which auto negotiate VPN tunnels which maintain constant connectivity - Femtocells/Microcells, Meraki branded equipment...

It's also a cost concern, as the failover options I use tend to be OOB/4G/LTE. If devices in a failover situation stay connected to a metered connection, that incurs extra cost for unnecessarily consumed bandwidth.

As of 8/3/2017, it's now a 7 year old feature request. Nudge. :-)

Updated by Jim Pingle over 7 years ago

Target version changed from Future to 48

Updated by Andrew Bucklin about 7 years ago

+1 I'm surprised this isn't already a feature. I noticed this today when our primary connection came back online, but our off-site data backups (which traverse an OpenVPN client connection) were still going over the secondary WAN link, which is 500x slower than the primary WAN. Thank you!

Updated by Jim Pingle almost 7 years ago

Target version changed from 48 to 2.5.0

Updated by Marc H over 5 years ago

+1 - this is a badly needed feature with multi WAN where one connection is truly a "backup only" connection. High cost metered LTE, etc... We need an option to force states to fail back to the primary WAN when it is available. Thanks.

#10

Updated by Raffi T over 5 years ago

+1 I haven't really been hurt by this until recently while performing a big backup job to the cloud. Failover occurred briefly but there was still a significant amount of data usage on the metered 4G backup connection well after the event. I had to disable the gateway monitoring action while performing this backup. It says this was requested 10 years ago? Ouch, not enough people requesting it?

#11

Updated by Anonymous over 5 years ago

Assignee set to Renato Botelho

#12

Updated by Anonymous over 5 years ago

Target version changed from 2.5.0 to CE-Next

#13

Updated by aptalca aptalca almost 5 years ago

I just hit this issue with a failover LTE connection (metered).

I have almost everything going out over a WireGuard tunnel on 2.5.0.

After a main WAN connection loss, everything successfully switched over to the failover LTE gateway. However, after the main WAN came back online and once again became the default gateway, the WireGuard tunnel kept going over the backup LTE gateway indefinitely (until I manually intervened).

Hopefully 10 years is a charm? :-)

Thanks

#14

Updated by Viktor Gurov about 4 years ago

Related to Feature #12807: Clear Active Secondary WAN Connections added

#15

Updated by Jim Pingle almost 4 years ago

Related to Feature #12092: Utilize new ``pfctl`` abilities to kill states added

#16

Updated by Jim Pingle almost 4 years ago

Subject changed from More flexible options for state killing based on WAN status to Ability to selectively kill states on lower tier gateways when higher tier recovers
Assignee deleted (Renato Botelho)
Plus Target Version set to Plus-Next

Updating subject. Many scenarios are now possible with #12092 and also some more will be covered by #12931 so this can be reduced in scope to the single scenario of killing states for lower tier gateways.

Thanks to recent changes in pfctl this is closer to reality. There is now an ability to kill by the gateway information in a state (pfctl -k gateway -k x.x.x.x). This can be leveraged here to make it much easier to clear these states.
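
As a rough sketch of how that could be used from a shell, assuming the addresses of the lower tier gateways are already known: the function name, the PFCTL and DRY_RUN variables, and the example addresses below are made up for illustration, and only the "-k gateway -k" form itself comes from the note above.

```shell
#!/bin/sh
# Hypothetical helper: kill states whose recorded gateway matches each
# address passed as an argument, using "pfctl -k gateway -k <addr>".
PFCTL="${PFCTL:-/sbin/pfctl}"

kill_states_via_gateway() {
    for gw_addr in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the command that would be executed.
            echo "${PFCTL} -k gateway -k ${gw_addr}"
        else
            "${PFCTL}" -k gateway -k "${gw_addr}"
        fi
    done
}

# Example with documentation addresses (not real gateways):
#   DRY_RUN=1 kill_states_via_gateway 198.51.100.1 203.0.113.1
```

Running it with DRY_RUN set first is a cheap way to check which commands would be issued before touching the state table.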

Now there are only a few items to take care of:

- Needs a new option to selectively activate this feature somewhere, either globally or on a gateway group.
  This only makes logical sense to activate on a single gateway group. There isn't a way to only kill states from a gateway address while also restricting that to states created by a specific gateway group.
  - If the option is global, it should include a way to select the failover group for which it applies.
  - If the option is on a gateway group, it should only be possible to activate it on a single group, similar to changing the default gateway. Either deactivate the other or throw an input error saying it can't be enabled on more than one per address family.
- Must note that it only works for states matching policy routing rules, as it will not work for traffic matching rules that rely on default gateway switching directly. Those states have 0.0.0.0/:: in their data.
- The actual action will have to be careful to ONLY activate when a gateway recovers from down to up state, not on every filter reload or page load that notices a gateway is up. The gateway alarm script may be the optimal place to handle this, but needs testing to ensure it doesn't activate too often.
- Similar to #12092, it might be nice to have a new per-gateway option to override this behavior.
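
For the requirement of acting only on a down to up transition, one possible approach is to remember the last status seen for each gateway and act only when it changes from down to up. This is purely a sketch: the state directory, the "up"/"down" strings, and the function name are assumptions for illustration, not how pfSense tracks gateway status.

```shell
#!/bin/sh
# Hypothetical sketch: detect a down -> up transition for a gateway.
# STATE_DIR and the status strings are assumptions made for this example.
STATE_DIR="${STATE_DIR:-/tmp/gw_state_example}"

# Usage: gateway_recovered <gateway name> <up|down>
# Stores the new status and returns 0 only if the previous status was
# "down" and the new status is "up".
gateway_recovered() {
    gw_name="$1"
    new_status="$2"
    state_file="${STATE_DIR}/${gw_name}.status"

    mkdir -p "${STATE_DIR}"

    # With no record yet, the previous status is unknown, so a first "up"
    # is not treated as a recovery.
    old_status="unknown"
    if [ -f "${state_file}" ]; then
        old_status="$(cat "${state_file}")"
    fi

    echo "${new_status}" > "${state_file}"

    [ "${old_status}" = "down" ] && [ "${new_status}" = "up" ]
}
```

A caller such as the gateway alarm script could then run the state kill only when `gateway_recovered "$GW" up` succeeds, so repeated "up" notifications do not trigger it again.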

#17

Updated by Jim Pingle almost 4 years ago

Subject changed from Ability to selectively kill states on lower tier gateways when higher tier recovers to Ability to selectively kill states on gateways recovery

#18

Updated by MICHAEL MAST over 3 years ago

I wanted to add more support for this feature. I have 11 Netgate appliances deployed and enterprise support on a few, with more on the way with business expansion, and we have a problem with Mitel phones and one-way audio during gateway recovery.

On the old Ubiquiti EdgeRouters we used the option to clear the state tables on any event, including failover and recovery. Cisco routers have never given us a problem.

Thank you for working on this feature, I look forward to seeing it deployed.

#19

Updated by Chris B almost 3 years ago

6 month addition!

As above, using LTE/5G for backup means the data is very expensive. A brief failover will mean things like Netflix and my work PC's VPN will fail over and never fail back, potentially being very costly.

#20

Updated by Alex Viper_Rus almost 3 years ago

A very necessary feature for those who use the second WAN exclusively as a backup channel, and especially if it has very expensive traffic.
It is very disappointing that this function has a low priority and has not yet been implemented.

#21

Updated by Jim Pingle over 2 years ago

Has duplicate Feature #14533: Kil UDP states on gateway recovery added

#22

Updated by Craig Sharp about 2 years ago

This is a very frustrating issue. I do not understand where the difficulty is, since on a failure the states are killed and the secondary WAN picks up. Why is it so difficult to do the same on failback? We should not have to run a command line job to kill the states.

Commercial firewalls do this all the time. 13 years to fix an issue is not promising. Please advise on the status of the fix.

Thanks.

#23

Updated by Henniee Walterson about 2 years ago

+1!
Same problem with multi-path routing and multi-WAN!
Seems to be easy to solve with pfctl.
Please do it. Soon.

#24

Updated by Marcos M about 2 years ago

Status changed from New to Assigned
Assignee set to Marcos M

#25

Updated by Henniee Walterson about 2 years ago

File gateway_recover.png added

would be a charm like this...

love my paint :-)

#26

Updated by Henniee Walterson about 2 years ago

It might be useful to implement the recovery state killing in the gateway section too.
(Alongside "State Killing on Gateway Failure", a "State Killing on Gateway Recovery" option in System / Routing / Gateways / Edit.)

#27

Updated by Alex Viper_Rus about 2 years ago

It would be useful if VPN connections were also reconnected via the restored gateway.

#28

Updated by Marcos M about 2 years ago

Status changed from Assigned to In Progress
% Done changed from 0 to 50

#29

Updated by Marcos M about 2 years ago

Status changed from In Progress to Pull Request Review

https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1124

https://redmine.pfsense.org/issues/15208

#30

Updated by Marcos M about 2 years ago

Subject changed from Ability to selectively kill states on gateways recovery to Ability to selectively kill states on gateway recovery

#31

Updated by Marcos M about 2 years ago

Target version changed from CE-Next to 2.8.0
Plus Target Version changed from Plus-Next to 24.03

#32

Updated by Alex Viper_Rus about 2 years ago

Henniee Walterson wrote in #note-26:

It might be useful to implement the recovery state killing in the gateway section too.
(Alongside "State Killing on Gateway Failure", a "State Killing on Gateway Recovery" option in System / Routing / Gateways / Edit.)

As far as I understand, this should be implemented not in the settings of the gateways themselves, but in the gateway groups. After all, pfSense must know for which gateways to kill states.

#33

Updated by Marcos M about 2 years ago

Status changed from Pull Request Review to Feedback
% Done changed from 50 to 100

Applied in changeset 30d46b63834444e9a7a4af310a5d8aaf94baf01a.

#34

Updated by Marcos M about 2 years ago

Status changed from Feedback to Needs Patch
% Done changed from 100 to 90

#35

Updated by Marcos M about 2 years ago

Related to Todo #15220: Handle ``route-to`` and ``reply-to`` states when using the ``if-bound`` state policy added

#36

Updated by Marcos M about 2 years ago

Status changed from Needs Patch to Resolved

This has been working well in 24.03 snapshots. Documentation is available at:
https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#state-killing-on-gateway-recovery

#37

Updated by Asher Oto over 1 year ago

Henniee Walterson wrote in #note-26:

It might be useful to implement the recovery state killing in the gateway section too.
(Alongside "State Killing on Gateway Failure", a "State Killing on Gateway Recovery" option in System / Routing / Gateways / Edit.)

This is an excellent idea and pfSense definitely needs it.

If you're using a cellular connection for failover and pfSense fails to clear the states upon recovery, you may incur significant charges for unnecessary traffic that should have switched back to the main gateway, or your account could be throttled.

Someone on Reddit posted about this very thing...
https://www.reddit.com/r/PFSENSE/comments/i3w636/unreliable_wan_failover_switching_notifications/
