Feature #855
closed
Ability to selectively kill states on gateway recovery
90%
Description
The current practice of killing all states when a connection goes down on that downed connection is fine for the majority of scenarios, but some would like to see additional options. First, the ability to optionally kill states to fail back once the original connection recovers. I suspect there may be other desired scenarios as well, which can be added here as they're encountered.
Updated by xavier Lemaire over 8 years ago
Chris Buechler wrote:
The current practice of killing all states when a connection goes down on that downed connection is fine for the majority of scenarios, but some would like to see additional options. First, the ability to optionally kill states to fail back once the original connection recovers. I suspect there may be other desired scenarios as well, which can be added here as they're encountered.
Hi Chris,
As I am crazy, I am testing this change in /etc/rc.gateway_alarm:
It's not very clean, but I hope it's going to do the job.
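# $1 = gateway name, $3 = dpinger alarm flag (0 = alarm cleared, i.e. the gateway is back up)
GW="$1"
if [ -z "$GW" ]; then
	exit 1
fi
# On recovery, kill the states tied to the other gateways' dpinger instances.
# Field 18 of the ps output is assumed to be the address to kill states for;
# this depends on the ps column layout and on dpinger's argument order.
if [ "$3" = "0" ]; then
	for i in $(ps -aux | grep dpinger | grep -v grep | grep -v "$GW" | awk '{print $18}'); do
		/sbin/pfctl -k "${i}"
	done
fi
# Reload the dependent services and the filter rules.
/usr/local/sbin/pfSctl \
	-c "service reload dyndns ${GW}" \
	-c "service reload ipsecdns" \
	-c "service reload openvpn ${GW}" \
	-c "filter reload" >/dev/null 2>&1
exit $?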
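The `$18` lookup in the script above breaks as soon as the ps column layout or the dpinger argument order changes. A slightly less fragile sketch is to pick the value that follows a named flag. This assumes the address of interest is the one passed to dpinger with `-B` (its bind address option); `extract_bind_addr` is a hypothetical helper name, not something shipped with pfSense.

```shell
# Hypothetical helper: print the argument that follows "-B" on each input line.
# Assumption: the wanted address is dpinger's bind address, passed as "-B <addr>".
extract_bind_addr() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "-B") { print $(i + 1); break }
        }
    }'
}

# Possible use inside the loop above (the "[d]pinger" pattern keeps grep from matching itself):
#   for i in $(ps -axww -o command | grep '[d]pinger' | grep -v "$GW" | extract_bind_addr); do ...
```

Lines without a `-B` argument produce no output, so a dpinger instance started without a bind address would simply be skipped.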
Updated by Julien REVERT over 8 years ago
Is it still planned to have "states killing" on gateway failback?
I have the issue that UDP connections of IP phones or OpenVPN clients remain on the backup WAN when the master WAN is back.
The main issue is at pfSense startup: if the master WAN comes up after the backup WAN, all IP phones and OpenVPN clients are registered on the backup WAN and keep this config until I manually flush the states.
How can I work around this issue until there is an option like "flush states on gateway back"?
Thanks.
Updated by James M over 8 years ago
Julien REVERT wrote:
Is it still planned to have "states killing" on gateway failback?
I have the issue that UDP connections of IP phones or OpenVPN clients remain on the backup WAN when the master WAN is back.
The main issue is at pfSense startup: if the master WAN comes up after the backup WAN, all IP phones and OpenVPN clients are registered on the backup WAN and keep this config until I manually flush the states.
How can I work around this issue until there is an option like "flush states on gateway back"?
Thanks.
I agree with Julien, something like this is needed for state failback after a connection is down.
Updated by → luckman212 over 8 years ago
This would be especially useful for VOIP, where there are often frequent registrations or other SIP traffic that keeps the states locked to the failover WAN even after the primary has come back online. This results in excess usage charges and also poor quality calls where e.g. the failover line is a 4G metered connection. So I would love to see this as well.
I just noticed that this feature request is 6 years old. :/
Updated by Travis McMurry over 7 years ago
As echoed by others, I'm seeing the same thing for VOIP and other devices which auto negotiate VPN tunnels which maintain constant connectivity - Femtocells/Microcells, Meraki branded equipment...
It's also a cost concern, as the failover options I use tend to be OOB/4G/LTE. If devices in a failover situation stay connected to a metered connection, that incurs extra cost for unnecessarily consumed bandwidth.
As of 8/3/2017, it's now a 7 year old feature request. Nudge. :-)
Updated by Andrew Bucklin almost 6 years ago
+1 I'm surprised this isn't already a feature. I noticed this today when our primary connection came back online, but our off-site data backups (which traverse an OpenVPN client connection) were still going over the secondary WAN link, which is 500x slower than the primary WAN. Thank you!
Updated by Marc H over 4 years ago
+1 - this is a badly needed feature with multi WAN where one connection is truly a "backup only" connection. High cost metered LTE, etc... We need an option to force states to fail back to the primary WAN when it is available. Thanks.
Updated by Raffi T over 4 years ago
+1 I haven't really been hurt by this until recently while performing a big backup job to the cloud. Failover occurred briefly but there was still a significant amount of data usage on the metered 4G backup connection well after the event. I had to disable the gateway monitoring action while performing this backup. It says this was requested 10 years ago? Ouch, not enough people requesting it?
Updated by Anonymous about 4 years ago
- Target version changed from 2.5.0 to CE-Next
Updated by aptalca aptalca over 3 years ago
I just hit this issue with a failover LTE connection (metered).
I have almost everything going out over a WireGuard tunnel on 2.5.0.
After a main WAN connection loss, everything successfully switched over to the failover LTE gateway. However, after the main WAN came back online and once again became the default gateway, the wireguard tunnel remained going over the backup LTE gateway indefinitely (until I manually intervened).
Hopefully 10 years is a charm? :-)
Thanks
Updated by Viktor Gurov almost 3 years ago
- Related to Feature #12807: Clear Active Secondary WAN Connections added
Updated by Jim Pingle over 2 years ago
- Related to Feature #12092: Utilize new ``pfctl`` abilities to kill states added
Updated by Jim Pingle over 2 years ago
- Subject changed from More flexible options for state killing based on WAN status to Ability to selectively kill states on lower tier gateways when higher tier recovers
- Assignee deleted (Renato Botelho)
- Plus Target Version set to Plus-Next
Updating subject. Many scenarios are now possible with #12092 and also some more will be covered by #12931 so this can be reduced in scope to the single scenario of killing states for lower tier gateways.
Thanks to recent changes in pfctl this is closer to reality. There is now an ability to kill by the gateway information in a state (pfctl -k gateway -k x.x.x.x). This can be leveraged here to make it much easier to clear these states.
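A minimal sketch of how the kill-by-gateway command quoted above might be wrapped. Only the `pfctl -k gateway -k <address>` form comes from this note. The function name `kill_states_via_gateway` and the `DRY_RUN` switch are assumptions added for illustration, and the address in the usage line is a documentation-range placeholder.

```shell
# Hypothetical wrapper around the kill-by-gateway form of pfctl.
# With DRY_RUN=1 it prints the command instead of running it.
kill_states_via_gateway() {
    gw_addr="$1"
    if [ -z "$gw_addr" ]; then
        echo "usage: kill_states_via_gateway <gateway-address>" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "/sbin/pfctl -k gateway -k $gw_addr"
    else
        /sbin/pfctl -k gateway -k "$gw_addr"
    fi
}

# Example: preview the command for a lower tier gateway at 203.0.113.1
#   DRY_RUN=1 kill_states_via_gateway 203.0.113.1
```

Previewing with the dry-run switch first is a cautious choice, since killing states interrupts every connection that currently uses that gateway.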
Now there are only a few items to take care of:
- Needs a new option to selectively activate this feature somewhere, either globally or on a gateway group.
- This only makes logical sense to activate on a single gateway group. There isn't a way to only kill states from a gateway address while also restricting that to states created by a specific gateway group.
- If the option is global, it should include a way to select the failover group for which it applies.
- If the option is on a gateway group, it should only be possible to activate it on a single group, similar to changing the default gateway. Either deactivate the other or throw an input error saying it can't be enabled on more than one per address family.
- Must note that it only works for states matching policy routing rules, as it will not work for traffic matching rules that rely on default gateway switching directly. Those states have 0.0.0.0/:: in their data.
- The actual action will have to be careful to ONLY activate when a gateway recovers from down to up state, not on every filter reload or page load that notices a gateway is up. The gateway alarm script may be the optimal place to handle this, but needs testing to ensure it doesn't activate too often.
- Similar to #12092 it might be nice to have a new per-gateway option to override this behavior
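The requirement above that the action fire only on a down-to-up transition can be sketched as simple edge detection: remember the last alarm flag seen for each gateway and react only when it changes from 1 to 0. This is an illustration under assumptions, not the eventual pfSense implementation. `STATE_DIR`, `gateway_recovered`, and the one-file-per-gateway layout are all hypothetical.

```shell
# Hypothetical location for the remembered alarm flags.
STATE_DIR="${STATE_DIR:-/tmp/gw_alarm_state}"

# Succeeds (exit status 0) only when the alarm flag for a gateway changes
# from 1 (down) to 0 (up). Repeated "up" reports and the first report
# ever seen for a gateway do not count as a recovery.
gateway_recovered() {
    gw="$1"
    flag="$2"
    mkdir -p "$STATE_DIR" || return 2
    state_file="$STATE_DIR/$gw"
    prev=""
    if [ -f "$state_file" ]; then
        prev=$(cat "$state_file")
    fi
    # Record the current flag for the next invocation.
    printf '%s\n' "$flag" > "$state_file"
    [ "$prev" = "1" ] && [ "$flag" = "0" ]
}

# Possible use in an alarm script, where $1 is the gateway name and $3 the alarm flag:
#   if gateway_recovered "$1" "$3"; then
#       ... kill states for the lower tier gateways ...
#   fi
```

Treating the first report as "no recovery" means nothing is killed right after boot, which trades the startup case described earlier in this thread for fewer accidental triggers. A real implementation would have to decide that case deliberately.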
Updated by Jim Pingle over 2 years ago
- Subject changed from Ability to selectively kill states on lower tier gateways when higher tier recovers to Ability to selectively kill states on gateways recovery
Updated by MICHAEL MAST about 2 years ago
I wanted to add more support for this feature. I have 11 Netgate appliances deployed, with enterprise support on a few and more on the way as the business expands, and we have a problem with Mitel phones and one-way audio during gateway recovery.
On the old Ubiquiti EdgeRouters we used the option to clear the state tables during any event, including both failover and recovery. Cisco routers have never given us a problem.
Thank you for working on this feature, I look forward to seeing it deployed.
Updated by Chris B over 1 year ago
6 month addition!
As above, using LTE/5G for backup means the data is very expensive. A brief failover will mean things like Netflix and my work PC's VPN will fail over and never fail back, potentially being very costly.
Updated by Alex Viper_Rus over 1 year ago
A very necessary feature for those who use the second WAN exclusively as a backup channel, and especially if it has very expensive traffic.
It is very disappointing that this function has a low priority and has not yet been implemented.
Updated by Jim Pingle over 1 year ago
- Has duplicate Feature #14533: Kil UDP states on gateway recovery added
Updated by Craig Sharp 11 months ago
This is a very frustrating issue. I do not understand where the difficulty lies, since on a failure the states are killed and the secondary WAN picks up. Why is it so difficult to do the same on failback? We should not have to run a command-line job to kill the states.
Commercial firewalls do this all the time. 13 years to fix an issue is not promising. Please advise on the status of the fix.
Thanks.
Updated by Henniee Walterson 10 months ago
+1!
Same problem with multi-path routing and multi-wan!
Seems to be easy to solve with pfctl.
Pls do it. Soon.
Updated by Henniee Walterson 10 months ago
- File gateway_recover.png added
would be a charm like this...
love my paint :-)
Updated by Henniee Walterson 10 months ago
It might be useful to implement the recovery state killing in the gateway section too.
(next to "State Killing on Gateway Failure", add "State Killing on Gateway Recovery" under System / Routing / Gateways / Edit)
Updated by Alex Viper_Rus 10 months ago
It would be useful if VPN connections were also reconnected via the restored gateway.
Updated by Alex Viper_Rus 10 months ago
Henniee Walterson wrote in #note-26:
It might be useful to implement the recovery state killing in the gateway section too.
(next to "State Killing on Gateway Failure", add "State Killing on Gateway Recovery" under System / Routing / Gateways / Edit)
As far as I understand, this should be implemented not in the settings of the gateways themselves, but in the gateway groups. After all, pfSense must know which gateways to kill states for.
Updated by Marcos M 10 months ago
- Status changed from Pull Request Review to Feedback
- % Done changed from 50 to 100
Applied in changeset 30d46b63834444e9a7a4af310a5d8aaf94baf01a.
Updated by Marcos M 10 months ago
- Related to Todo #15220: Handle ``route-to`` and ``reply-to`` states when using the ``if-bound`` state policy added
Updated by Marcos M 9 months ago
- Status changed from Needs Patch to Resolved
This has been working well in 24.03 snapshots. Documentation is available at:
https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#state-killing-on-gateway-recovery
Updated by Asher Oto 4 months ago
Henniee Walterson wrote in #note-26:
It might be useful to implement the recovery state killing in the gateway section too.
(next to "State Killing on Gateway Failure", add "State Killing on Gateway Recovery" under System / Routing / Gateways / Edit)
This is an excellent idea and pfSense definitely needs it.
If you're using a cellular connection for failover and pfSense fails to clear the states upon recovery, you may incur significant charges for unnecessary traffic that should have switched back to the main gateway, or your account could be throttled.
Someone on Reddit posted about this very thing...
https://www.reddit.com/r/PFSENSE/comments/i3w636/unreliable_wan_failover_switching_notifications/