Gateway Group slow (or never) to switch back to Tier 1
See https://forum.netgate.com/topic/136852/2-4-4-gateway-group-slow-or-never-to-switch-back-to-tier-1. (No responses yet as of this posting.)
I have a gateway group with two gateways, one at Tier 1 and the other at Tier 2. I've been having lots of trouble with my Tier 1 link lately. pfSense switches over to the Tier 2 link, but when the Tier 1 gateway comes back within limits (latency, packet loss), routing does not switch back to the Tier 1 gateway. The Gateways widget on the home page shows Tier 1 as "online", as do Status -> Gateways and Status -> Gateway Groups. The log file shows a latency alarm being raised and then cleared.
I've set that gateway group as the default gateway and am also sending traffic to it with a LAN firewall rule.
#2 Updated by Mitch Claborn almost 2 years ago
To make things even more complicated, in the workaround mentioned above, the routing actually changes back to the Tier 1 gateway when I mark it as down, so that when the status is "forced offline" it is still routing through that gateway. When I undo the "mark as down" it continues to route through that gateway.
#3 Updated by Mitch Claborn almost 2 years ago
The Gateway Group was set as Trigger Level: Packet Loss or High Latency. I changed that to "Member Down" and now the routing seems to be switching back to the Tier 1 gateway as it should. I'm going to revert to "Packet Loss or High Latency" as a test to see if that triggers the problem.
#11 Updated by Bob Guo over 1 year ago
Generally the same problem here, but I have the problem even when the gateway group isn't the pfSense default gateway. After digging a little deeper, I found that the problem lies in a failure to generate the correct config in /tmp/rules.debug. As far as I can see, nothing suggests a problem with pfctl reading the rules; therefore, I think the problem might be in the parts that generate the rules or that trigger rule generation. For example, when the loss or delay alarms cleared, pfSense didn't call the rule generation process.
I am currently trying only "Member Down" as the trigger, and I'd like to know if it works for you.
As I noticed, when the problem is present, a simple filter reload gets everything back to normal. I don't know if it will work for you. If it does, a temporary workaround may be a cron-run script that monitors whether the gateway is correct.
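A minimal sketch of such a watchdog, in Python, under several assumptions that are not confirmed anywhere in this thread: that the group's policy routing appears in /tmp/rules.debug on a line containing both the group name and `route-to`, that `/etc/rc.filter_configure` triggers a filter reload, and that the caller already knows whether Tier 1 is online. The group name, gateway address, and function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

RULES_FILE = Path("/tmp/rules.debug")      # file named in the comment above
RELOAD_CMD = ["/etc/rc.filter_configure"]  # assumption: triggers a filter reload


def group_routes_via(rules_text, group_name, gateway_ip):
    """Return True if a route-to line naming the group contains gateway_ip.

    Assumes (unverified) that the group is defined on a single line that
    holds both the group name and the keyword 'route-to'.
    """
    # Match the address as a whole token so 192.0.2.1 does not match 192.0.2.10.
    ip_pattern = re.compile(r"(?<![\d.])" + re.escape(gateway_ip) + r"(?![\d.])")
    for line in rules_text.splitlines():
        if group_name in line and "route-to" in line and ip_pattern.search(line):
            return True
    return False


def needs_reload(rules_text, group_name, tier1_ip, tier1_online):
    """A reload is wanted only when Tier 1 is online but the rules skip it."""
    if not tier1_online:
        return False
    return not group_routes_via(rules_text, group_name, tier1_ip)


def check_and_reload(group_name, tier1_ip, tier1_online, reload=None):
    """Read the generated rules and reload the filter if they look stale."""
    rules_text = RULES_FILE.read_text()
    if not needs_reload(rules_text, group_name, tier1_ip, tier1_online):
        return False
    if reload is None:
        subprocess.run(RELOAD_CMD, check=True)
    else:
        reload()  # injectable for testing without touching the firewall
    return True
```

A cron job could call `check_and_reload("GWfailover", "192.0.2.1", tier1_online)` every minute or so, with `tier1_online` taken from whatever monitoring source is trusted. The check is deliberately conservative: it never reloads while Tier 1 is reported down.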
#17 Updated by Jörn Greszki 27 days ago
I am not sure if my issue:
is related to what you describe.
If not, I would open a separate bug ticket; if so, I would contribute information and further testing if needed.
Right now, for me, Multi-WAN is working, but it is not able to recover from 100% packet loss events.