Bug #1705
Closed
Multi-WAN Failover loses default route
Description
Configured Multi-WAN (Gateway groups: tier 1, tier 2, member down) and turned on "Allow default gateway switching" (System -> Advanced -> Miscellaneous). Sometimes the DEFAULT route is lost, although both interfaces work. As far as I can tell, this happens when the main interface (tier 1) goes down for a short time, because a notification arrives in the mailbox (LAN): "MONITOR: WAN is down, removing from routing group"
P.S. The mailbox also regularly receives this report: "Gateways status could not be determined, considering all as up / active."
Updated by Chris Buechler almost 14 years ago
- Priority changed from High to Normal
- Target version deleted (2.0)
Updated by Mike Brady almost 14 years ago
I have also observed this with 2.0RC3.
In my case the tier 1 interface is pppoe. When it is taken down the default route is changed to the tier 2 interface which is static. This is indicated both in the logs and with netstat output. A short time later the default route disappears altogether. There is nothing that I can find in the logs to indicate why or even that the route is being removed. I have not been able to track how long "a short time later" is or whether the time varies.
This function seems to be required to get SQUID to work in a WAN failover situation as I could not get SQUID to work if there is no default route.
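One way to pin down how long "a short time later" is would be to poll the routing table and timestamp every change to the default route. The following is a minimal sketch, not a tested diagnostic: it assumes a Python 3 interpreter is available on the firewall (a stock install may not provide one) and that `netstat -rn -f inet` prints the default route with `default` in the first column, as FreeBSD does. The function names `default_gateway` and `watch` are made up for this example.

```python
import subprocess
import time
from typing import Optional


def default_gateway(netstat_output: str) -> Optional[str]:
    """Return the gateway of the IPv4 default route, or None if it is absent."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumption: the default route row starts with the word "default".
        if len(fields) >= 2 and fields[0] == "default":
            return fields[1]
    return None


def watch(interval: float = 1.0) -> None:
    """Print a timestamped line whenever the default gateway changes."""
    last = object()  # sentinel so the first reading is always printed
    while True:
        out = subprocess.run(
            ["netstat", "-rn", "-f", "inet"],
            capture_output=True, text=True, check=False,
        ).stdout
        current = default_gateway(out)
        if current != last:
            print(time.strftime("%H:%M:%S"), "default route ->", current or "MISSING")
            last = current
        time.sleep(interval)
```

Calling `watch()` from a shell session before unplugging the cable would then log the switch to the tier 2 gateway and, if it happens, the moment the route goes missing.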
Updated by Mike Brady almost 14 years ago
I should also say that the pppoe link is being taken down by unplugging the cable. When the cable is reconnected the pppoe connection comes back up and the original default route is re-added. This route does not disappear, so in my case this may be related to the cleanup after the pppoe link (which is the WAN interface) goes down.
Updated by Chris Buechler almost 14 years ago
Mike - what you're describing is the correct default behavior. This ticket is about a non-default option that we do not officially support for 2.0 (hence the target version gone) but it works in many people's circumstances so it's there as an advanced option. You don't need a default route for Squid, you can post to the forum or list for info.
Updated by Mike Brady almost 14 years ago
Chris - Thanks for letting me know. I have been looking through the forums, but nothing there works for me for some reason. I will implement without a failover until I have time to go through the underlying pf rules and figure out what it is that I am doing wrong.
Updated by Harry Coin over 13 years ago
Bump in a new mode:
"Gateways status could not be determined, considering all as up/active."
Sent at least a few times every several hours, in a simple setup with two hardware NICs, each connected to a different ISP, using the latest stable release of pfSense i386. There are the two native pfSense gateways defined by the ISPs and given at interface setup, then a failover gateway. The option to switch the default gateway should the 'primary default' go down is set.
It is little more than a hunch, but I think it happens only after one of the gateways ping response lags too far for the first time after bootup. My second best hunch comes from noticing the default rule logs are filled with fractured tcp operations where packets tried and failed to pass through other than the wan interface they started on. I finally had to give up logging default rule stuff because it just jammed up the logs with entries about which I could do nothing. Tried 'conservative' policies, a few other attempted sticky tweaks and so on-- no joy.
Anyhow, there are no overt problems that users report, but I think the messages start after the first of one or the other of the above events. They don't come at anything like the rate at which the system is asked to ping something on the other side of the gateway to ensure the gateways are working. The gap between messages is much longer, anywhere from half an hour to several hours.
Anyhow, no IPsec, no VPN, no PPPoE, just two WANs, a pfsync and a LAN.
Updated by Chris Buechler over 9 years ago
- Status changed from New to Closed
- Affected Version deleted (2.0)
The root cause of this is "Gateways status could not be determined", which was an apinger status race condition. That's fixed in 2.3 by replacing apinger.
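For readers unfamiliar with the failure mode, here is a rough illustration of why an unreadable status matters to tiered failover. It is not pfSense's code: `pick_gateways`, the gateway names `WAN` and `WAN2`, and the status strings are all hypothetical. It only mirrors the behaviour the log message describes, where an empty status is treated as "everything is up".

```python
from typing import Dict, List, Optional


def pick_gateways(tiers: Dict[str, int],
                  status: Optional[Dict[str, str]]) -> List[str]:
    """Return the members of the lowest-numbered tier that has a gateway up."""
    if not status:
        # No status could be read: consider all members up/active,
        # as the logged message says.
        status = {name: "up" for name in tiers}
    up = [name for name in tiers if status.get(name, "up") != "down"]
    if not up:
        return []
    best = min(tiers[name] for name in up)
    return sorted(name for name in up if tiers[name] == best)


tiers = {"WAN": 1, "WAN2": 2}
print(pick_gateways(tiers, {"WAN": "down", "WAN2": "up"}))  # ['WAN2']
print(pick_gateways(tiers, {}))                             # ['WAN']
```

In this sketch, a reader that momentarily sees no status while the tier 1 link is actually down would select `WAN` again. Whether that is the exact path to the missing default route is not stated in this ticket.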