Bug #11570

Gateway group doesn't failback from tier 2 to tier 1, worked properly in 2.4

Added by M L about 2 months ago. Updated 4 days ago.

Status: New
Priority: High
Assignee: -
Category: Gateways
Target version: -
Start date: 02/27/2021
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.5.x
Affected Architecture: All
Release Notes: Default

Description

Good evening. This seems to be a new bug in 2.5; it was not a problem in 2.4. In a gateway group configured for main/failover (tier 1 and tier 2), the switch from main to failover works perfectly. But when the main is restored, the gateway group fails to even notice and doesn't fail back. This has been reported by numerous users in the subreddit. My post on reddit: https://www.reddit.com/r/PFSENSE/comments/lnuolf/failover_back_to_main_wan_not_switching_without/

This is actually a very expensive and troubling bug. Many people use an LTE modem with metered data, paying by the MB or GB for data. This bug keeps racking up dollars until you go in to manually change it back.

Main to failover switching:
  1. Unplug WAN1
  2. WAN1 interface status shows link down. Check.
  3. Gateway monitor detects loss and marks as offline. Check.
  4. Default gateway changes to WAN2. Check.
  5. Traffic begins flowing properly on WAN2 (only 30 seconds downtime). Check.
  6. Dynamic DNS clients (5) all get updated. Check.
  7. OpenVPN clients (3) all go down and come back up on WAN2. Check.
  8. All systems normal, no meltdowns, smoke contained in devices.
Failover back to main, not so great:
  1. Plug in WAN1
  2. WAN1 interface status shows link up with the IP. Check.
  3. Gateway monitor shows pending/unknown.
  4. The end. Default gateway fails to switch back to main, and obviously nothing else after that happens either.
I can go into System > Routing and click Save/Apply (with no changes), and that seems to kick the gateway monitor. The default gateway then switches back to main:
  1. Traffic begins flowing on the main virtually uninterrupted. Check.
  2. Dynamic DNS clients all update back to the main. Check.
  3. OpenVPN clients fail to change back to the main. The OpenVPN clients all remain on WAN2. I have to restart the OpenVPN service for each client, and then they come back up on the main.
  4. All systems back to normal. Yay.

I understand the OpenVPN not cycling back may be an existing issue for many years that people solve with a cron job. But the rest of this problem is new with 2.5.
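
To make the expected behavior concrete, here is a minimal Python sketch of tiered gateway selection: the default gateway should be the online member with the lowest tier number. This is only a model of what the report describes, not pfSense's actual code; the names `Gateway` and `pick_default_gateway` and the status strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    tier: int     # 1 = main, 2 = failover
    status: str   # hypothetical values: "online", "offline", "pending"


def pick_default_gateway(gateways: List[Gateway]) -> Optional[str]:
    """Return the name of the lowest-tier gateway that is online, or None."""
    online = [g for g in gateways if g.status == "online"]
    if not online:
        return None
    return min(online, key=lambda g: g.tier).name
```

In this model, a WAN1 that stays at "pending" after the link returns is never eligible, so the selection keeps returning WAN2. That matches the reported symptom: the failback itself would work if the monitor ever reported WAN1 as online.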

History

#1 Updated by M L about 2 months ago

I forgot to mention... this problem only seems to occur when you fail the main by unplugging the WAN interface or powering off the modem, so that the link itself goes down. If you fail the main some other way, for example by unplugging the coax from the cable modem, or if the ISP goes down, everything works fine in both directions.

#2 Updated by Viktor Gurov about 1 month ago

related to #10716 and #11298 (?)

#3 Updated by Viktor Gurov about 1 month ago

M L wrote:

Failover back to main, not so great:
  1. Plug in WAN1
  2. WAN1 interface status shows link up with the IP. Check.
  3. Gateway monitor shows pending/unknown.
  4. The end. Default gateway fails to switch back to main, and obviously nothing else after that happens either.

Unable to reproduce this part - after a while the Gateway monitor shows "Online" and successfully restarts the filter/ovpn/ipsec on WAN1.

Maybe there is some kind of race condition

#4 Updated by James Blanton about 1 month ago

Viktor Gurov wrote:

M L wrote:

Failover back to main, not so great:
  1. Plug in WAN1
  2. WAN1 interface status shows link up with the IP. Check.
  3. Gateway monitor shows pending/unknown.
  4. The end. Default gateway fails to switch back to main, and obviously nothing else after that happens either.

Unable to reproduce this part - after a while the Gateway monitor shows "Online" and successfully restarts the filter/ovpn/ipsec on WAN1.

Maybe there is some kind of race condition

This sounds similar to my issue on Bug #11630.

#5 Updated by Fred Latke 4 days ago

I can reproduce exactly the same behavior. If I lose connectivity to the ISP or disconnect the coaxial cable from my modem, the main WAN gateway gets placed as default just fine after the outage. If I disconnect the UTP cable or turn off the router, then after everything is back up the interface status will show as up, but the gateways widget will show the interface as "offline, packet loss".

Going into System > Routing and clicking save/apply without any changes fixes everything.
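
Until this is fixed, the manual workaround could in principle be triggered by a periodic check. The Python sketch below shows only the decision logic for such a check: it flags a gateway whose link has been up for a while but whose monitor still does not report it as online. The function name, the status strings, and the 60-second grace period are all assumptions; how the link and monitor state are read, and how the routing settings are re-applied, is deliberately left out.

```python
def needs_reapply(link_up: bool,
                  monitor_status: str,
                  seconds_since_link_up: float,
                  grace_seconds: float = 60.0) -> bool:
    """Return True when the link is up but the monitor has not recovered.

    All inputs are supplied by the caller; this function does not query
    the firewall itself.
    """
    if not link_up:
        return False   # genuinely down, nothing to re-apply
    if monitor_status == "online":
        return False   # monitor already recovered on its own
    # Link is up but the monitor is stuck (e.g. "pending" or "offline"):
    # only act once the grace period has passed.
    return seconds_since_link_up >= grace_seconds
```

The grace period is there so that a gateway which recovers on its own, as in comment #3, is not re-applied unnecessarily.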
