Bug #16039

Gateway does not go down when packet loss threshold is set to 100%

Added by Andrew Collings 3 months ago. Updated 2 months ago.

Status: Confirmed
Priority: High
Assignee: -
Category: Gateway Monitoring
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Release Notes: Default
Affected Plus Version: 24.11
Affected Architecture: 6100

Description

I have multiple locations with Netgate 6100 appliances that have wired broadband (tier 1) with a cellular backup (tier 2). I don't want pfSense to fail over to cellular unless the wired broadband is completely down, so I set the high packet loss threshold on the wired gateway (tier 1) to 100. This causes the gateway down action to never trigger, even if I unplug the wired connection, which guarantees 100% loss. It just shows the gateway in warning status with 100% loss. If I set the high packet loss threshold to 99, the gateway goes down as expected. The monitoring time period is the default 60 seconds, and I've tried leaving it for the better part of an hour, but it never triggers. I've been able to replicate this issue on 6 different pfSense installs (all running 24.11). The workaround is simple, and there isn't a meaningful difference between 99% loss and 100% loss, but I'm marking it as high priority because it can cause complete loss of connectivity for anyone who isn't aware of this behavior.
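
One hypothesis consistent with the 99 vs. 100 behavior, though not confirmed anywhere in this report, is that the down check compares measured loss against the threshold with a strict greater-than, which a 100% threshold can never satisfy. The sketch below only illustrates that idea; the function names are hypothetical and are not taken from pfSense or its monitoring daemon.

```python
def is_down_strict(loss_pct: float, high_threshold_pct: float) -> bool:
    """Hypothetical check: down only when loss strictly exceeds the threshold."""
    return loss_pct > high_threshold_pct


def is_down_inclusive(loss_pct: float, high_threshold_pct: float) -> bool:
    """Hypothetical alternative: down when loss reaches the threshold."""
    return loss_pct >= high_threshold_pct


# With total loss, a strict comparison can never trip a threshold of 100,
# while a threshold of 99 trips as soon as loss rises above 99.
strict_at_100 = is_down_strict(100, 100)
strict_at_99 = is_down_strict(100, 99)
inclusive_at_100 = is_down_inclusive(100, 100)
```

If the real check behaves like the strict variant, that would match the gateway staying in warning status at 100% loss when the threshold is 100.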


Actions #1

Updated by Chris W 3 months ago

  • Status changed from New to Incomplete

I can reproduce this for the most part. I set WAN1 as the Tier1 interface and WAN2 as the Tier2 interface on a 4100 (identical ports to the 6100). The gateway group's Trigger Level is set to Packet Loss and the Tier1 GW's high packet loss threshold is set to 100%.

I can physically unplug the wire from WAN1 and:
- Status > Gateways then immediately shows Pending when I refresh the page.
- Status > Interfaces shows No Carrier because it's disconnected.
- System > Routing > Gateways shows it's failed over to the Tier2 gateway.
- Diagnostics > Routes shows the default route is now out WAN2.

If I log into the switch on the other side of the wire from WAN1 and disable the VLAN on the switch port, WAN1 has no working upstream route but does still have a physical link:
- Status > Gateways will take a minute or so to reach "Warning, Packetloss: 100%" and then Pending, though showing Pending was inconsistent.
- Status > Interfaces shows the interface is up (expected since it's still plugged in)
- System > Routing shows it's NOT failed over and the Tier1 gateway is still the system's default gateway.
- Diagnostics > Routes shows the default route is still out WAN1.
- If I then unplug WAN1 and plug it back in, it fails over to Tier2 as default gateway and the default route changes to via WAN2. Those changes persist until I restore the VLAN on the switch port. Within about 10 seconds, the firewall then fails back to the Tier1 gateway.

If I set the high loss threshold to 99% and unplug WAN1, I get the same result as described above, with no change. The second exercise, however, quickly fails over to Tier2: the default route is out WAN2, and Tier1 moves to "Offline, Packetloss: x%", which eventually climbs to 100%.

I see the reason for opening the report, but as a side note: if you want failover only when a member is detected as completely down, there's a Member Down trigger level in the gateway group settings for that circumstance. There's no need to adjust the loss thresholds.

Actions #3

Updated by Chris W 3 months ago

  • Status changed from Incomplete to Confirmed
Actions #4

Updated by Andrew Collings 2 months ago

Thanks for looking into it and for the heads up. I can't believe I completely missed that option.
