Bug #10546

Gateways removed from routing groups based on low alert thresholds

Added by Vladimir Voskoboynikov 11 months ago. Updated 5 months ago.



In a Multi-WAN failover scenario, individual gateways are added to and removed from gateway groups based on dpinger alarms, which trigger when the 'high' latency or packet loss threshold is crossed. Group membership is decided in get_gwgroup_members_inner(), using the gateway status reported by return_gateways_status(), which carries no further detail. The gateway status is therefore exactly one of "down" (high latency or loss threshold exceeded), "loss" (low loss threshold exceeded), "delay" (low delay threshold exceeded), or "none" (below all thresholds).
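The four states described above can be sketched as a simple classification. This is an illustrative Python sketch, not pfSense's PHP code: the function name `classify_gateway`, the `Thresholds` fields, the example threshold values, and the precedence of "loss" over "delay" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical field names: latency in milliseconds, loss in percent.
    latency_low: float
    latency_high: float
    loss_low: float
    loss_high: float


def classify_gateway(latency_ms: float, loss_pct: float, t: Thresholds) -> str:
    """Return one of "down", "loss", "delay", or "none"."""
    # Either high threshold exceeded: the gateway counts as down.
    if latency_ms > t.latency_high or loss_pct > t.loss_high:
        return "down"
    # Only a low threshold exceeded: a warning state, not an outage.
    if loss_pct > t.loss_low:
        return "loss"
    if latency_ms > t.latency_low:
        return "delay"
    return "none"


# Example values chosen for illustration only.
example = Thresholds(latency_low=200, latency_high=500, loss_low=10, loss_high=20)
print(classify_gateway(600, 0, example))   # down
print(classify_gateway(300, 0, example))   # delay
print(classify_gateway(100, 15, example))  # loss
print(classify_gateway(100, 0, example))   # none
```

With a single status field like this, a caller cannot tell a warning ("loss", "delay") from an outage except by comparing against every string.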

get_gwgroup_members_inner() also removes gateways that are in the "loss" and "delay" states, which is unexpected. This makes the following scenario possible:

  1. A gateway exceeds the high latency (or loss) threshold. A dpinger alarm is raised, and the gateway is removed from the gateway group.
  2. The gateway returns to a latency (or loss) between the low and high thresholds. A dpinger alarm is raised, but the gateway is not added back to the gateway group because it is still in a "delay" (or "loss") status.
  3. The gateway drops below the low threshold and stays there. No dpinger alarm is raised, because the high threshold was not crossed, so no code is ever called to reconsider the gateway groups. The gateway remains removed from the gateway group indefinitely.
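The three steps above can be replayed in a small simulation. This is a hedged Python sketch of the reported behavior, not the actual pfSense logic: the function name `simulate`, the threshold values, and the simplification to latency only are assumptions. It models membership as being re-evaluated only when the high threshold is crossed in either direction, and a gateway as a member only while its status is "none".

```python
def simulate(samples, low=200, high=500):
    """Replay latency samples and return group membership after each one.

    Assumptions of this sketch: membership is re-evaluated only when an
    alarm fires, an alarm fires only when the high threshold is crossed
    (in either direction), and only status "none" keeps a gateway in the group.
    """
    member = True
    was_above_high = False
    history = []
    for latency in samples:
        above_high = latency > high
        if above_high != was_above_high:  # alarm raised or cleared
            if above_high:
                status = "down"
            elif latency > low:
                status = "delay"
            else:
                status = "none"
            member = status == "none"
        was_above_high = above_high
        history.append(member)
    return history


# Steps 1-3: spike, partial recovery, then full recovery with no alarm.
print(simulate([100, 600, 300, 100, 100]))  # [True, False, False, False, False]

# Direct recovery below the low threshold re-adds the gateway.
print(simulate([100, 600, 100]))            # [True, False, True]
```

The first run never re-adds the gateway even though the last two samples are healthy, because nothing triggers a re-evaluation after the partial recovery.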

In this case, pfSense continues to treat a gateway as down after it has actually returned to a normal state, and an administrator has to intervene to restore it to a proper state.

Associated revisions

Revision 04a72a97 (diff)
Added by Vladimir Voskoboynikov 6 months ago

Add gateway substatus. Fixes #10546

Update return_gateways_status to return a substatus as well as the existing status.

status changed to be one of online or down.
substatus can be one of none, down, highloss, highlatency, loss, latency, or force_down

Edit status pages, gateway widget, and gateway group code accordingly.
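A rough sketch of how a status/substatus pair could separate outages from warnings follows. It is written in Python for illustration and is not the code from this revision: the substatus names come from the commit message, but which of them map to "down" versus "online" is an assumption of the sketch, as are the function names and the gateway names in the example. It also ignores any per-group configuration.

```python
# Assumed split between outage and warning substatuses (not confirmed by the commit).
DOWN_SUBSTATUSES = {"down", "highloss", "highlatency", "force_down"}
ONLINE_SUBSTATUSES = {"none", "loss", "latency"}


def gateway_status(substatus: str) -> dict:
    """Derive the coarse status ("online" or "down") from a substatus."""
    if substatus in DOWN_SUBSTATUSES:
        status = "down"
    elif substatus in ONLINE_SUBSTATUSES:
        status = "online"
    else:
        raise ValueError(f"unknown substatus: {substatus!r}")
    return {"status": status, "substatus": substatus}


def group_members(gateways: dict) -> list:
    """Keep every gateway whose coarse status is "online"."""
    return sorted(
        name
        for name, substatus in gateways.items()
        if gateway_status(substatus)["status"] == "online"
    )


# Hypothetical gateway names, for illustration only.
print(group_members({"WAN1_GW": "latency", "WAN2_GW": "highloss", "WAN3_GW": "none"}))
# ['WAN1_GW', 'WAN3_GW']
```

Under this split, a gateway that is merely above a low threshold stays in the group, while the substatus still lets status pages and widgets show the warning.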

Revision 094db492 (diff)
Added by Vladimir Voskoboynikov 6 months ago

Minor text fix. Issue #10546

No need to log the PID, it's added to the logs anyways.


#1 Updated by Jim Pingle 11 months ago

  • Target version set to 2.5.0

#2 Updated by Jim Pingle 11 months ago

  • Status changed from New to Pull Request Review

#3 Updated by Renato Botelho 7 months ago

  • Assignee set to Renato Botelho

#4 Updated by Jörn Greszki 7 months ago

Dear gents,

is the behavior I describe related to your findings?

#5 Updated by Renato Botelho 6 months ago

  • Status changed from Pull Request Review to Feedback

PR has been merged. Thanks!

#6 Updated by Vladimir Voskoboynikov 6 months ago

  • % Done changed from 0 to 100

#7 Updated by Steve Beaver 6 months ago

  • Status changed from Feedback to Resolved

#8 Updated by Jörn Greszki 5 months ago

Now tested with 2.5.0.a.20201101.1850

For unknown reasons, I still sometimes get partial or full loss of the alive-ping on one of the 2 WAN interfaces, but that is not the issue here.

Nov 2 10:37:56 dpinger 16236 WAN_PHY1_IGB0GW Alarm latency 0us stddev 0us loss 100%

The problem is that this status persists until any change is made to the gateway group; after that it works immediately.

Either dpinger is not retrying the defined IP, or the process that maintains the operational status is not picking up the changes.
