Bug #10546
closed
Gateways removed from routing groups based on low alert thresholds
100%
Description
In a Multi-WAN failover scenario, individual gateways are added to and removed from gateway groups based on dpinger alarms, which trigger when the 'high' latency or packet loss thresholds are crossed. Group membership is decided in get_gwgroup_members_inner(), using the gateway status reported by return_gateways_status() without a detailed status. That status can be one of "down" (high latency or loss threshold exceeded), "loss" (low loss threshold exceeded), "delay" (low latency threshold exceeded), or "none" (below all thresholds).
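To make the four states concrete, here is a minimal Python sketch of the classification described above. It is not the pfSense PHP code: the names Thresholds and classify_gateway are hypothetical, the threshold values are illustrative only, and checking "loss" before "delay" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Illustrative values only; real values come from the gateway settings.
    latency_low_ms: float = 200.0
    latency_high_ms: float = 500.0
    loss_low_pct: float = 10.0
    loss_high_pct: float = 20.0


def classify_gateway(latency_ms: float, loss_pct: float,
                     t: Thresholds = Thresholds()) -> str:
    """Return "down", "loss", "delay", or "none" for one measurement."""
    # Either high threshold exceeded: the gateway is reported as down.
    if latency_ms > t.latency_high_ms or loss_pct > t.loss_high_pct:
        return "down"
    # Low loss threshold exceeded (assumed to take priority over delay).
    if loss_pct > t.loss_low_pct:
        return "loss"
    # Low latency threshold exceeded.
    if latency_ms > t.latency_low_ms:
        return "delay"
    # Below all thresholds.
    return "none"
```

With these illustrative thresholds, a gateway at 300 ms latency and 0% loss is classified as "delay", not "down".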
get_gwgroup_members_inner() will also remove gateways for the "loss" and "delay" states, which is unexpected. This leaves the following potential scenario:
- A gateway exceeds the high latency (or loss) threshold. A dpinger alarm is raised, and the gateway is removed from the gateway group.
- The gateway returns to a latency (or loss) between the low and high thresholds. A dpinger alarm is raised, but the gateway is not added back to the gateway group as it is still in a "delay" (or "loss") status.
- The gateway returns below the low latency (or loss) threshold, and remains that way. No dpinger alarm is raised, as the high threshold was not crossed, and no code is ever called to reconsider the gateway groups. The gateway remains removed from the gateway group indefinitely.
In this case, pfSense will consider a gateway down when it has actually returned to a normal state, requiring administrator action to restore it to the gateway group.
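The sequence above can be reproduced with a small toy model. This is a sketch under stated assumptions, not pfSense code: the function names, the remove_on_low_thresholds flag, and the threshold values are all hypothetical. Only latency is modelled, and group membership is re-evaluated only when the high-threshold alarm state changes.

```python
LOW_MS, HIGH_MS = 200, 500  # illustrative thresholds, not real defaults


def status(latency_ms):
    """Simplified latency-only status: "down", "delay", or "none"."""
    if latency_ms > HIGH_MS:
        return "down"
    if latency_ms > LOW_MS:
        return "delay"
    return "none"


def simulate(latencies, remove_on_low_thresholds=True):
    """Return (latency, status, in_group) for each sample.

    Membership is recomputed only when the alarm state flips, i.e. when
    the high threshold is crossed in either direction.
    """
    in_group = True
    alarm = False
    history = []
    for latency in latencies:
        current = status(latency)
        new_alarm = latency > HIGH_MS
        if new_alarm != alarm:
            alarm = new_alarm
            if remove_on_low_thresholds:
                # Reported behaviour: "delay" also keeps the gateway out.
                in_group = current == "none"
            else:
                # Comparison variant: only "down" removes the gateway.
                in_group = current != "down"
        history.append((latency, current, in_group))
    return history


if __name__ == "__main__":
    for row in simulate([50, 600, 300, 50, 50]):
        print(row)
```

In this model the last two samples have status "none" while in_group stays False, because nothing triggers a re-evaluation after the alarm clears at 300 ms. Passing remove_on_low_thresholds=False returns the gateway to the group as soon as the alarm clears.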