Bug #3179

closed

Gateway failure not properly detected in certain cases using a monitor IP outside of the WAN's subnet

Added by Jim Pingle about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: High
Assignee: Ermal Luçi
Category: Gateways
Target version:
Start date: 09/03/2013
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.1
Affected Architecture:

Description

Still researching this a bit but it needs an entry so things don't get lost.

Currently, I have two WANs, DSL and CABLE. DSL is the default. The CABLE WAN is having issues and at times experiencing 80-90% loss. The monitor IP on CABLE is 8.8.8.8. Apinger, however, is not reporting loss during these times. It is showing the gateway as online, even though the circuit is experiencing massive loss confirmed by other methods.

Because of the loss and lack of detected failure, manually reconfiguring the gateway groups is necessary to regain usable connectivity from behind the firewall.

This may be due to the removal of static routes for apinger targets, but testing is needed to confirm.


Files

oddicmp.cap (7.28 KB), Jim Pingle, 09/03/2013 11:00 AM
Actions #1

Updated by Shahid Sheikh about 8 years ago

I can provide some input on this issue as well.

On 2 of my 8 firewalls this problem happens consistently. On the remaining 6, the problem usually shows up after the default gateway fails over once.

I have WAN and OPT1 interfaces. The default GW is on WAN. The monitor IPs are not on the same subnets as their respective interfaces. A packet capture on the OPT1 interface does not show any of the ICMP packets. On the WAN interface I see ICMP packets to the monitor IPs of both WAN's GW and OPT1's GW. The source IP for the ICMP destined to the monitor IP of OPT1's GW is the IP address of the OPT1 interface, but the packet itself is being sent out the WAN interface.

My workaround right now is to add static routes for the monitor IPs.
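
A minimal Python sketch for spotting which gateways fall into this case, i.e. whose monitor IP lies outside the interface subnet and so may need a static route as a workaround. The subnets and the OPT1 monitor address below are made-up documentation ranges, and `needs_static_route` is a hypothetical helper, not anything from pfSense or apinger.

```python
import ipaddress


def needs_static_route(monitor_ip: str, interface_subnet: str) -> bool:
    """Return True when the monitor IP is outside the interface's subnet."""
    address = ipaddress.ip_address(monitor_ip)
    network = ipaddress.ip_network(interface_subnet, strict=False)
    return address not in network


# Hypothetical setup: interface names follow the comment above,
# the subnets and the OPT1 monitor IP are invented for illustration.
gateways = {
    "WAN": ("8.8.8.8", "192.0.2.0/24"),
    "OPT1": ("198.51.100.1", "198.51.100.0/24"),
}

for name, (monitor_ip, subnet) in gateways.items():
    flag = needs_static_route(monitor_ip, subnet)
    print(f"{name}: monitor {monitor_ip} outside {subnet}: {flag}")
```

This only identifies the affected gateways. As noted later in the thread, static routes help in some scenarios but not all.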

Another observation is unexpected behavior when a DNS server set to be queried through one GW is also used as a monitor IP for another GW. Configuring it as a DNS server with a specific gateway adds a static route.

Actions #2

Updated by Jim Pingle about 8 years ago

Attaching a capture file that shows the ICMP actually is going out the right interface and is experiencing loss, yet at the same time apinger reports 0.0% loss on that WAN.

So the static routes do help certain scenarios, but not all.

Actions #3

Updated by Ermal Luçi about 8 years ago

  • Status changed from New to Feedback
Actions #4

Updated by Jim Pingle about 8 years ago

It now appears that apinger sees the gateway as down but neither reports nor graphs the result as expected.

If you change the 'down' time to a value longer than the number of samples required for calculation (e.g. 30) the graph is correct.

So the problem appears mostly if the down time is at the default value of 10 (or less) since it uses 10 samples for calculation.
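
To illustrate what a loss figure based on the last 10 samples means, here is a rough Python sketch. It is not apinger's code and does not reproduce the reporting problem; `windowed_loss` and the sample data are hypothetical, and the only detail taken from this ticket is the window size of 10.

```python
from collections import deque


def windowed_loss(results, window=10):
    """Yield the loss percentage over the most recent `window` probes.

    `results` is an iterable of booleans: True means a reply was received.
    """
    recent = deque(maxlen=window)  # older samples drop off automatically
    for replied in results:
        recent.append(replied)
        yield 100.0 * recent.count(False) / len(recent)


# Hypothetical probe history: 2 replies followed by 8 lost probes.
history = [True, True] + [False] * 8
print(list(windowed_loss(history))[-1])
```

With a window of 10, only the most recent 10 probes contribute to the figure, so anything older has no effect on it.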

Actions #5

Updated by Chris Buechler about 8 years ago

  • Status changed from Feedback to Resolved

This particular issue is fixed. The issue with 10 vs. 30 seconds with packet loss still exists but isn't a regression. I'll open a separate ticket on that.
