Bug #3179: Gateway failure not properly detected in certain cases using a monitor IP outside of the WAN's subnet - pfSense - pfSense bugtracker


Bug #3179

closed

Gateway failure not properly detected in certain cases using a monitor IP outside of the WAN's subnet

Added by Jim Pingle over 11 years ago. Updated over 11 years ago.

Status: Resolved

Priority: High

Assignee: Ermal Luçi

Category: Gateways

Target version: 2.1

Start date: 09/03/2013

Due date:

% Done:

Estimated time:

Plus Target Version:

Release Notes:

Affected Version: 2.1

Affected Architecture:

Description

Still researching this a bit but it needs an entry so things don't get lost.

Currently, I have two WANs, DSL and CABLE. DSL is the default. The CABLE WAN is having issues and at times experiencing 80-90% loss. The monitor IP on CABLE is 8.8.8.8. Apinger, however, is not reporting loss during these times. It is showing the gateway as online, even though the circuit is experiencing massive loss confirmed by other methods.

Because of the loss and lack of detected failure, manually reconfiguring the gateway groups is necessary to regain usable connectivity from behind the firewall.

This may be due to the removal of static routes for apinger targets, but testing is needed to confirm.

Files

oddicmp.cap (7.28 KB)

Jim Pingle, 09/03/2013 11:00 AM


Updated by Shahid Sheikh over 11 years ago

I can provide some input on this issue as well.

On 2 of my 8 firewalls this problem happens consistently. On the remaining 6, the problem usually shows up after the default gateway fails over once.

I have WAN and OPT1 interfaces, with the default GW on WAN. Neither monitor IP is within its interface's subnet. A packet capture on the OPT1 interface does not show any of the ICMP packets. On the WAN interface I see ICMP packets to the monitor IPs of both WAN's GW and OPT1's GW. The ICMP packets destined for the monitor IP of OPT1's GW carry the OPT1 interface's IP address as their source, but they are sent out the WAN interface.

My workaround right now is to add static routes for the monitor IPs.
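The behavior described above is consistent with ordinary destination-based routing: the egress interface is chosen from the destination address, whatever source address the packet carries. The following is a minimal Python sketch of that lookup, not pfSense or apinger code. The `egress_interface` helper, the route table, and all addresses (taken from the documentation ranges) are hypothetical.

```python
import ipaddress

def egress_interface(routes, destination):
    """Return the interface of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in routes if dest in net]
    # Longest prefix wins, as in an ordinary destination-based routing table.
    net, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface

# Hypothetical table: default route via WAN, plus each interface's own subnet.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "WAN"),
    (ipaddress.ip_network("198.51.100.0/24"), "WAN"),
    (ipaddress.ip_network("203.0.113.0/24"), "OPT1"),
]

# Hypothetical OPT1 monitor IP that lies outside OPT1's subnet.
monitor_ip = "192.0.2.53"

# Without a specific route, only the default route matches, so the probe
# leaves via WAN even if its source address belongs to OPT1.
before = egress_interface(routes, monitor_ip)

# Workaround from the comment above: a host route pinning the monitor IP to OPT1.
routes.append((ipaddress.ip_network("192.0.2.53/32"), "OPT1"))
after = egress_interface(routes, monitor_ip)

print(before, after)
```

In this sketch the lookup returns `WAN` before the host route is added and `OPT1` afterwards, which matches the packet capture observation and the static route workaround.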

Another observation is unexpected behavior when a DNS server set to be queried through one GW is also used as the monitor IP for another GW. Configuring it as a DNS server with a specific gateway adds a static route for it.


Updated by Jim Pingle over 11 years ago

File oddicmp.cap added

Attaching a capture file that shows the ICMP actually is going out the right interface and is experiencing loss. But at the time apinger reports 0.0% loss on that WAN.

So the static routes do help certain scenarios, but not all.


Updated by Ermal Luçi over 11 years ago

Status changed from New to Feedback


Updated by Jim Pingle over 11 years ago

It now appears that apinger sees the gateway as down but does not report or graph the result as expected.

If you change the 'down' time to a value longer than the number of samples required for calculation (e.g. 30) the graph is correct.

So the problem appears mostly if the down time is at the default value of 10 (or less) since it uses 10 samples for calculation.
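To make the "10 samples for calculation" point concrete, here is a generic sliding-window loss calculation in Python. It is only an illustration of computing a loss percentage over the last N probe results. It is not apinger's actual implementation, and the `LossWindow` class and its behavior of returning `None` until the window is full are assumptions made for the sketch.

```python
from collections import deque

class LossWindow:
    """Hypothetical sliding window over the most recent probe results."""

    def __init__(self, size=10):
        # deque with maxlen discards the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def record(self, replied):
        """Store one probe result: True if a reply arrived, False if lost."""
        self.samples.append(bool(replied))

    def loss_percent(self):
        """Return loss over the window, or None until the window is full."""
        if len(self.samples) < self.samples.maxlen:
            return None
        lost = self.samples.count(False)
        return 100.0 * lost / len(self.samples)

window = LossWindow(size=10)
partial = window.loss_percent()  # no samples yet, so no figure is available

# Ten probes, eight of them lost, similar to the loss described in this ticket.
for replied in [True] + [False] * 8 + [True]:
    window.record(replied)

full = window.loss_percent()
print(partial, full)
```

With a window of 10, no loss figure exists until 10 results have been collected, which is one way a threshold evaluated over a shorter period could disagree with the computed loss.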


Updated by Chris Buechler over 11 years ago

Status changed from Feedback to Resolved

This particular issue is fixed. The issue with 10 vs. 30 seconds with packet loss still exists, but it isn't a regression. I'll open a separate ticket on that.
