Bug #15303

open

dpinger service does not always switch from Pending to Online

Added by Kris Phillips about 2 months ago. Updated 6 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Gateway Monitoring
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Release Notes: Default
Affected Plus Version: 23.09
Affected Architecture: All

Description

There are several situations where dpinger will not detect a gateway that is available when it should, forcing a restart of the dpinger service to "trigger" it to recheck.

Known situations, but there may be more:

1. Adding a new VTI tunnel as an interface
2. A release/renew of an IPv6 gateway (IPv4 gateway will show up, but IPv6 will not until a dpinger restart)
3. Adding an OpenVPN client/server as an interface

Related documentation redmine: https://redmine.pfsense.org/issues/15230


Files

VTIInterface.png (41.2 KB), Kris Phillips, 03/24/2024 02:18 AM
GatewayStatusPending.png (72.4 KB), Kris Phillips, 03/24/2024 02:18 AM
Actions #1

Updated by Hal Prewitt about 2 months ago

I have seen cases where restarting dpinger fails to clear the Pending status even though it should have worked. Dpinger has difficulty distinguishing between a connection that is up but still waiting for an IP from DHCP and no connection at all because the cable is disconnected or has failed, or the modem is powered off. I believe dpinger has one or more bugs.

Actions #2

Updated by Kris Phillips about 2 months ago

Hal Prewitt wrote in #note-1:

I have seen cases where restarting dpinger fails to clear the Pending status even though it should have worked. Dpinger has difficulty distinguishing between a connection that is up but still waiting for an IP from DHCP and no connection at all because the cable is disconnected or has failed, or the modem is powered off. I believe dpinger has one or more bugs.

Hello Hal,

With a DHCP WAN that has its cable unplugged, it's expected behavior to show "Pending" on the gateway. This is because there is no IP defined for the gateway due to the lack of a DHCP lease and physical link. Unless you are referring to a situation where your WAN has link and a valid IP address, subnet, and gateway, but the gateway still shows Pending, this is expected behavior.

Actions #3

Updated by Hal Prewitt about 2 months ago

Using Pending when there is no physical link is confusing, and I would say it reports the actual status incorrectly compared with the condition where there is a link but the interface is waiting for an IP from a DHCP server. Obviously, these are very different reasons for not having a working WAN. Offline means to me (and most people) that there is no connection to even expect an IP, which makes us look for wiring or power problems causing the failure. One of your techs and I have seen Pending stuck when there is a physical link and IP.

Actions #4

Updated by Kris Phillips about 2 months ago

Hal Prewitt wrote in #note-3:

Using Pending when there is no physical link is confusing, and I would say it reports the actual status incorrectly compared with the condition where there is a link but the interface is waiting for an IP from a DHCP server. Obviously, these are very different reasons for not having a working WAN. Offline means to me (and most people) that there is no connection to even expect an IP, which makes us look for wiring or power problems causing the failure. One of your techs and I have seen Pending stuck when there is a physical link and IP.

Hello Hal,

If you can please provide an example of this attached to the redmine, that would be most helpful. However, I'm not able to reproduce your issue.

As for the Offline vs Pending, this is expected behavior. If you'd like to open a redmine for changing the wording to be more clear, please feel free to do so.

Actions #5

Updated by Hal Prewitt about 2 months ago

Yes, sometimes reconnecting the port will clear Pending, but not always. It appears there are many ways to get into Pending and be stuck.

This is a serious bug. Today, I spent many hours trying every config change I could think of (reboots, config rollback, disabling, enabling, DHCP release/refresh, unplugging/replugging cables, etc.) to clear the Pending status. All failed except one process: changing the Interface to use a static IP and gateway set to the last known DHCP values, then rebooting the WAN modem. Once online, switching back to a DHCP Interface worked.

As for Offline vs Pending being the expected behavior, this is not an issue of changing the wording to be more clear. I don't see a good reason to treat gateways differently depending on whether they use DHCP or a static IP.

Why be different from what's shown under the Interface Status tab? That display will typically show UP (working), Down (a dynamic DHCP WAN type is not fully connected or does not have an IP), or No Carrier (the cable is not plugged in or the device on the other end is malfunctioning in some way).

WANs/Gateways are either Online, Offline, or Warning (partially working but failing, with some % of loss; in other words, seeing errors). Perhaps there is one, but I am unaware of a continuous state or condition where Pending is correct.

This Pending status is a design error and needs to be removed. It has many logic and operational errors. There is no timeout, meaning this status may stay forever.

There are no gateway log entries to indicate the actual status of the gateway. No notices are sent when the gateway becomes Pending. Under the tab and when using Diagnostics/Ping or Traceroute, selecting the Pending Gateway in Source Address will result in what appears to be a working connection. Regardless of the selection, traffic is actually using a different Interface (assuming another one is available).

Each of these result in no user awareness of the failure.

Actions #6

Updated by Danilo Zrenjanin about 2 months ago

Hal Prewitt wrote in #note-5:

Yes, sometimes reconnecting the port will clear Pending, but not always. It appears there are many ways to get into Pending and be stuck.

This is a serious bug. Today, I spent many hours trying every config change I could think of (reboots, config rollback, disabling, enabling, DHCP release/refresh, unplugging/replugging cables, etc.) to clear the Pending status. All failed except one process: changing the Interface to use a static IP and gateway set to the last known DHCP values, then rebooting the WAN modem. Once online, switching back to a DHCP Interface worked.

I conducted several tests with a Netgate 6100 directly connected to a modem:
  1. I disconnected and connected the cable from the Netgate 6100's WAN interface 20 times. Following each reconnection, the status transitioned from Pending to Online without any failures.
  2. I performed 10 modem reboots. Each time, the gateway successfully transitioned from "Pending" to "Packet Loss" to "Online."

In my lab environment, I was unable to reproduce the scenario where the gateway status becomes stuck in a Pending state.

There are no gateway log entries to indicate the actual status of the gateway. No notices are sent when the gateway becomes Pending. Under the tab and when using Diagnostics/Ping or Traceroute, selecting the Pending Gateway in Source Address will result in what appears to be a working connection. Regardless of the selection, traffic is actually using a different Interface (assuming another one is available).

Yes, I can confirm that, other than the dpinger restart, Status/System Logs/System/Gateways contains no log entries indicating that the gateway was down while in Pending status. This should be improved.

Actions #7

Updated by Chris Linstruth about 2 months ago

I had the same experience. Down the link and it goes Pending. Up the link and it transitions to Online after enough replies are received.

"Pending" is a valid gateway status if the pings cannot be sent. Reasons a ping cannot be sent would be interface down, lack of ARP for the gateway address. It is not "Down" because pings cannot be sent so packet loss and latency cannot be measured. It is "Pending" until pings can be sent. This is similar to CARP INIT state.

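The distinction drawn above between "Pending" and "Down" can be sketched as a small classifier. This is a hypothetical illustration only, not dpinger's or pfSense's actual code: the `ProbeStats` type, the `classify` function, and the loss thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeStats:
    """Hypothetical summary of monitoring results for one gateway."""
    can_send: bool                    # False if the interface is down or ARP is unresolved
    loss_pct: Optional[float] = None  # None until enough replies have been measured


def classify(stats: ProbeStats, warn_loss: float = 10.0, down_loss: float = 20.0) -> str:
    """Map probe results to a gateway status (thresholds are illustrative)."""
    if not stats.can_send or stats.loss_pct is None:
        # Nothing can be measured yet, so the gateway is neither up nor down.
        return "Pending"
    if stats.loss_pct >= down_loss:
        return "Offline"
    if stats.loss_pct >= warn_loss:
        return "Packet Loss"
    return "Online"
```

Under this sketch, a gateway whose probes cannot be sent stays "Pending" indefinitely, which matches the absence of a timeout raised earlier in the thread.
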
Actions #8

Updated by Chris W about 2 months ago

  • Affected Plus Version changed from 23.09.1 to 23.09

Mr. Prewitt and I had a phone session today and spent a significant amount of time gathering information about the system and traffic when WAN2 was in the Pending state, what methods of trying to manipulate the interface were and were not successful, and what the logs and traffic looked like when we did get it back online. Since there's a lot of information in these files which could be privacy-sensitive, the packet captures, status archives, screenshots, and a full synopsis of the session are in a .zip file in ticket 2419582848.

Actions #10

Updated by Hal Prewitt about 2 months ago

Reply to what Danilo Zrenjanin wrote in #note-6:

(1) It is not surprising that your 20 attempts worked. I, too, saw many successes with these simple tests. However:
(2) Computer applications have many permutations and combinations, each resulting in different conditions. There is a real bug in dpinger's Pending implementation, or maybe in the DHCP client processing. It is just hard to find the right conditions for the bug to occur repeatably. Perhaps Chris W's testing described in #note-8, and the resulting just-issued patch for "BPF device not available for the DHCP client", will fix one of the causes of being stuck in Pending.

Actions #11

Updated by Hal Prewitt about 2 months ago

Reply to what Chris Linstruth wrote in #note-7:

I am aware of the transitions due to replies, except that was not my issue. Showing "Pending" is not the correct status when a port is disconnected such as when there is no ethernet cable attached or the modem is powered off. Status should show down or Offline because nothing is pending. The port may never reconnect. As I wrote above, there are many operational or design problems with the current implementation of "Pending". Furthermore, I don't see its usefulness. DHCP & Static IP configurations should be treated the same (Status) when disconnected.

Actions #13

Updated by Jordan G about 2 months ago

Above patch didn't seem to make a difference with respect to the VTI tunnel that sticks in Pending when monitoring is enabled on my setup. Also, when a gateway is in Pending status, the force state option (mark gateway as down) does nothing to affect the status; the gateway remains Pending.

Actions #14

Updated by Kris Phillips about 1 month ago

Just tested this on 24.03. Added a new VTI, added the interface, and checked the Status --> Gateways page. The gateway shows Pending until you restart the dpinger service, even though the interface is already properly configured with an IP, subnet, and gateway before the restart. Pictures attached.

Actions #15

Updated by Kris Phillips 27 days ago

Tested this on the 24.03 BETA and this issue is present on that version as well.

Actions #16

Updated by Kris Phillips 13 days ago

Tested on 24.03-RC and this issue is still present.

Actions #17

Updated by Kris Phillips 6 days ago

Tested on 24.03-RELEASE and this issue is still present.
