Feature #4354

closed

Allow dpinger to ping more than one destination for a gateway.

Added by Raimund Sacherer almost 7 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
Gateway Monitoring
Target version:
-
Start date:
01/31/2015
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
Release Notes:

Description

Hello,

I would like to be able to put more than one IP as a monitoring IP in the GUI. I would like the system to use the first IP unless the first stops responding, and then try the second; if the second also does not respond, it should take the interface offline.
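A minimal sketch of the fallback order requested above, in Python. The function name `gateway_is_up` and the `probe` callable are hypothetical stand-ins for a real ICMP check; nothing here reflects how apinger, dpinger or pfSense actually work.

```python
from typing import Callable, Sequence


def gateway_is_up(monitor_ips: Sequence[str], probe: Callable[[str], bool]) -> bool:
    """Try each monitor IP in order; the gateway is up if any one answers."""
    for ip in monitor_ips:
        if probe(ip):
            return True   # first responsive target wins, later ones are not probed
    return False          # no target answered: take the interface offline


# Simulated probe results stand in for real ICMP echo requests.
reachable = {"192.0.2.1": False, "192.0.2.2": True}
print(gateway_is_up(["192.0.2.1", "192.0.2.2"], lambda ip: reachable[ip]))
```

Here the first IP fails and the second answers, so the gateway stays up.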

My rationale for this is that we have lots of remote offices around the world, some in locations which do not have very reliable service (like Morocco). Using the DNS servers of the Internet provider would be best, but in the case of Morocco I had various problems at times with their DNS servers. For normal operation it's not a big deal, but if flaky connectivity takes down the line intermittently and the line gets switched over to another, I have problems with our SIP connectivity. (We depend very much on SIP over Internet.)

I tried to use more reliable services like OpenDNS (did not really work out very well, sometimes high latency, etc.). I tried to use root servers, but this also left me with a churning feeling in my stomach: what if they harden their rules and one day they lock me out for pinging so much?

If I can use 2 DNS servers as reliable ping targets, I can at least rest assured that if one DNS server gets a maintenance window or has other issues but the second works, then my line will stay up.

Thank you,
best
Ray

Actions #1

Updated by Phillip Davis almost 7 years ago

See feature request https://redmine.pfsense.org/issues/1189
Various people have discussed this over the years - nobody has put time into a solution (including me - I would like this but seem to keep finding existing bugs to track down)

Actions #2

Updated by Jim Thompson almost 7 years ago

we're probably going to re-write apinger for 2.3

Actions #3

Updated by Michael Kellogg almost 7 years ago

I've got 2 poor internet connections and use apinger to switch as need be, and as of 2.2rc this is no longer a reliable option, with all the problems with apinger causing latency to drop below 1ms, false packet loss, senseless emails and email storms. If you're bitten by this bug, it's most likely your biggest "pain point". I guess this is one of those issues where it depends on what side of the bathroom door you're on.

Actions #4

Updated by → luckman212 almost 7 years ago

This would be an amazing enhancement! My only comment would be if you're going to enhance the engine to support multiple monitor IPs, then at least allow 3 (or even 5). 2 might still not be enough!

Actions #5

Updated by Michael Kellogg about 6 years ago

this is not showing up to be tracked for 2.3 or future

Actions #6

Updated by Jim Thompson almost 6 years ago

  • Subject changed from Allow apinger to ping more than one destination for a gateway. to Allow dpinger to ping more than one destination for a gateway.
  • Assignee set to Jim Thompson
  • Priority changed from Normal to Low
  • Target version set to Future

using an anycasted IP as the monitor IP should generally suffice

Actions #7

Updated by Jim Thompson almost 6 years ago

  • Status changed from New to Closed

Actions #8

Updated by Chris Buechler over 5 years ago

  • Target version deleted (Future)

Actions #9

Updated by → luckman212 almost 5 years ago

Can we get this one re-opened? This "bit" me badly yesterday at a customer site. Monitor IP of 8.8.8.8 started "flapping" and 40-60% packet loss took down an otherwise perfectly working fiber gateway- which caused an office of 200 people to lose phone service. I opened a ticket with Netgate and it was suggested that I change the GW monitor to "packet loss" instead, which would not really have solved the issue, since the other gateway does not support our VOIP traffic. All that it took to "fix" the issue was a quick stop/start of dpinger and all was well in the world. But this type of manual intervention is exactly why we roll out pfSense multi-wan in the first place so having to hand-hold like this defeats the purpose.

Really what is needed here is more robust GW monitoring, a single point of failure is not enough and can cause serious issues. I wish I had the chops to program this myself. Would a bounty help? Certainly I would contribute to that, and I would imagine quite a lot of people would......

Actions #10

Updated by Jim Pingle almost 5 years ago

No matter how multiple targets are handled, it is worse off in some way (excess traffic, too much time before an outage is detected, etc).

Appeal to the author of dpinger for dpinger features: https://github.com/dennypage/dpinger

If he can come up with a good way to handle multiple targets that we haven't thought of, we can pick the feature up after it's added there.

Actions #11

Updated by Phillip Davis almost 5 years ago

I would like to see something like this also. I had been meaning to look at it a long time ago! Maybe I will play with it next week, but I can't promise. We have plenty of events here in Nepal where some bit of backbone goes AWOL and sites that are expected to be always up cannot be reached (like 8.8.8.8 going AWOL) but the internet in general (90%+) is OK. So being able to monitor a couple of IPs, and consider the WAN up if any of them is good, would be handy.

Actions #12

Updated by → luckman212 almost 5 years ago

"excess traffic" -- a 0 byte payload ICMP? I don't think we can call that excess traffic :)

"too much time before outage" is waaay better than a false outage that requires manual intervention if you ask me....

...Also I don't see why having multiple targets adds any time before outage detection-- if all targets are pinging at the same time and the gateway is truly down, then the time to fail will be exactly the same as having just a single monitor IP. I will post something to denny but I imagine if the request came from you guys instead of a lowly user it would have more leverage :) As I said I am more than happy to donate $
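To make the timing argument concrete, here is a rough calculation in Python. It assumes a simplified model in which a target is declared down after a fixed number of consecutive missed probes sent at a fixed interval; the interval and miss count are made-up values, not dpinger defaults.

```python
def time_to_fail(interval_s: float, misses_needed: int, targets: int, parallel: bool) -> float:
    """Seconds until every target has been declared down after a total outage."""
    per_target = interval_s * misses_needed
    # Parallel probes fail together; sequential fallback pays the cost once per target.
    return per_target if parallel else per_target * targets


print(time_to_fail(0.5, 4, targets=1, parallel=True))   # single monitor IP
print(time_to_fail(0.5, 4, targets=3, parallel=True))   # three targets probed at once
print(time_to_fail(0.5, 4, targets=3, parallel=False))  # three targets tried one after another
```

Under this model, detection time only grows when targets are tried one after another, not when they are probed at the same time.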

edit: opened an issue on denny's github: https://github.com/dennypage/dpinger/issues/22

Actions #13

Updated by Luiz Souza almost 5 years ago

Phillip Davis wrote:

I would like to see something like this also. I had been meaning to look at it a long time ago! Maybe I will play with it next week, but I can't promise. We have plenty of events here in Nepal where some bit of backbone goes AWOL and sites that are expected to be always up cannot be reached (like 8.8.8.8 going AWOL) but the internet in general (90%+) is OK. So being able to monitor a couple of IPs, and consider the WAN up if any of them is good, would be handy.

I'm not sure this is going to solve all the problems.

If you monitor a couple of IPs and one of them is really down, the one you really need access to, how are you going to cope with this? But then you are monitoring 4 or 5 IPs and all the others are up, so should we consider the WAN connection up?

(and then we get back to monitor a single IP, the one that really matters)

8.8.8.8 is not a good target, look for something else, a lot of people have reported that already.

Also, you can always tune what counts as 'link down' in your situation (number of failures, time between the probes, latency, packet loss).

The ability to monitor multiple IPs is cool, but only if used correctly.

Actions #14

Updated by → luckman212 almost 5 years ago

Luiz:

"If you monitor a couple of IPs and one of them is really down, the one you really need access to, how are you going to cope with this?" YOU are not. Because it's not your problem at that point, it's on the far end. So you or your users can continue happily using the 99.999999% of the rest of the internet. If connectivity to a single specific host through a single specific gateway is desired, we can already solve that with gateway groups and firewall rules that target that specific host or network.

"8.8.8.8 is not a good target" huhhh? Then why does https://doc.pfsense.org/index.php/Multi-WAN#Monitor_IP specifically recommend using Google DNS.... ?

Actions #15

Updated by Michael Kellogg almost 5 years ago

Let me add some past experience to this discussion (as I have a couple of what may be the worst ISPs anywhere). Around 2005 I had an old dual-WAN router, a XiNCOM XC-DPG502 (later renamed Syswan SW24). It had the option to monitor in multiple ways (thankfully), as my ISP blocked all ICMP (for security reasons) for 5-7 years. It used WWW heartbeat, packet flow, and ICMP ping. I'm not sure how, but maybe I should go see if I can find where I tossed it and look some more.

Actions #16

Updated by Michael Kellogg almost 5 years ago

this is from the manual
Connection Health Check: Uses the following methods to check if the WAN interfaces are still connected to the Internet.
ICMP: If it is enabled, this device will perform ICMP echo test on the link between the WAN port and the specified host (Alive Indicator) periodically. If there is at least one success echo out of four tries, this link passes the ICMP test. Otherwise, it fails.
HTTP: If it is enabled, this device will build a TCP connection between the WAN port and the Alive Indicator first. Then the device will send a HTTP HEAD packet to the Alive Indicator periodically. If the Alive Indicator replies with an acknowledgment out of 5 tries, the link passes the HTTP test. Otherwise, it fails.
Traffic: If it is enabled and if there are packets through the WAN port in the Interval time, the WAN link is considered as connected. Otherwise, the device refers to an active health check method such as HTTP or ICMP.
Interval: The period in seconds to check if the WAN port is responding.
Alive Indicator: This field should be filled in with a host name (FQDN) or IP address for the ICMP or HTTP methods.
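
A rough Python sketch of the rule the manual describes: passive traffic first, then an active test that passes if at least one of four tries succeeds. The function names and the simulated echo results are assumptions for illustration, not the router's actual code.

```python
from typing import Callable


def passes(attempt: Callable[[], bool], tries: int) -> bool:
    """A test passes if at least one of `tries` attempts succeeds."""
    return any(attempt() for _ in range(tries))


def link_is_up(saw_traffic: bool, icmp_echo: Callable[[], bool]) -> bool:
    """Passive traffic check first; fall back to the active ICMP test (4 tries)."""
    if saw_traffic:
        return True
    return passes(icmp_echo, tries=4)


# Simulated echo results: three losses followed by one reply.
replies = iter([False, False, False, True])
print(link_is_up(saw_traffic=False, icmp_echo=lambda: next(replies)))
```

With three lost echoes and one reply, the link still passes, which is much more tolerant than failing on any single lost ping.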

Actions #17

Updated by Kill Bill almost 5 years ago

Luke Hamburg wrote:

"8.8.8.8 is not a good target" huhhh? Then why does https://doc.pfsense.org/index.php/Multi-WAN#Monitor_IP specifically recommend using Google DNS.... ?

Yeah, Google DNS indeed is not a good target, they'll start rate-limiting or dropping ICMP if you keep bombing them with an inordinate number of pings. Seen this myself in a couple of places.

Actions #18

Updated by Denny Page almost 5 years ago

Luke opened an issue with dpinger. For reference, I've copied the response here.


Hey Luke,

I understand what you are trying to accomplish, but dpinger is not the place to implement that type of logic. Dpinger is a latency/loss monitor, not a decision maker. The concept of monitoring multiple targets, and deciding what to do if a value for one or both of them reaches a certain threshold, belongs in a higher level. A good example of this higher level decision making is the already existing latency/loss thresholds present in pfSense 2.3. Another example, and perhaps more direct to the point, is the action disable option offered in 2.4.

There are other less important issues, such as how do you report the combined results? Do you average them? Anything you do in terms of averaging hides information from the upper levels which in general is not a good idea. For example, knowing that the average latency between the two links jumped from 18ms to 80ms isn't sufficient. You really need to know which link is the source of the problem. There are other issues: If one of the sending interfaces is down, do you report an error or do you ignore it?; If one link has high latency and the other link has high loss, what do you report?
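A small worked example of the averaging problem, in Python. The per-target latencies are made-up numbers chosen so the averages match the 18ms and 80ms figures mentioned above.

```python
from statistics import mean

# Latency in ms per monitored target (illustrative values only).
before = {"target_a": 16.0, "target_b": 20.0}
after = {"target_a": 18.0, "target_b": 142.0}

# The combined figure only says that something got worse.
print(mean(before.values()), mean(after.values()))

# The per-target figures say which target is responsible.
worst = max(after, key=after.get)
print(worst, after[worst] - before[worst])
```

The average jumps from 18.0 to 80.0, but only the per-target view shows that `target_b` accounts for nearly all of the increase.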

In short, the concept of monitoring multiple targets, and combining their results in a decision making process to determine whether a gateway is "down" is something that has to be implemented in pfSense itself. I'll be happy to look at this in pfSense as time permits, although I believe that Phil has already said that he was going to explore it.

Best,
Denny

Actions #19

Updated by Phillip Davis almost 5 years ago

I agree with this analysis. To make this happen, there needs to be a layer between groups of dpinger process(es) and the upper pfSense code that reacts to gateway state changes. That layer would incorporate whatever rule options are useful for determining the overall combined status from a group of dpinger monitors. The layer would provide a status interface that responds with status data that looks like a single summarized status. Then the existing top-level pfSense code can use the interface of that layer to do gateway failover, while preserving the ability to get individual gateway status for reporting (gateway widget...) and logging.

In practice that layer could be some dpinger control software that does not have to be written in PHP and does not have to be tightly bound to pfSense, it could be a "package" of code that provides the necessary interface/methods and then pfSense is one of the high-level software products that makes use of it. Or it could just be completely woven into the pfSense base code. That is an architectural design decision.
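
One possible shape for such a layer, sketched in Python. The class and field names (`GatewayMonitor`, `TargetStatus`) and the 20% loss threshold are hypothetical; this is not an existing pfSense or dpinger interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TargetStatus:
    latency_ms: float
    loss_pct: float


class GatewayMonitor:
    """Hypothetical layer: many per-target reports in, one gateway status out."""

    def __init__(self, rule: Callable[[Dict[str, TargetStatus]], bool]):
        self.rule = rule
        self.targets: Dict[str, TargetStatus] = {}

    def report(self, target: str, status: TargetStatus) -> None:
        self.targets[target] = status      # individual results stay available

    def is_up(self) -> bool:
        return self.rule(self.targets)     # single summarized answer for failover code


# Example rule: the gateway is up if any target has less than 20% loss.
mon = GatewayMonitor(rule=lambda t: any(s.loss_pct < 20 for s in t.values()))
mon.report("198.51.100.1", TargetStatus(12.0, 50.0))
mon.report("198.51.100.2", TargetStatus(15.0, 0.0))
print(mon.is_up(), mon.targets["198.51.100.1"].loss_pct)
```

The failover code only calls `is_up()`, while widgets and logs can still read the per-target entries.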

Actions #20

Updated by → luckman212 almost 5 years ago

I agree that the "right" way to handle this would be to have dpinger remain dumb (for lack of a better term) and simply report its latency stats dutifully. A new "gateway supervisor daemon" needs to be coded, that aggregates these stats and makes informed decisions based on some settings that don't currently exist in pfSense. This would be incredibly powerful.

One way I could envision this working is:

• A new "Gateway Monitoring" page is created under Routing. The layout of this page would look something like the DynDNS page where you could add hosts line by line, and set Gateways, IPs and Descriptions for each.

• We would define 0 or more hosts per gateway: If none are defined, monitoring for that gateway is effectively disabled. Defining just 1 host would be equivalent to the system we have currently. Defining 2 or more would enable the new advanced functionality. Each host added there would start a new instance of dpinger, adding monitoring + graphing for that host via the chosen gateway. These monitors in and of themselves would do NOTHING except log data.

• The Routing > Gateway Groups page could then be changed so that some new popup choices are available for Trigger Level:
  • "ANY monitored host exhibits packet loss"
  • "ALL monitored hosts exhibit packet loss"
  • "ANY monitored host has high latency"
  • "ALL monitored hosts have high latency"
  • "ANY monitored host is completely down"
  • "ALL monitored hosts are completely down"

• The "Gateway supervisor daemon" is the process that triggers pfSense scripts that mark gateways up/down based on the triggers set above and the aggregate stats for the individual dpinger instances.

These are back of napkin ideas but overall this seems like it would work.
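
The ANY/ALL trigger levels listed above could be evaluated roughly like this Python sketch. The `Condition` enum and `trigger_fires` function are illustrative names, and the choice never to fire when no hosts are defined is an assumption drawn from the "monitoring is effectively disabled" bullet.

```python
from enum import Enum
from typing import List, Set


class Condition(Enum):
    PACKET_LOSS = "packet loss"
    HIGH_LATENCY = "high latency"
    DOWN = "completely down"


def trigger_fires(quantifier: str, condition: Condition,
                  hosts: List[Set[Condition]]) -> bool:
    """Evaluate one 'ANY/ALL monitored hosts ...' trigger level.

    `hosts` holds, per monitored host, the set of conditions currently observed.
    """
    flags = [condition in observed for observed in hosts]
    if quantifier == "ANY":
        return any(flags)
    if quantifier == "ALL":
        # With zero hosts defined, monitoring is disabled, so never fire.
        return bool(flags) and all(flags)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


# Two monitored hosts: one shows packet loss, the other is healthy.
hosts = [{Condition.PACKET_LOSS}, set()]
print(trigger_fires("ANY", Condition.PACKET_LOSS, hosts))
print(trigger_fires("ALL", Condition.PACKET_LOSS, hosts))
```

With one lossy host and one healthy host, the ANY trigger fires and the ALL trigger does not.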

Actions #22

Updated by Michael Kellogg almost 5 years ago

definitely like the idea of adding a decision layer that would then open options to create daemons for other methods of detecting WAN failure, which could be implemented in that decision

Actions #23

Updated by Web Dawg over 4 years ago

So I put in a feature request @ the dpinger github here: https://github.com/dennypage/dpinger/issues/24

Here is what I wrote:

Feature request: Multiple Targets To Ping

So the most basic feature to ask for in relation to multiple targets in dpinger I would think is: that if all ips are down then pfsense (dpinger) could mark the connection as down. I have a few devices out in the field and sometimes pings just stop to a target until they are restarted/re routed.

I think also that this should be a feature because of the way broadband ISPs seem to handle pings anymore. Even on a few 'business' connections I will see dropped pings to an ISP gateway, like they just kill the pings or routes that have to do with ICMP.

I have had anycast IPs stop pinging too, when a connection is just fine. I think it stems from just crap routing and traffic management setups, where ISPs just do not care about this type of traffic anymore.

I have quite a few dedis and VPS's in the wild but I cannot guarantee 100% uptime on all at the same time, but I can guarantee that at least 8 out of 10 will be up, or 5 out of 10.
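
The "at least 8 out of 10" idea is a simple quorum check. A minimal Python sketch, with a hypothetical function name and simulated probe results:

```python
from typing import Sequence


def quorum_up(results: Sequence[bool], min_up: int) -> bool:
    """Gateway counts as up when at least `min_up` targets answered."""
    return sum(results) >= min_up


# Ten targets, two of which are currently unreachable.
results = [True] * 8 + [False] * 2
print(quorum_up(results, min_up=8))
print(quorum_up(results, min_up=9))
```

With 8 of 10 targets answering, a threshold of 8 keeps the gateway up and a threshold of 9 marks it down.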

I really do think that this is a valid request for a piece of software like this anymore as it is more and more common for small business level internet to have issues with pings. I have had routers in at least 10 different locations across 10 different states, with 10 different ISPs, all have issues like this. Time Warner, Spectrum, ATT, Verizon DSL, Verizon 4G, Comcast, FIOS, Brighthouse, Misc WISP, etc etc.

This issue @ pfsense has been open for 6 years now and I know it might not be as glamorous as fast kernel space packet routing but anymore this is a huge problem. I have clients dumping money into the pfsense project via hardware and support purchases only to have unreliable broadband because of this.

I am not 100% in tune to the relationship between dpinger and pfsense but it was suggested by Mr Thompson to come here and ask for something like this. Please help.

Actions #24

Updated by Denny Page over 4 years ago

Dpinger using multiple targets has been discussed. See updates 18-21.

Actions #25

Updated by David Gessel over 3 years ago

At the risk of bumping a closed topic, I have an edge use case that could be considered if a gateway monitoring daemon gets coded: blocked services. In a lot of countries these days, various services are blocked, usually by DNS poisoning, at various times (exams, riots, wars, etc). Ping is likely to work to any IP, but DNS lookups to some (not all necessarily) services will fail.

Testing such connectivity issues is pretty much the equivalent of OONI, perhaps including it as an optional test framework would be a pretty comprehensive way to validate interfaces.

Actions #26

Updated by Mark Noga almost 3 years ago

I agree with David. DNS more so than ping monitoring makes sense to me. I've been bitten a few times by DNS failures at the ISP while ICMP monitoring was happy. Fail-over wouldn't occur as a result. With no DNS the link is effectively down for most uses other than VPN tunnels and such. With no ICMP to a single source, the adapter may be up and functional. Not to mention DNS servers are used to getting hit many times per second. ICMP targets aren't always as accommodating.

Actions #27

Updated by Web Dawg over 2 years ago

I think that pfSense should use:

ping (ICMP)
https/http
DNS

You should be able to configure these dynamically with rules, e.g. 4 ping targets, one or two https/http targets, and some DNS targets.

DNS is not the answer because I could be using completely different DNS servers. I think we need a hybrid solution.
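
A rough Python sketch of what such a hybrid rule might look like. The rule format (a minimum number of passing targets per probe type) and all names are assumptions for illustration; no such configuration exists in pfSense.

```python
from typing import Dict, List

# Hypothetical rule: minimum number of passing targets required per probe type.
RULES: Dict[str, int] = {"icmp": 2, "http": 1, "dns": 1}


def wan_healthy(results: Dict[str, List[bool]], rules: Dict[str, int] = RULES) -> bool:
    """Healthy only if every probe type meets its own minimum."""
    return all(sum(results.get(kind, [])) >= need for kind, need in rules.items())


# Simulated results: ping and HTTP look fine, but every DNS lookup failed.
results = {
    "icmp": [True, True, False, True],
    "http": [True, False],
    "dns": [False, False],
}
print(wan_healthy(results))
```

Here ping alone would report the WAN as fine, while the hybrid rule marks it unhealthy because no DNS target answered.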

I do not know why this enhancement is not being taken seriously (I could be wrong, and I am not trying to insult anyone), but pfSense's competitors already have stuff like this. Cable internet connections (QoS) in certain regions are garbage nowadays, and something like this is really needed.

Actions #28

Updated by Jim Pingle over 2 years ago

Ultimately it's not seeing any traction because the suggested solution isn't right. Essentially dpinger is only a daemon that pings and reports responses. It doesn't make decisions about what is good or bad for a pfSense gateway as a whole, only about its specific single target. It isn't up to dpinger to handle multiple targets or different protocols.

What is needed is more like some middleware-ish daemon to sit between pfSense and other gateway monitoring daemons like dpinger (See #7671 for some other suggestions) that would be capable of coordinating multiple monitoring techniques for each gateway and making more informed decisions about their status.

pfSense
|
+--- [gateway monitoring daemon]
                 |
                 +--- [dpinger <1...n>, <something that checks http>, <something that checks tcp>, etc]

There isn't currently a feature request for that, but feel free to open one and start a bounty on the forum to see if you get any takers. Given the responses on the dpinger github it appears its author agrees that it's out of scope for dpinger itself.
