Feature #1189

Gateway: Multiple monitor ips

Added by Irwin Leong over 8 years ago. Updated 2 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: Gateway monitoring
Target version: -
Start date: 01/13/2011
Due date:
% Done: 0%
Estimated time:

Description

This is my first time making an entry here, so if it's not within the rules, please bear with me. Thanks.

For gateways to stay up and running, they need to rely on a reliable monitor IP. Most of the time that's fine, but every so often it isn't. In my case, being an incompetent admin if you like, with a definitely more incompetent ISP, I had a gateway down for quite a long period without knowing about it at all. Yes, pure neglect. After many, many years, it turned out that my ISP blocks ICMP to the PPPoE gateway. I can't quite depend on pinging their DNS servers either, because they do go down or have high packet loss from time to time. I do have a couple of monitor IPs, but I don't know if they are reliable, and what I don't want is to see my gateways go down when it's actually the monitor IP that has gone down. So perhaps having 2-3 alternate monitor IPs, in case the primary goes down for some funny reason or no reason, would help avoid unnecessary outages.

I don't think many here would care for it, since most of you are from the 1st world while I'm in a 3rd, or rightfully 4th. Shrugs.

Thanks

History

#1 Updated by Max Riedel over 8 years ago

biatche biatche wrote:

I don't think many here would care for it, since most of you are from the 1st world while I'm in a 3rd, or rightfully 4th. Shrugs.

Sorry, this probably belongs in the forum, but I wanted to add that you're not alone.
I can't imagine we are the only ones. I also think a single target is hardly enough, and I've always wondered why there's only one option.
Would this take much work to implement?

#2 Updated by Chris Buechler over 8 years ago

  • Project changed from pfSense Packages to pfSense

#3 Updated by Andreas Heckmann almost 7 years ago

I also would like to see the possibility to add multiple Monitor IPs. This would be a great improvement.

#4 Updated by Bipin Chandra almost 7 years ago

I would want this too, as I have had the same issue for a long time and simply avoided it by disabling the monitor completely.

#5 Updated by Florian Schaeffler over 5 years ago

Same here. I actually would love to see this feature, as it gives way more stability to the whole failover-concept.

#6 Updated by Phillip Davis over 5 years ago

I would also like this feature, so I have had a think about how it can be done. The apinger target is just a single IP. The GUI can be enhanced to allow multiple monitor IPs to be entered for a gateway, and for advanced parameters (loss, delay... limits) to be set individually for each target (I am assuming this flexibility is needed?). Then there will be multiple targets monitoring a single WAN/gateway. If nothing further is done, then when a single target generates an alarm, the code to bring-down/fail-over the corresponding interface will get invoked - this stuff from apinger.conf:
## These parameters can be overridden in a specific alarm configuration
alarm default {
command on "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
command off "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
combine 10s
}

Actually, we only want to invoke this when:
a) All targets for an interface have alarmed (this is the condition for the interface to be declared down), or;
b) This is the first target to "unalarm" (so now the interface is deemed to be up again)
The "command on" and "command off" could be changed so they invoke some script that can check for (a) or (b) and that script can then invoke "/usr/local/sbin/pfSctl ..." if required. The checking might involve parsing /var/run/apinger.status to determine which targets are down...
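Conditions (a) and (b) above can be sketched as a small state tracker. This is only an illustration of the idea, not existing pfSense or apinger code: the class name `GatewayMonitor` and the method `on_event` are hypothetical, and the sketch keeps alarm state in memory instead of parsing /var/run/apinger.status.

```python
from typing import Dict, Iterable, Optional


class GatewayMonitor:
    """Tracks alarm state for every monitor IP of one gateway (hypothetical helper)."""

    def __init__(self, targets: Iterable[str]) -> None:
        # False means "not alarmed"; every target starts out healthy.
        self.alarmed: Dict[str, bool] = {t: False for t in targets}
        if not self.alarmed:
            raise ValueError("at least one monitor IP is required")

    def on_event(self, target: str, alarm_on: bool) -> Optional[str]:
        """Record one alarm event and return "down", "up", or None.

        None means the gateway state did not change, so the reload
        commands should not be run for this event.
        """
        was_down = all(self.alarmed.values())
        self.alarmed[target] = alarm_on
        is_down = all(self.alarmed.values())
        if is_down and not was_down:
            return "down"  # condition (a): the last healthy target just alarmed
        if was_down and not is_down:
            return "up"    # condition (b): the first target just recovered
        return None
```

A wrapper script invoked by "command on" and "command off" would call something like `on_event` and run the pfSctl command only when the result is not None.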

An alternative is to have the various dyndns, ipsecdns and openvpn scripts that get invoked by the "service reload" each do the checking to see if all targets are down or not.

So far I have assumed that everyone will want it to be that "gateway down" is declared only when all monitor IPs are down. I think that is the normal use case. Maybe there are some times when the user wants "gateway down" declared when any 1 of the monitor IPs are down (i.e. gateway is up only if ALL monitor IPs are up). A checkbox could be added so the user could select "Require all monitor IPs to be up for gateway to be up" and the underlying logic could respect that. Does anyone think that would be used?
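The two policies described here differ only in whether "any" or "all" monitor IPs must respond. A minimal sketch, assuming the per-target status is already available as a mapping from IP to up/down; the function name `gateway_is_up` and the `require_all` parameter are made up for illustration and mirror the proposed checkbox:

```python
from typing import Dict


def gateway_is_up(target_up: Dict[str, bool], require_all: bool = False) -> bool:
    """Decide gateway status from the up/down state of its monitor IPs.

    require_all=False: gateway is down only when every monitor IP is down.
    require_all=True:  gateway is up only when every monitor IP is up.
    """
    states = list(target_up.values())
    if not states:
        raise ValueError("at least one monitor IP is required")
    return all(states) if require_all else any(states)
```

With one of two targets down, the default policy reports the gateway as up, while `require_all=True` reports it as down.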

I can have more of a look at/play with this in a couple of weeks, so please comment on the design issues.

#7 Updated by Florian Schaeffler over 5 years ago

Are advanced parameters for every single monitoring IP actually needed?
In my opinion there is no need for it, as not the quality of different monitoring-routes or endpoints, but the actual uplink is measured. If IP1 has a loss of 5% and IP2 a loss of 10%, we already know that the best route has only 5% loss, which means the actual endpoint of IP2 might be having problems.
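The reasoning in that example amounts to taking the minimum loss across targets as an upper bound on the loss of the shared uplink. A small sketch under the assumption that all loss figures cover the same measurement window; the function name `uplink_loss_upper_bound` is hypothetical:

```python
from typing import Dict


def uplink_loss_upper_bound(loss_percent: Dict[str, float]) -> float:
    """Return the lowest loss seen across monitor IPs.

    Every probe crosses the same uplink, so the uplink cannot be losing
    more than the best-performing target reports. Loss above that value
    is attributed to the path or endpoint beyond the uplink.
    """
    if not loss_percent:
        raise ValueError("at least one monitor IP is required")
    return min(loss_percent.values())
```

For the figures in the comment, `uplink_loss_upper_bound({"IP1": 5.0, "IP2": 10.0})` gives 5.0.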

Is `Require all monitor IPs to be up for gateway to be up` required?
Again, wouldn't this measure the quality of the monitoring routes rather than the uplink itself?

#8 Updated by Chris Buechler over 5 years ago

Probably don't need advanced parameters on a per-monitor IP basis. It would be extremely unusual to have a need for that, so best to avoid the complexity. The biggest part of the job here is adding support for multiple monitor IPs to apinger. I guess that could be handled in PHP logic but not sure that's a good idea.

#9 Updated by Jorge Albarenque about 5 years ago

Think we can get this implemented by 2.2? That would be awesome

#10 Updated by Nome Fasullo about 5 years ago

I searched for this and I'm glad others need it too.

I tried several IPs, but there is none that is ALWAYS online. Using the first hop in traceroute is more stable, but it can be a risk because the provider sometimes changes its internal routing configuration.

By contrast, being able to say "This connection is offline because apinger could not reach 3 different IPs" is a fairly certain statement.

I hope this is not difficult to implement.

#11 Updated by Chris Buechler about 5 years ago

There are plenty of anycasted IPs that are always up: Google DNS, OpenDNS, and Level 3's public DNS, among others.

#12 Updated by Eduard Rozenberg almost 5 years ago

Hello Chris,

Based on personal experience, I now claim there are zero always-reliable monitor IPs. I just had 8.8.8.8 (Google DNS 1) show erroneous packet loss to my pfSense router. I also read about someone's experience with Level 3 (4.2.2.x) showing erroneous packet loss in at least one instance. I've also experienced an instance where my next-hop ISP router showed as UP but connectivity to the rest of the internet was DOWN, meaning that was not reliable either.

I'd suggest that multiple monitor IPs per gateway (wan1, wan2 etc.) are the only way to guarantee proper monitoring of connectivity to the Internet.

#13 Updated by Phillip Davis over 4 years ago

Given the current state of apinger, it is certainly not worth it to try to enhance the existing apinger compiled code to handle having groups of monitor IPs that it alarms for when all are down. If a whole new equivalent is going to be written "from scratch" to meet the monitoring requirement, then that would be a time to make it "monitor IP groups"...

If I do some work to implement PHP code that analyses the individual alarm events and effectively only performs "down" actions when all monitor IPs in a group are down, would that be acceptable and likely to get committed to the project?

The "up" actions would be similar - when the 1st of a monitor IP group comes up then the "up" actions are taken. When 2nd, 3rd... come up then nothing extra needs to be done, the gateway is already considered up.

#14 Updated by badon _ over 3 years ago

I had a very difficult-to-debug outage on one of my gateways. Ping through gateway worked fine, and it took me a whole day to rip everything apart to figure out what is wrong. It turns out there was absolutely nothing wrong with anything. The gateway went down and stayed down because the monitor IP went down, even though nothing that mattered was actually broken.

If it's going to take some time to resolve this issue, I recommend making a prominent mention of the potential for this problem whenever a gateway goes down due to a monitor IP going down. That will at least save some admins headaches, downtime, and lost sleep, until a proper solution is ready.

See also:

https://redmine.pfsense.org/issues/5661

#15 Updated by Michael Kellogg over 3 years ago

I upvote this too. The router I replaced (a no-name Xincom, later renamed Syswan) had 3 ways to monitor health: an HTTP check, which is good for ISPs that block ICMP (yes, some are that stupid); watching traffic flow or packet flow; and ICMP ping.
I would think it should be possible, but maybe it only works in a variant of Linux.

#16 Updated by Phillip Davis over 3 years ago

Now that dpinger is in 2.2.3-ALPHA, I will have a think about the design of this. The system runs a dpinger for each monitored gateway, so it should not be difficult to get multiple dpinger instances monitoring out the same gateway. Then there will be the feedback loop to have the alerts work for some user-chosen combination of all monitor IPs up or all monitor IPs down to decide the overall gateway status, and to feed that through to modifying the gateway groups.

#17 Updated by Chris Buechler over 3 years ago

  • Category set to Gateway monitoring
  • Status changed from New to Duplicate

duplicate of #4354

#18 Updated by Phillip Davis over 3 years ago

https://redmine.pfsense.org/issues/4354 was closed a few days ago. Is this still "a good thing" to do in some way?

#19 Updated by Chris Buechler over 3 years ago

Phillip Davis wrote:

https://redmine.pfsense.org/issues/4354 was closed a few days ago. Is this still "a good thing" to do in some way?

It was closed out for not being worth the effort it'd entail. Running multiple instances of dpinger now isn't hard, but handling gateway status across multiple instances complicates a lot of things without providing much benefit. Not something we're going to pursue at this time.

#20 Updated by Luke Hamburg over 3 years ago

How can we run multiple dpingers to monitor arbitrary IPs and dump results to rrd? Is this written down somewhere or should I try perusing https://github.com/dennypage/dpinger ?

#21 Updated by John Banks over 2 years ago

I would also like to see this as a feature. This one has been open for a while now, and many of the hardware solutions I have in place implement this as-is.

Today I got a call because of intermittent connection loss. Since the ISP gateway does not respond to pings reliably, I've been using Google's DNS servers. And today Google's DNS is having issues responding to pings, with spikes in loss to >70% from the client location, which is triggering failover repeatedly. I've temporarily switched to L3's DNS servers for monitoring (and DNS), but have had problems with them in the past as well.

Having multiple addresses to monitor would have made the issue I was called in for a non-issue, and would make my company look better when things don't fail because of a piece of hardware I installed having a conniption fit, especially in a case where it shouldn't.

#22 Updated by Luke Hamburg over 2 years ago

Don't have a solution (yet) but FYI in case some people are watching this ticket and not the others/forums, I did create a bounty for this and set the prize at $2,500. Open for further donations to push it forward... https://forum.pfsense.org/index.php?topic=123741.0

#23 Updated by Web Dawg about 2 years ago

So I put in a feature request @ the dpinger github here: https://github.com/dennypage/dpinger/issues/24

Here is what I wrote:

Feature request: Multiple Targets To Ping

So the most basic feature to ask for in relation to multiple targets in dpinger I would think is: that if all ips are down then pfsense (dpinger) could mark the connection as down. I have a few devices out in the field and sometimes pings just stop to a target until they are restarted/re routed.

I think also that this should be a feature because of the way broadband ISP's seem to handle pings anymore. Even on a few 'business' connections I will see dropped pings to an ISP gateway, like they just kill the pings or routes that have to do with icmp.

I have had anycast IP's stop pinging too, when a connection is just fine. I think it stems from just crap routing and traffic management setups, where ISP's just do not care about this type of traffic anymore.

I have quite a few dedis and VPS's in the wild but I cannot guarantee 100% uptime on all at the same time, but I can guarantee that at least 8 out of 10 will be up, or 5 out of 10.

I really do think that this is a valid request for a piece of software like this anymore as it is more and more common for small business level internet to have issues with pings. I have had routers in at least 10 different locations across 10 different states, with 10 different ISP's all have issues like this. Time Warner, Spectrum, ATT, Verizon DSL, Verizon 4G, Comcast, FIOS, Brighthouse, Misc WISP, etc etc.

This issue @ pfsense has been open for 6 years now and I know it might not be as glamorous as fast kernel space packet routing but anymore this is a huge problem. I have clients dumping money into the pfsense project via hardware and support purchases only to have unreliable broadband because of this.

I am not 100% in tune to the relationship between dpinger and pfsense but it was suggested by Mr Thompson to come here and ask for something like this. Please help.

#24 Updated by Denny Page about 2 years ago

Hadn't noticed this issue before...

With regard to dpinger itself, please see #4354#note-18

#25 Updated by Blaine Palmer 10 months ago

Just going to add to the chorus here.

We encountered routing issues (their side, whole country affected) with one of our ISPs today that left the gateway and 8.8.8.8 pingable, but nothing to the larger internet.
Nothing failed over because of this, despite 99% of the internet being unavailable.

A single monitor IP is prone to failing to detect a downtime event. It would be optimal if we could choose multiple monitor IPs and then rank their reachability above tiering. Maybe add some options for ranking order.

i.e.

Tier 1 (Monitoring 8.8.8.8, 1.1.1.1, 4.2.2.1)
Tier 2 (Monitoring 8.8.4.4, 1.0.0.1, 4.2.2.2)

You could then choose to rank connectability over tier, such that 8.8.8.8 being unreachable would result in 2/3 connectability on Tier 1 vs 3/3 on the Tier 2 connection, triggering failover. However, if both were at 2/3 (i.e. 8.8.8.8 and 8.8.4.4 are both down), Tier 1 would still be chosen.
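The ranking described here can be read as sorting first by the fraction of reachable monitor IPs and then by tier. A rough sketch of that reading, not an existing pfSense feature; the `Gateway` type, its field names, and `pick_gateway` are invented for illustration:

```python
from fractions import Fraction
from typing import List, NamedTuple


class Gateway(NamedTuple):
    name: str
    tier: int        # lower number means preferred tier
    reachable: int   # monitor IPs currently answering
    total: int       # monitor IPs configured (must be > 0)


def pick_gateway(gateways: List[Gateway]) -> Gateway:
    """Prefer the highest reachability; break ties with the lowest tier."""
    return max(
        gateways,
        key=lambda g: (Fraction(g.reachable, g.total), -g.tier),
    )
```

With Tier 1 at 2/3 and Tier 2 at 3/3, `pick_gateway` selects the Tier 2 gateway. With both at 2/3 it falls back to Tier 1.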

#26 Updated by Rajil Saraswat 10 months ago

The OpenWrt mwan3 package supports tracking multiple monitor IPs: https://wiki.openwrt.org/doc/howto/mwan3#interface_configuration

#27 Updated by Marvin Klose 3 months ago

Yep, I want that too. My parents just had no phone service because my PoolDNS IP went down and it switched over to the Internet-only ISP...

#28 Updated by Stefan B. Christensen 2 months ago

+1
Please consider implementing this. I just experienced my first down time because 1.0.0.1 was unavailable from YouSee's network in Denmark.

Web, mail and ftp all down because the monitor IP was unavailable.

Showing an average of the three on the monitor graph would be fine
