Bug #3191

Quality RRD inaccuracies and failure to update status in some circumstances

Added by Chris Buechler almost 6 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Category: Gateway monitoring
Target version:
Start date: 09/09/2013
Due date:
% Done: 0%
Estimated time:
Affected Version: All
Affected Architecture:

Description

There are some circumstances in which apinger records incorrect quality RRD data and fails to notice downtime. This needs more specifics, but here are known examples of circumstances that fail:

1) Use a monitor IP with 80% packet loss and leave "Down" at its default of 10: apinger detects no packet loss and graphs none.
2) Use a monitor IP with 80% packet loss and set "Down" to 30: the gateway is marked down appropriately, but the RRD data is wrong, showing 100% loss much of the time.

There may be other specific circumstances that need to be added here.

This is true across all versions that have ever used apinger.

History

#1 Updated by Ermal Luçi over 5 years ago

  • Target version changed from 2.1.1 to 2.2

I am not sure this should be fixed for 2.1.1, so moving it to 2.2.

#2 Updated by Jim Thompson almost 5 years ago

  • Assignee set to Chris Buechler

Need more info on when this happens (why would be great too).

#3 Updated by Chris Buechler almost 5 years ago

  • Assignee changed from Chris Buechler to Ermal Luçi

There are a few descriptions of problems in tickets in Kayako under the apinger-badstats tag.

#4 Updated by Ermal Luçi almost 5 years ago

  • Status changed from New to Feedback

Patched apinger; need feedback on whether the issue is solved now.

#5 Updated by Ermal Luçi almost 5 years ago

For the record, a patch to properly recover from disconnected sockets was also put in.

#6 Updated by Chris Buechler almost 5 years ago

  • Status changed from Feedback to New

The first issue as originally described is still a problem. Throw a limiter on an upstream system that drops 80% of the monitor pings, and the packet loss and latency reported by apinger are completely wrong. It reports around 3 ms with 0% loss, when in reality (reading back a pcap with tshark):

ICMP Service Response Time (SRT) Statistics (all times in ms):
Filter: ip.addr==192.0.2.202

Requests  Replies   Lost      % Loss
38        9         29         76.3%

Minimum   Maximum   Mean      Median    SDeviation     Min Frame Max Frame
90.927    94.974    92.275    92.089    1.379          54        34        

The pings that are answered get replies in ~91-95 ms (mean ~92 ms, not the 3 ms it's showing), and the loss doesn't show up at all.
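The tshark figures above follow directly from per-probe results: loss is lost requests over sent requests (29 of 38 = 76.3%), and latency statistics are taken only over the requests that actually got a reply. A minimal Python sketch of that calculation, assuming a monitor records how many echo requests it sent in a window plus the RTTs of the answered ones. `summarize` and `ProbeStats` are hypothetical names for illustration, not apinger's code, and the standard deviation here is the sample deviation, which may not match tshark's exact formula.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeStats:
    requests: int
    replies: int
    lost: int
    loss_pct: float
    rtt_min: Optional[float]
    rtt_max: Optional[float]
    rtt_mean: Optional[float]
    rtt_median: Optional[float]
    rtt_stdev: Optional[float]


def summarize(requests: int, rtts_ms: Sequence[float]) -> ProbeStats:
    """Summarize one window of ICMP echo probes (hypothetical helper).

    requests: number of echo requests sent in the window.
    rtts_ms:  round-trip times in ms for requests that got a reply;
              unanswered requests are simply absent from this list.
    """
    replies = len(rtts_ms)
    if replies > requests:
        raise ValueError("more replies than requests")
    lost = requests - replies
    # Loss is measured against requests sent, rounded to one decimal.
    loss_pct = round(100.0 * lost / requests, 1) if requests else 0.0
    if replies == 0:
        # Latency is undefined when nothing came back; report None
        # instead of a zero or stale value.
        return ProbeStats(requests, 0, lost, loss_pct,
                          None, None, None, None, None)
    return ProbeStats(
        requests, replies, lost, loss_pct,
        min(rtts_ms), max(rtts_ms),
        statistics.mean(rtts_ms), statistics.median(rtts_ms),
        # Sample standard deviation needs at least two data points.
        statistics.stdev(rtts_ms) if replies > 1 else 0.0,
    )


if __name__ == "__main__":
    # Counts from the capture above; the nine RTTs are placeholders,
    # since the per-packet values are not in the report.
    s = summarize(38, [92.0] * 9)
    print(s.requests, s.replies, s.lost, s.loss_pct)
```

Run against the capture's counts, this prints `38 9 29 76.3`, matching the tshark summary; a monitor computing loss this way over the same window could not report 0%.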

Change "Down" to 30 and the RRD data is correct for loss, but not for latency.

#7 Updated by Chris Buechler over 4 years ago

  • Affected Documentation 1 added

#8 Updated by Chris Buechler over 4 years ago

  • Assignee changed from Ermal Luçi to Chris Buechler

Assigning to me to re-test.

#9 Updated by Chris Buechler over 4 years ago

  • Status changed from New to Feedback
  • Target version changed from 2.2 to 2.3
  • Affected Documentation 0 added
  • Affected Documentation deleted (1)

Things are much better with apinger in general after fixes in the past 1-2 months. I can still replicate some issues here, but they're unlikely to occur in the real world. Given that things are already better than in the current stable releases, and touching apinger is fraught with peril, I'm moving this to 2.3 for feedback and further review.

#10 Updated by Michael Kellogg over 3 years ago

Problems still occur on troublesome connections.

#11 Updated by Chris Buechler over 3 years ago

  • Category changed from Gateways to Gateway monitoring
  • Status changed from Feedback to Confirmed

#12 Updated by Chris Buechler over 3 years ago

  • Status changed from Confirmed to Resolved

This was resolved by replacing apinger.
