Bug #3191
closed
Quality RRD inaccuracies and failure to update status in some circumstances
Added by Chris Buechler about 11 years ago.
Updated almost 9 years ago.
Category:
Gateway Monitoring
Description
There are some circumstances in which apinger records incorrect quality RRD data and fails to notice downtime. This needs some more specifics, but here are known examples of circumstances that fail.
1) Use a monitor IP with 80% packet loss and "Down" at the default of 10: apinger detects no packet loss and the graph shows none.
2) Use a monitor IP with 80% packet loss and "Down" at 30: apinger takes the gateway down appropriately, but the RRD data is wrong and shows 100% loss much of the time.
There may be other specific circumstances that need to be added here.
This is true across all versions that have ever used apinger.
- Target version changed from 2.1.1 to 2.2
I am not sure this is something to be fixed for 2.1.1, so moving it to 2.2.
- Assignee set to Chris Buechler
Need more info on when this happens (why would be great too).
- Assignee changed from Chris Buechler to Ermal Luçi
There are a few descriptions of problems in tickets in Kayako under the apinger-badstats tag.
- Status changed from New to Feedback
Patched apinger; need some feedback on whether the issue is solved now.
For the record, the patch to properly recover from disconnected sockets has been put in.
- Status changed from Feedback to New
The first issue as noted originally is still a problem as described. Throw a limiter on an upstream system that drops 80% of the monitor pings, and the packet loss and latency reported by apinger are completely wrong. It reports somewhere around 3 ms with 0% loss, whereas reading back a pcap with tshark shows:
ICMP Service Response Time (SRT) Statistics (all times in ms):
Filter: ip.addr==192.0.2.202

Requests  Replies  Lost  % Loss
38        9        29    76.3%

Minimum  Maximum  Mean    Median  SDeviation  Min Frame  Max Frame
90.927   94.974   92.275  92.089  1.379       54         34
Its pings are getting replies in roughly 91-95 ms (mean about 92 ms), not the 3 ms it's showing, and the loss doesn't show up at all.
Change "Down" to 30, and the RRD data is correct for loss, but not latency.
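As a sanity check on the tshark summary above, the loss percentage can be recomputed from the request and reply counts. This is only a minimal Python sketch; the function name `loss_percent` is made up for illustration and is not part of apinger or tshark.

```python
def loss_percent(requests: int, replies: int) -> float:
    """Return packet loss as a percentage of echo requests sent.

    Hypothetical helper for illustration only; not taken from apinger.
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    if not 0 <= replies <= requests:
        raise ValueError("replies must be between 0 and requests")
    lost = requests - replies
    return 100.0 * lost / requests


# Counts from the tshark summary: 38 requests, 9 replies, so 29 lost.
print(f"{loss_percent(38, 9):.1f}%")  # prints "76.3%"
```

With the counts from the capture this gives 76.3%, matching tshark and close to the 80% the limiter was set to drop, so the 0% that apinger reported cannot be explained by the capture itself.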
- Assignee changed from Ermal Luçi to Chris Buechler
- Status changed from New to Feedback
- Target version changed from 2.2 to 2.3
Things are much better with apinger in general after fixes in the past 1-2 months. I can still replicate some issues here, but it's unlikely they're ones that would occur in the real world. Given that things are better than in current stable releases as is, and touching apinger is fraught with peril, I'm setting this out to 2.3 for feedback and further review.
Things are still an issue on troublesome connections.
- Category changed from Gateways to Gateway Monitoring
- Status changed from Feedback to Confirmed
- Status changed from Confirmed to Resolved
This was resolved by replacing apinger.