Bug #3191
closed
Quality RRD inaccuracies and failure to update status in some circumstances
There are some circumstances in which apinger writes incorrect quality RRD data and fails to notice downtime. This needs more specifics, but here are known examples of circumstances that fail.
1) Use a monitor IP with 80% packet loss, with "Down" at the default of 10: apinger detects no packet loss, and the graph shows no packet loss.
2) Use a monitor IP with 80% packet loss, with "Down" at 30: the gateway is taken down appropriately, but the RRD data is wrong. It shows 100% loss much of the time.
There may be other specific circumstances that need to be added here.
This is true across all versions that have ever used apinger.
Updated by Ermal Luçi almost 11 years ago
- Target version changed from 2.1.1 to 2.2
I am not sure this is something to be fixed for 2.1.1, so moving it to 2.2.
Updated by Jim Thompson over 10 years ago
- Assignee set to Chris Buechler
Need more info on when this happens (why would be great, too).
Updated by Chris Buechler over 10 years ago
- Assignee changed from Chris Buechler to Ermal Luçi
There are a few descriptions of problems in tickets in Kayako under the apinger-badstats tag.
Updated by Ermal Luçi over 10 years ago
- Status changed from New to Feedback
Patched apinger; need some feedback on whether the issue is solved now.
Updated by Ermal Luçi over 10 years ago
For the record, a patch to properly recover from disconnected sockets has been put in.
Updated by Chris Buechler over 10 years ago
- Status changed from Feedback to New
The first issue as noted originally is still a problem as described. Put a limiter on an upstream system that drops 80% of the monitor pings, and the packet loss and latency reported by apinger are completely wrong. It reports somewhere around 3 ms with 0% loss, when in reality, reading back a pcap with tshark shows:
ICMP Service Response Time (SRT) Statistics (all times in ms):
Filter: ip.addr==
Requests  Replies  Lost  % Loss
38        9        29    76.3%
Minimum  Maximum  Mean    Median  SDeviation  Min Frame  Max Frame
90.927   94.974   92.275  92.089  1.379       54         34
Its pings are getting replies in ~92-94 ms (not the 3 it's showing), and the loss doesn't show up at all.
Change "Down" to 30, and the RRD data is correct for loss, but not latency.
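As a sanity check on the tshark figures above, the loss percentage follows directly from the request and reply counts. A minimal Python sketch; the function name `loss_percent` is made up for illustration and is not part of apinger or tshark:

```python
def loss_percent(requests: int, replies: int) -> float:
    """Return the percentage of echo requests that never got a reply."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    if not 0 <= replies <= requests:
        raise ValueError("replies must be between 0 and requests")
    lost = requests - replies
    return 100.0 * lost / requests


# Counts taken from the tshark summary: 38 requests, 9 replies.
print(f"{loss_percent(38, 9):.1f}% loss")  # prints "76.3% loss"
```

That is close to the 80% configured on the limiter, so the capture agrees with the test setup and the 0% figure from apinger is what is off.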
Updated by Chris Buechler about 10 years ago
- Assignee changed from Ermal Luçi to Chris Buechler
Assigning to me to re-test.
Updated by Chris Buechler about 10 years ago
- Status changed from New to Feedback
- Target version changed from 2.2 to 2.3
Things are much better with apinger in general after the fixes of the past 1-2 months. I can still replicate some issues here, but it's unlikely they're ones that would occur in the real world. Given that things are better than in current stable releases as is, and touching apinger is fraught with peril, I'm pushing this out to 2.3 for feedback and further review.
Updated by Michael Kellogg about 9 years ago
This is still an issue on troublesome connections.
Updated by Chris Buechler about 9 years ago
- Category changed from Gateways to Gateway Monitoring
- Status changed from Feedback to Confirmed
Updated by Chris Buechler about 9 years ago
- Status changed from Confirmed to Resolved
This was resolved by replacing apinger.