Quality RRD inaccuracies and failure to update status in some circumstances
There are some circumstances in which apinger writes incorrect quality RRD data and fails to notice downtime. The details still need to be narrowed down, but these are the known failing cases.
1) Use a monitor IP with 80% packet loss and "Down" at the default of 10: apinger detects no packet loss and the graph shows none.
2) Use a monitor IP with 80% packet loss and "Down" at 30: the gateway is taken down appropriately, but the RRD data is wrong, showing 100% loss much of the time.
There may be other specific circumstances that need to be added here.
This is true across all versions that have ever used apinger.
Updated by Chris Buechler about 7 years ago
- Status changed from Feedback to New
The first issue as originally noted is still a problem as described. Put a limiter on an upstream system that drops 80% of the monitor pings, and the packet loss and latency reported by apinger are completely wrong. It reports around 3 ms with 0% loss, when in reality, reading back a pcap with tshark:
ICMP Service Response Time (SRT) Statistics (all times in ms):
Filter: ip.addr==192.0.2.202

Requests  Replies   Lost      % Loss
38        9         29        76.3%

Minimum   Maximum   Mean      Median    SDeviation  Min Frame  Max Frame
90.927    94.974    92.275    92.089    1.379       54         34
Its pings are getting replies in ~91-95 ms (not the 3 ms it's showing), and the loss doesn't show up at all.
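For anyone cross-checking a capture the same way, here is a minimal sketch of the loss arithmetic behind the tshark summary. The function name `loss_percent` is made up for illustration and is not part of apinger or tshark; the counters are the ones from the capture above.

```python
def loss_percent(requests: int, replies: int) -> float:
    """Return the percentage of echo requests that received no reply."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    lost = requests - replies
    return 100.0 * lost / requests


# Counters copied from the tshark summary in this comment.
requests, replies = 38, 9
print(f"lost={requests - replies} loss={loss_percent(requests, replies):.1f}%")
# prints: lost=29 loss=76.3%
```

That is the figure the quality RRD should be landing near with the 80% limiter in place, rather than the 0% apinger reports.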
Change "Down" to 30, and the RRD data is correct for loss, but not latency.
Updated by Chris Buechler almost 7 years ago
- Status changed from New to Feedback
- Target version changed from 2.2 to 2.3
Things are much better with apinger in general after the fixes of the past 1-2 months. I can still replicate some issues here, but they're unlikely to occur in the real world. Given that things are better than in current stable releases as is, and touching apinger is fraught with peril, I'm pushing this out to 2.3 for feedback and further review.