NAT + a limiter + fq_codel dropping nearly all ping traffic under load
I think we have confirmed in https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/595 that an issue still exists with this.
It's a very long thread.
The bug looks similar, but not identical, to https://redmine.pfsense.org/issues/4326
Updated by Dave Taht almost 3 years ago
ok, so we just have a configuration guideline then: "Always put all traffic through the limiter". Do you have a conf that works for https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/570 ?
Updated by Josh Chilcott almost 3 years ago
The conf attached to the example at https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/570 shows that the match rules include all protocols for IPv4. The issue presents itself when match out limiter rules are used on interfaces that create NAT states (e.g. WAN). Loading the out limiter to capacity with a match rule on WAN, and testing for roughly 60 seconds including ramp-up and ramp-down, showed that 82% of pings to hosts on the WAN side were lost. During heavy saturation of the limiter, almost all pings are lost. Disabling outgoing NAT remedies the situation. Creating in/out limiters on just the LAN side also remedies it; this appears to be the most performant workaround for a single-WAN, single-LAN setup where traffic originates on both the WAN and LAN sides.
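For anyone wanting to put a number on the loss the same way, here is a minimal sketch that computes the loss percentage from the summary line ping prints at the end of a run. The function name and the sample line are made up for illustration (they are not the data from the test above), and the regex assumes the usual "N packets transmitted, M [packets] received" wording of common Linux and FreeBSD ping builds.

```python
import re

# Matches both "25 received" (Linux iputils) and "25 packets received" (FreeBSD).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_loss_percent(ping_output):
    """Return the percentage of echo requests that got no reply."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping reported zero packets transmitted")
    return 100.0 * (sent - received) / sent


# Illustrative summary line, not taken from the report above.
sample = "100 packets transmitted, 25 received, 75% packet loss, time 99123ms"
print(ping_loss_percent(sample))  # 75.0
```

Recomputing the figure from the transmitted/received counts, rather than trusting the printed percentage, keeps the result comparable across ping implementations that round differently.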
Updated by Steven Brown almost 3 years ago
I can confirm this bug. My testing seemed to show that the behaviour was the same no matter which scheduler I assigned to the limiter when the limiter was applied using floating rules. When I applied the limiter with a LAN interface firewall rule instead, pings were no longer dropped with fq_codel assigned.
My rules were already assigned for "all traffic", so that guideline did not fix the issue for me.
Updated by Josh Chilcott over 2 years ago
Using limiters on an interface with outgoing NAT enabled causes all ICMP echo reply packets coming back into WAN to be dropped when the limiter is loaded with flows. I can reproduce this issue with the following configuration:
- create two limiters (any scheduler): one limiter for out and one limiter for in.
- create a single child queue for the out limiter and one for the in limiter.
- floating match IPv4 any rule on WAN Out, using the out limiter child queue for in and in limiter child queue for out.
- floating match IPv4 any rule on WAN In, using the in limiter child queue for in and out limiter child queue for out.
- load the limiter with traffic. (Most recently I've been using a netperf netserver v2.6.0 on the WAN side and a Flent client on the LAN side running RRUL test)
- start a constant ping from the client to the server during the RRUL test.
Both the flent.gz output and the constant ping will show a high rate of ICMP packets getting dropped. If a separate floating match rule is created for ICMP, then the ICMP packets are not dropped. Pushing fewer pps through pfSense seems to result in fewer dropped echo replies.
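The last two steps (load the limiter, ping alongside it) can be scripted from the LAN-side client. This is only a sketch under some assumptions: flent, netperf and ping are installed on the client, a netserver is reachable on the WAN side, and the function names, test title and the 192.0.2.10 address are placeholders. It uses flent's `rrul` test with `-l` (length in seconds), `-H` (server) and `-t` (title), and `ping -c` for the probe count. Flent adds a short delay before and after the test, so the ping run only roughly overlaps the loaded period.

```python
import subprocess


def flent_cmd(server, seconds=60):
    """Command line for an RRUL load test against a netserver at `server`."""
    return ["flent", "rrul", "-l", str(seconds), "-H", server,
            "-t", "nat-limiter-icmp-repro"]


def ping_cmd(server, count=60):
    """Command line for `count` echo requests at ping's default interval."""
    return ["ping", "-c", str(count), server]


def run_repro(server, seconds=60):
    """Start the RRUL load, ping alongside it, and return ping's raw output."""
    load = subprocess.Popen(flent_cmd(server, seconds))
    try:
        # ping exits non-zero on heavy loss; that is expected here, so no check=True.
        ping = subprocess.run(ping_cmd(server, seconds),
                              capture_output=True, text=True)
    finally:
        load.wait()  # let flent finish and write its flent.gz file
    return ping.stdout


# Example (placeholder address; needs a reachable netserver):
#   print(run_repro("192.0.2.10"))
```

Running it once with the floating match rules in place and once with the separate ICMP match rule added should make the difference in the ping summary easy to compare.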