Bug #11192
Using Limiters causes out of order packets within one TCP or UDP flow
Description
I am using the following limiters:
pipe 1 config bw 85Mb queue 2000 mask all droptail
sched 1 config pipe 1 type qfq
queue 1 config pipe 1 queue 2000 mask all codel target 20ms interval 200ms ecn
pipe 2 config bw 85Mb queue 2000 mask all droptail
sched 2 config pipe 2 type qfq
queue 2 config pipe 2 queue 2000 mask all codel target 20ms interval 200ms ecn
(To get the "mask all" option I patched shaper.inc, but the problem is also reproducible with the default shaper.inc.)
And using this rule:
match in on { ovpns8 } inet from 192.168.8.0/22 to 192.168.8.0/22 tracker 1608854657 dnqueue( 2,1) label "USER_RULE: Shape VPN Traffic"
As a result, I get a lot of out-of-order packets in TCP/UDP streams when this rule is applied, and no reordering when the firewall rule is turned off.
I have tried different types of queue management and schedulers; the result is always the same.
I have also tried limiting netisr to one thread (net.isr.maxthreads=1), without success.
Disabling net.inet.ip.dummynet.io_fast (by patching shaper.inc) has no effect either.
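For reference, a rough sketch of how those two knobs are set (net.isr.maxthreads is a boot-time tunable, io_fast is a runtime sysctl; paths as on stock pfSense):

    # /boot/loader.conf.local - boot-time tunable, takes effect after reboot
    net.isr.maxthreads=1

    # runtime toggle for the dummynet fast path
    sysctl net.inet.ip.dummynet.io_fast=0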
Attaching a Wireshark screenshot showing a typical block of TCP out-of-order packets (for a 50-60 Mbit stream).
(There are hundreds of such blocks in the capture, but the complete Wireshark capture is 40 megabytes for 6 seconds, so I cannot attach it to the ticket.)
Also attaching iperf3 output with OUT OF ORDER errors for a 30 Mbit UDP stream.
If I raise the pipe bandwidth to 185Mb, the probability of out-of-order packets becomes higher for the same traffic.
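For anyone trying to reproduce this, the UDP test looks roughly like the following (a sketch; 192.168.8.10 is a placeholder for a host inside the shaped network):

    # on a host inside 192.168.8.0/22
    iperf3 -s
    # on a second host: 30 Mbit UDP through the limiter for 30 seconds;
    # "OUT OF ORDER" lines then appear in the iperf3 output
    iperf3 -c 192.168.8.10 -u -b 30M -t 30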
Updated by Alexey Ab almost 4 years ago
Forgot to mention: I am using VMware Workstation 15.5 and a 2-core pfSense VM with em adapters.
Updated by Jim Pingle almost 4 years ago
- Status changed from New to Feedback
Have you only tested this on pfSense 2.4.5?
Can you try again on a 2.5.0 development snapshot?
Updated by Alexey Ab almost 4 years ago
Update:
I've tested different pipe bandwidths with the same 50 Mbit traffic:
85 Mbit pipe - less reorder
185 Mbit pipe - more reorder
600 Mbit pipe - less reorder
1000 Mbit pipe - no reorder
When the 85 Mbit pipe is saturated with traffic, there is no reordering.
No, I have not tested it on 2.5 so far.
Can you try to reproduce this problem and fix it in the release version?
Updated by Alexey Ab almost 4 years ago
I have tested 2.4.2, 2.4.5p1, 2.5 - all versions have this problem.
Setting kern.hz=1000 instead of the default 100 does not fix it either.
Packet reordering makes pfSense shaping unusable because it degrades TCP performance from 85 Mbit down to 20-30, and produces other errors.
If a TCP session reaches full speed, everything works well. But if the speed drops for some reason, TCP cannot recover it due to the out-of-order packets.
Updated by Alexey Ab almost 4 years ago
Adding a 10 ms delay to the pipe seems to fix the reordering.
Trying to set both kern.hz=1000 and delay=1 ms as a workaround leads to crashes under load.
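In dummynet terms, the change that helps amounts to adding a delay to the pipe, along these lines (a sketch based on the limiter config at the top of the ticket; the delay value is in milliseconds):

    # same pipe as above, plus a 10 ms delay
    pipe 1 config bw 85Mb delay 10 queue 2000 mask all droptail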
Updated by Thomas Pilgaard almost 4 years ago
Observed the same on 2.4.5-p1, with out-of-order packets during iperf testing using fq_codel and limiters set to 930 Mb/s. Tested on a Supermicro X10SDV-4C-TLN2F without any packages installed, and without any particularly high load on it either.
Updated by Alexey Ab almost 4 years ago
Since net.inet.ip.dummynet.io_fast splits the packet path between the saturated and unsaturated pipe cases, this setting is the likely culprit for the packet reordering. (Traffic is very bursty for TCP without pacing, and for the iperf3 UDP test, so the pipe saturates and desaturates several times per second, and it seems we get reordering on every transition.)
But setting net.inet.ip.dummynet.io_fast=0 has no effect; net.inet.ip.dummynet.io_pkt_fast is still increasing. The explanation is very simple:
the io_fast check is commented out in the dummynet source code:
if (/*dn_cfg.io_fast &&*/ m == *m0 && (dir & PROTO_LAYER2) == 0 ) {
https://github.com/luigirizzo/dummynet/blob/master/sys/netinet/ipfw/ip_dn_io.c
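This is easy to confirm from the counters (a sketch):

    # if the sysctl were honored, this would stop the fast-path counter...
    sysctl net.inet.ip.dummynet.io_fast=0
    # ...but io_pkt_fast keeps climbing, because the check is commented out
    sysctl net.inet.ip.dummynet.io_pkt_fast
    sysctl net.inet.ip.dummynet.io_pkt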
Updated by Alexey Ab almost 4 years ago
And the same commented-out code is in the pfSense repository:
https://github.com/pfsense/FreeBSD-src/blob/devel-12/sys/netpfil/ipfw/ip_dn_io.c
Updated by Alexey Ab almost 4 years ago
I have tried to disable the whole if (/*dn_cfg.io_fast &&*/ ...) block by patching /boot/kernel/dummynet.ko.
Traffic then goes only to net.inet.ip.dummynet.io_pkt, and net.inet.ip.dummynet.io_pkt_fast always stays at zero.
But the whole pfSense box hangs after several seconds.
There is a comment before this if block:
"XXX Don't call dummynet_send() if scheduler return the packet just enqueued. This avoid a lock order reversal."
This seems to be the cause of the hang, and it is unclear how to turn off io_fast correctly.
As I mentioned in the previous post, adding a 10 ms delay to the pipe seems to solve the problem. This can be explained as follows: if a delay is set, it effectively disables io_fast, because dummynet redirects all packets to the delay queue and does not perform fast I/O, so no reordering occurs.
Now I need advice.
Updated by Alexey Ab almost 4 years ago
I've successfully used kern.hz=1000 and a limiter delay of 1 ms as a workaround for this problem.
I've also posted a message to the FreeBSD forum: https://forums.freebsd.org/threads/possible-race-condition-bug-in-dummynet-out-of-order-packets.78312/
For now I leave this problem to the pfSense team, the FreeBSD community, or anyone who wants to help create a proper fix. As far as I can see, dummynet has not been maintained by its authors for a long time.
Updated by Azamat Khakimyanov over 1 year ago
- Status changed from Feedback to Rejected
Tested on 2.5 CE, but I wasn't able to reproduce this issue.
I used KVM with em NICs and created an RA OpenVPN server, then connected two Ubuntu VM hosts to the OpenVPN server. I used one of these hosts to generate constant 50Mbps traffic with iperf3 (the -u -b 50M options), and the second host to generate big (2000-byte) TCP packets with the hping3 utility (hping3 <OpenVPN Server> -d 2000), to check whether there was any issue with fragmented packets. I saw no issue.
So with different Limiters applied on OpenVPN (85Mbps, then 185Mbps), I didn't see any issue with fragmented packets.
This should be tested on the latest 2.7 CE and 23.05_1; if you still see this issue, please describe how to reproduce it.
Updated by Alexey Ab over 1 year ago
There was nothing regarding fragmented packets in my bug report.
Updated by Marcos M over 1 year ago
- Status changed from Rejected to Feedback
It would be useful to know whether this is reproducible on CE 2.7 (or preferably a 23.09 dev snapshot), given the major OS version bump since 2.6.
Updated by Alexey Ab over 1 year ago
I've spent two weeks of my working time debugging this problem, finding the root cause, finding a workaround, and writing a complete report for you. I've solved my problem and am not able to spend more time on testing, but I think nothing has changed, since dummynet is not maintained.
By coincidence, there was also a problem with fragmented packets when using floating rules and dummynet to shape traffic: large packets were not reassembled and were lost (on pfSense 2.4.5). If I turned off the floating rules, everything worked fine. It was not reported.
Updated by Marcos M about 1 year ago
- Status changed from Feedback to New
Thank you - it's a good analysis! Since this is more of a FreeBSD issue than a pfSense one, reporting this upstream would be best. If you do, please reference the link here.
To summarize the workaround on pfSense:
- If on a VM, set kern.hz="1000" in /boot/loader.conf.local, otherwise the delay will round up to 10 ms (the default on VMs is 100).
- Set 1 for the limiter pipe Delay option in the GUI.
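As a shell sketch of the same steps (file path and GUI field as named above):

    # append to /boot/loader.conf.local (create the file if missing), then reboot
    echo 'kern.hz="1000"' >> /boot/loader.conf.local
    # then set Delay = 1 (ms) on the limiter pipe under
    # Firewall > Traffic Shaper > Limiters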
Updated by P L about 1 year ago
Marcos M wrote in #note-15:
Thank you - it's a good analysis! Since this is more of a FreeBSD issue than a pfSense one, reporting this upstream would be best. If you do, please reference the link here.
To summarize the workaround on pfSense:
- If on a VM, set kern.hz="1000" in /boot/loader.conf.local; the default on VMs is 100.
- Set 1 for the limiter pipe Delay option in the GUI.
Setting the delay on the limiter to 1ms and net.inet.ip.dummynet.io_fast="0" forced all traffic into io_pkt instead of io_pkt_fast for me in pfSense.
I have been troubleshooting fq_codel since early pfSense 2.6. I'm now using the official layer 2 ethernet firewall rule setup for the AT&T bypass in 23.05.1. I play a lot of Nintendo, and the codel pipe drops or collapses randomly. I've tried the official "one-way" rule here: https://docs.netgate.com/pfsense/en/latest/recipes/codel-limiters.html
I first wanted to try disabling the fast I/O because of what I read here:
https://forum.dd-wrt.com/phpBB2/viewtopic.php?t=324422
Also, over time, net.inet.ip.dummynet.tick_delta_sum accumulates to over -900. I find that once it reaches -300, games have a chance of performing worse even with an A+ bufferbloat score. TCP duplications and out-of-order packets show up in pcaps. Turning dummynet expiration off and on resets this counter. And a delay of 1 ms seems to fix the problem.
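For reference, a sketch of checking the counter and the reset described above (assuming the expiration toggle referred to is the net.inet.ip.dummynet.expire sysctl):

    # watch the accumulating tick drift
    sysctl net.inet.ip.dummynet.tick_delta_sum
    # toggling expiration off and on resets the counter, per the observation above
    sysctl net.inet.ip.dummynet.expire=0
    sysctl net.inet.ip.dummynet.expire=1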
Updated by P L 11 months ago
Recently I switched to the wpa_supplicant bypass method in pfSense and was still getting out-of-order packet issues unless I applied a delay to fq_codel, which TCP and some videogames don't particularly appreciate. Interestingly, I also saw stability if I used FIFO on the LAN uplink, fq_codel on the LAN downlink, fq_codel on the WAN uplink, and FIFO on the WAN downlink (sometimes I have to pray the traffic doesn't use the FIFO with that setup). fq_codel offers marvelous performance in videogames, but I constantly find myself having to set up inline Suricata to block bad TCP window updates, bad UDP checksums, application-layer packets going in the wrong direction, and sometimes HTTP requests getting sent to internal devices or right back to the device that sent them, resulting in Suricata "HTTP response doesn't match request" errors.
I think all of these issues arise because fq_codel, qdisc, and dummynet are designed to be used on a LAN's downlink and a WAN's uplink, similar to ALTQ. If both fq_codel AQMs are placed on the WAN, the traffic still gets jumbled by the LAN interface's lack of queuing discipline, and vice versa with both on the LAN. You cannot just choose a downlink/inbound limiter in the pfSense GUI; it forces you to choose an uplink on each interface for statefulness reasons (traffic creates states outbound on the LAN and then again outbound on the WAN).
Other routing software seems to allow you to place fq_codel down on the LAN and up on the WAN. Here is one example I can pull up right now, and there are a few others online: https://www.b1c1l1.com/blog/2020/03/26/linux-home-router-traffic-shaping-with-fq_codel/
"The simplest implementation is to create a single HTB bucket with FQ_CoDel on the "WAN" interface . . . If download shaping is required, you can use a similar configuration on the "LAN" interface connected to your home devices."
To be honest, I never have issues with my upload on fiber, just my download, but I cannot choose to shape only my downloads with fq_codel in pfSense. Thanks for your time.
Updated by Marcos M 8 months ago
It may be that, due to the way dummynet works, packets will inevitably arrive out of order. Dummynet lets packets through directly until the limit is hit. Limited packets are then handled later, and that may be enough to bring things back under the limit, at which point the next packet goes through directly again, so you end up with out-of-order delivery.
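One way to see that transition is to watch both dummynet counters while a bursty flow hovers around the pipe limit (a sketch; counter names as used earlier in this ticket):

    # io_pkt_fast grows while under the limit (direct delivery);
    # io_pkt grows while the pipe is saturated (queued delivery)
    while true; do
        sysctl net.inet.ip.dummynet.io_pkt net.inet.ip.dummynet.io_pkt_fast
        sleep 1
    done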
Updated by Chris Collins 8 months ago
I think I may have been affected by this.
I have used limiters in two scenarios: one to keep my home broadband from collapsing while downloading, the other to manage a busy server behind pfSense when its upload is saturated.
On the server I found that under certain loads per-user throughput would collapse while overall throughput stayed very high, which made me think lots of retransmits were happening; that might be explained by out-of-order packets. If I disabled shaping, the quality metrics of the interface went to pot, but individual users got much better performance.
On my broadband I needed to tame the download: for example, downloading a Steam game would prevent me from watching a VOD or stream at the same time, as the download took over the connection, and fq_codel failed miserably. I am not sure this one was reordering-related, but what I did discover is that if I didn't use the queue system, and just used a pipe, it behaved much better. I had to use basic codel, as fq_codel cannot be used when dummynet is used this way; even a basic FIFO makes fq_codel look the fool in this configuration. Now on FTTP things are much better and I have no shaping, but I am considering some downstream shaping since it's still a little problematic, and adding a 10 ms delay, if that forces the fast I/O path out of use, might be worth it.
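For what it's worth, the "just a pipe" setup described here would look something like this in the limiter syntax from the top of the ticket (a sketch; the bandwidth is illustrative):

    # plain pipe with codel AQM, no scheduler/queue layer on top
    pipe 3 config bw 80Mb codel
    # versus the scheduler-based setup fq_codel requires:
    # sched 3 config pipe 3 type fq_codel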
Sadly, the fast I/O bypass being commented out as a lazy fix for a bug, and then left that way for years, is not unknown for FreeBSD these days.
I've also taken note of P L's findings, thank you.