Bug #13996
Limiters using the fq_pie scheduler no longer pass any traffic.
Added by Anonymous almost 2 years ago. Updated 4 months ago.
Description
After updating to 23.01 limiters using the fq_pie scheduler no longer pass any traffic.
With the same floating firewall rules, if I change to fq_codel, traffic flows as normal.
The same rules and limiters worked fine under 22.05.
Files
FreeBSD-15-CURRENT_kernel-panic.png (459 KB) | kernel panic after ping to 1.1.1.1 | Thomas Kupper, 07/21/2024 06:44 AM
Updated by Chris W over 1 year ago
I'm unable to reproduce this on a virtual machine which was upgraded to 23.01 from 22.05 (and to 22.05 from 22.01 previously). Steps I'm taking are:
1. Create a new limiter using the FQ_PIE scheduler.
2. Create a floating rule (in my case, for the LAN interface) which passes any traffic and uses the new limiter on the In pipe.
3. Disable the default allow any rule on LAN.
4. Install the iperf package in pfSense and start it as the server with the default settings.
5. Use another VM as the iperf client, pointed at the pfSense LAN IP address for the iperf server (iperf3 -c 192.168.1.1 -p 5201).
Using the FQ_PIE, FQ_CODEL, and Worst-case Weighted Fair Queueing schedulers, iperf traffic was limited to the 100 Mb/s cap I set in the limiter. Disabling the limiter in the floating rule of course restores full bandwidth.
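For reference, the iperf test itself amounts to something like the following; the 30-second duration is my own choice, everything else comes from the steps above:
iperf3 -s -p 5201
iperf3 -c 192.168.1.1 -p 5201 -t 30
The first command is the server side on pfSense, the second runs on the client VM.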
Are you on your own hardware? If not, I suggest contacting TAC for a 23.01 firmware image to reflash directly to that version. If you're on your own machine, it's now possible to upgrade directly to Plus 23.01 from pfSense CE 2.6, so 22.05 isn't in the picture.
In both cases, I suggest getting the system to 23.01 before importing your configuration file. If the reflash/upgrade is successful, test the limiter using the steps above (it doesn't have to be a virtual machine). If that's successful, then import your configuration and test again. Marking as Not a Bug for now.
Updated by Anonymous over 1 year ago
I am not the only one with the problem: https://forum.netgate.com/topic/177555/fq_pie-no-internet?_=1680451711804
I create the new limiters using the FQ_PIE scheduler and Taildrop queue.
I use the same floating rules that I use with FQ_CODEL, which works fine with them.
I use separate rules for both in and out on the WAN interface.
On the in rule I use action Match with Quick selected, interface set to WAN, and direction set to In; under In/Out Pipe I have qPIE_GATEWAY_IN for In and qPIE_GATEWAY_OUT for Out.
On the out rule I use action Match with Quick selected, interface set to WAN, and direction set to Out; under In/Out Pipe I have qPIE_GATEWAY_OUT for In and qPIE_GATEWAY_IN for Out.
After applying the changes the internet dies and stays dead even after a reboot.
I then change the floating rules back to using FQ_CODEL via qCODEL_GATEWAY_IN and qCODEL_GATEWAY_OUT and everything works normally again.
These same rules and limiters worked fine under 22.05.
This is all on hardware (not Netgate), not a virtual machine.
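For what it's worth, I'd expect those two floating rules to end up in /tmp/rules.debug as something roughly like the lines below; the dummynet queue numbers 1 and 2 are assumptions on my part, not values from my actual config:
match in on $WAN inet all dnqueue( 1,2 )
match out on $WAN inet all dnqueue( 2,1 )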
Updated by Jim Pingle over 1 year ago
- Has duplicate Bug #14259: Limiters with the fq_pie scheduler don't pass any traffic. added
Updated by Jim Pingle over 1 year ago
- Project changed from pfSense Plus to pfSense
- Category changed from Traffic Shaper (Limiters) to Traffic Shaper (Limiters)
- Status changed from Not a Bug to New
- Affected Plus Version deleted (23.01)
Updated by Jordan G over 1 year ago
I can confirm; I'm seeing this on 23.05.1. If nothing changes but the scheduler, from FQ_CODEL to FQ_PIE, under the upload/download limiters, then after saving and applying, the traffic-shaped interface/gateway sees progressive packet loss until it reaches 100%, where it remains until the rules are disabled or the scheduler is reverted to FQ_CODEL. Let me know if you have issues reproducing or need config/diag info.
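If it helps reproduction: with the FQ_PIE rules active, the progressive loss is easy to watch with a plain ping through the shaped gateway, e.g.
ping -c 60 1.1.1.1
Loss climbs over the run until it reaches 100%.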
Updated by Chris Collins 7 months ago
Confirming the problem. It was working; I then adjusted the quantum, and traffic started going into a blackhole. I changed the quantum back to what it was (the 1514 default): still blackhole. Rebooted: still blackhole. Changed to FIFO: works. Changed back to fq_pie: blackhole.
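Assuming the GUI change maps to dnctl the way later posts in this thread show, the quantum change would correspond to something like this (the scheduler/pipe numbers and the adjusted value 3000 are examples, not my exact config):
dnctl sched 1 config pipe 1 type fq_pie quantum 3000
dnctl sched 1 config pipe 1 type fq_pie quantum 1514
The second line puts the quantum back to the 1514 default.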
Updated by Thomas Kupper 4 months ago
I can reproduce the issue on pfSense CE 2.7.2 and pfSense+ 24.8-DEVELOPMENT (on VM and Hardware), and on FreeBSD 14 and FreeBSD 15 (on VMs). It does work in pfSense CE 2.6.0.
Steps to reproduce on pfSense:
- fresh pfSense CE 2.7.2 installation (VM on Proxmox using the legacy ISO)
- create (FQ_CODEL) limiter according to the pfSense documentation "Configuring CoDel Limiters for Bufferbloat"
- confirm the limiter is working
- change the scheduler for the download and upload pipe to "FQ_PIE"
Result: no traffic on the WAN interface
If I set the "In / Out Pipe" in the floating rule to the upload/download pipes instead of the queues, WAN traffic works again as expected.
Testing on FreeBSD 14-RELEASE (and 14.1-RELEASE) and FreeBSD 15-CURRENT with a minimal pf and dnctl setup showed the same issue. The dnctl and pf rules are based on the pfSense CE 2.7.2 rules (/tmp/rules.limiter and /tmp/rules.debug).
The dnctl rules used:
dnctl pipe 1 config bw 88Mb droptail
dnctl sched 1 config pipe 1 type fq_pie
dnctl queue 1 config pipe 1 droptail
dnctl pipe 2 config bw 44Mb droptail
dnctl sched 2 config pipe 2 type fq_pie
dnctl queue 2 config pipe 2 droptail
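To confirm the pipes, schedulers, and queues exist as configured, the dummynet state can be listed with:
dnctl pipe show
dnctl sched show
dnctl queue show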
And the pf.conf:
pass out log quick on { vtnet0 } route-to ( vtnet0 192.168.1.1 ) inet from any to any keep state dnqueue( 2,1 )
Replacing dnqueue( 2,1 ) with dnpipe( 2,1 ) brings WAN traffic back, like on pfSense.
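i.e. the working variant of the rule reads:
pass out log quick on { vtnet0 } route-to ( vtnet0 192.168.1.1 ) inet from any to any keep state dnpipe( 2,1 )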
Since I'm not much of a programmer, my debug strategy is trial and error.
On FreeBSD 15-CURRENT I reverted the commit "3f3e4f3c74726 Kristof Provost dummynet: don't use per-vnet locks to protect global data." since it is a small change, line-wise, and fairly recent.
After compiling the kernel (GENERIC config copied) and booting it (in a VM), changing the scheduler back to FQ_PIE generates a kernel panic in fqpie_callout_cleanup() as soon as any traffic is generated (a ping is enough, or the open SSH connection to the VM).
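For anyone retracing this, the revert and rebuild were along these lines (a sketch: I actually copied GENERIC to a custom config name, and the -j value is arbitrary):
cd /usr/src
git revert 3f3e4f3c74726
make -j4 buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now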
See screenshot for the kernel panic details, I wasn't able to get a kernel dump.
Edit: the kernel panic from dmesg:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82a25c60
stack pointer = 0x28:0xfffffe0091126df0
frame pointer = 0x28:0xfffffe0091126e10
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2 (clock (0))
rdi: ffffffff81c0afe0 rsi: 0000000000000008 rdx: ffffffff8114ab30
rcx: fffff800038c5000 r8: 0000000000000000 r9: 0000000000000000
rax: 0000000000000000 rbx: ffffffff8198e908 rbp: fffffe0091126e10
r10: 0000000000010000 r11: 0000000000000001 r12: fffffe00c7d9b0b8
r13: 0000000000000000 r14: fffff800036a5e50 r15: fffff800038c5000
trap number = 12
panic: page fault
cpuid = 1
time = 1721544199
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091126ac0
vpanic() at vpanic+0x13f/frame 0xfffffe0091126bf0
panic() at panic+0x43/frame 0xfffffe0091126c50
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0091126cb0
trap_pfault() at trap_pfault+0xa0/frame 0xfffffe0091126d20
calltrap() at calltrap+0x8/frame 0xfffffe0091126d20
--- trap 0xc, rip = 0xffffffff82a25c60, rsp = 0xfffffe0091126df0, rbp = 0xfffffe0091126e10 ---
fqpie_callout_cleanup() at fqpie_callout_cleanup+0x50/frame 0xfffffe0091126e10
softclock_call_cc() at softclock_call_cc+0x139/frame 0xfffffe0091126ec0
softclock_thread() at softclock_thread+0xc6/frame 0xfffffe0091126ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe0091126f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0091126f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic