Regression #12622
Kernel panic when using ``fq_pie`` limiter scheduler (closed)
Description
Whenever I try to use the fq_pie limiter scheduler, pfSense crashes with a page fault.
I can recover by disabling the WAN port.
This affects the current 2.6.0 beta.
2.5.0 and 2.5.2 work fine.
The crash doesn't occur until you apply traffic to the limiter via a firewall rule.
This occurs on both a KVM virtual machine and APU2C4 hardware.
I have attached a couple of textdumps.
0. Was created by starting up pfSense and letting it boot until it crashed.
1. Was created by keeping the firewall rules that send traffic to the limiter disabled at boot time and then enabling them after boot.
Files
Updated by Viktor Gurov over 1 year ago
Unable to reproduce on 2.6.0.b.20211220.0600
/tmp/rules.limiter:
pipe 1 config bw 10Mb pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
/tmp/rules.debug:
... pass in quick on $LAN inet proto tcp from any to 2.2.2.2 ridentifier 1638455473 flags S/SA keep state dnpipe ( 1) label "USER_RULE"
Updated by Viktor Gurov over 1 year ago
- Status changed from New to Feedback
- Affected Version set to 2.6.0
Updated by Jim Pingle over 1 year ago
- Tracker changed from Bug to Regression
- Target version deleted (2.6.0)
Updated by Harley Peters over 1 year ago
This is the configuration I am using.
I also tried with noecn; it still page faulted.
The very same configuration works fine just by changing the scheduler to fq_codel.
pfSense version 2.6.0.b.20211220.0600
rules.limiter:
pipe 1 config bw 4800Kb droptail
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 1 config pipe 1 droptail
pipe 2 config bw 48Mb droptail
sched 2 config pipe 2 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 2 config pipe 2 droptail
rules.debug:
pass quick on { vtnet0 } inet proto icmp from any to any icmp-type { echorep,echoreq } ridentifier 1623355920 keep state label "USER_RULE: (LIMITERS}Ipv4IcmpBugWorkAround"
match out quick on { vtnet0 } inet from any to any ridentifier 1621820692 dnqueue( 1,2) label "USER_RULE: (LIMITERS)Ipv4WanOut"
match in quick on { vtnet0 } inet from any to any ridentifier 1622208712 dnqueue( 2,1) label "USER_RULE: (LIMITERS)Ipv4WanIn"
match out quick on { vtnet0 } inet6 from any to ! fe80::/64 ridentifier 1621820766 dnqueue( 1,2) label "USER_RULE: (LIMITERS)Ipv6WanOut"
match in quick on { vtnet0 } inet6 from ! fe80::/64 to any ridentifier 1622208794 dnqueue( 2,1) label "USER_RULE: (LIMITERS)Ipv6WanIn"
Updated by Harley Peters over 1 year ago
OK, the reason Viktor Gurov's setup didn't page fault is that he didn't add a child queue to the scheduler (tested).
The problem is that without a child queue the fq_pie and fq_codel schedulers don't work.
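The child-queue observation can be checked mechanically against a rules.limiter file. What follows is a minimal Python sketch, not part of pfSense: the function name `scheds_with_child_queues` is hypothetical, and it assumes the `sched N config pipe M type T` and `queue N config pipe M` line shapes seen in the configurations quoted in this issue, treating a queue as a child of the scheduler that shares its pipe number.

```python
import re

# Line shapes assumed from the rules.limiter excerpts quoted in this issue.
SCHED_RE = re.compile(r"^sched\s+(\d+)\s+config\s+pipe\s+(\d+)\s+type\s+(\S+)")
QUEUE_RE = re.compile(r"^queue\s+(\d+)\s+config\s+pipe\s+(\d+)")


def scheds_with_child_queues(rules_text):
    """Return {sched number: (sched type, has_child_queue)}.

    Assumption: a queue counts as a child of a scheduler when both
    reference the same pipe number.
    """
    scheds = {}
    pipes_with_queues = set()
    for raw in rules_text.splitlines():
        line = raw.strip()
        m = SCHED_RE.match(line)
        if m:
            scheds[int(m.group(1))] = (m.group(3), int(m.group(2)))
            continue
        m = QUEUE_RE.match(line)
        if m:
            pipes_with_queues.add(int(m.group(2)))
    return {
        num: (typ, pipe in pipes_with_queues)
        for num, (typ, pipe) in scheds.items()
    }


# Configuration that did not reproduce the crash (no child queue).
NO_QUEUE = """\
pipe 1 config bw 10Mb pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
"""

# Configuration that did reproduce the crash (child queues present).
WITH_QUEUES = """\
pipe 1 config bw 4800Kb droptail
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 1 config pipe 1 droptail
pipe 2 config bw 48Mb droptail
sched 2 config pipe 2 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 2 config pipe 2 droptail
"""

print(scheds_with_child_queues(NO_QUEUE))     # {1: ('fq_pie', False)}
print(scheds_with_child_queues(WITH_QUEUES))  # {1: ('fq_pie', True), 2: ('fq_pie', True)}
```

A scheduler reported with `False` is in the state that did not trigger the panic during reproduction, so a test setup presumably needs at least one entry reporting `True`.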
Updated by Mateusz Guzik about 1 year ago
I diagnosed the problem and wrote a patch for it, but I don't have an easy means to test it:
diff --git a/sys/netpfil/ipfw/dn_sched_fq_pie.c b/sys/netpfil/ipfw/dn_sched_fq_pie.c
index c3de665687a3..ae14152538bb 100644
--- a/sys/netpfil/ipfw/dn_sched_fq_pie.c
+++ b/sys/netpfil/ipfw/dn_sched_fq_pie.c
@@ -111,6 +111,9 @@ struct fq_pie_flow {
 	int active; /* 1: flow is active (in a list) */
 	struct pie_status pst; /* pie status variables */
 	struct fq_pie_si_extra *psi_extra;
+#ifdef VIMAGE
+	struct vnet *vnet;
+#endif
 	STAILQ_ENTRY(fq_pie_flow) flowchain;
 };
@@ -575,6 +578,7 @@ fqpie_callout_cleanup(void *x)
 	mtx_destroy(&pst->lock_mtx);
 	psi_extra = q->psi_extra;
+	CURVNET_SET(q->vnet);
 	DN_BH_WLOCK();
 	psi_extra->nr_active_q--;
@@ -585,6 +589,7 @@ fqpie_callout_cleanup(void *x)
 		fq_pie_desc.ref_count--;
 	}
 	DN_BH_WUNLOCK();
+	CURVNET_RESTORE();
 }
 /*
@@ -1052,6 +1057,9 @@ fq_pie_new_sched(struct dn_sch_inst *_si)
 	for (i = 0; i < schk->cfg.flows_cnt; i++) {
 		flows[i].pst.parms = &schk->cfg.pcfg;
 		flows[i].psi_extra = si->si_extra;
+#ifdef VIMAGE
+		flows[i].vnet = curvnet;
+#endif
 		pie_init(&flows[i], schk);
 	}
Will you be able to plop a test kernel into /boot/kernel if I provide you a .tgz with the directory? (and of course test afterwards)
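Reading the patch, the fix appears to follow a capture-and-restore pattern: the vnet that is current when a flow is created is stored on the flow, and the deferred cleanup callback makes it current again before touching shared state. The following is only a userspace Python analogy of that pattern, not kernel code; `curvnet_set`, `Flow`, `active_queue_count`, and both cleanup functions are hypothetical names, and the assumption that the callback otherwise runs with no current vnet is an inference from the diff rather than something stated in this issue.

```python
from contextlib import contextmanager

_curvnet = None  # stand-in for the kernel's "current vnet" pointer


@contextmanager
def curvnet_set(vnet):
    """Rough analogue of a CURVNET_SET() / CURVNET_RESTORE() pair."""
    global _curvnet
    saved = _curvnet
    _curvnet = vnet
    try:
        yield
    finally:
        _curvnet = saved


def active_queue_count():
    """Stand-in for code that reaches per-vnet state via the current vnet."""
    if _curvnet is None:
        # Assumption: this is where the real kernel would page fault.
        raise RuntimeError("no current vnet")
    return _curvnet["active_queues"]


class Flow:
    def __init__(self):
        # Mirrors `flows[i].vnet = curvnet` in the patch: remember the
        # context that was current when the flow was created.
        self.vnet = _curvnet


def cleanup_unpatched(flow):
    # Runs later from a timer, where no vnet is current.
    return active_queue_count()


def cleanup_patched(flow):
    # Re-enter the captured context for the duration of the cleanup.
    with curvnet_set(flow.vnet):
        return active_queue_count()


vnet0 = {"active_queues": 1}
with curvnet_set(vnet0):
    flow = Flow()  # created while vnet0 is current

try:
    cleanup_unpatched(flow)
except RuntimeError as err:
    print("unpatched cleanup failed:", err)

print("patched cleanup sees", cleanup_patched(flow), "active queue(s)")
```

The design point is that the deferred callback cannot rely on ambient context, so whatever it needs has to travel with the object it is cleaning up.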
Updated by Harley Peters about 1 year ago
Never done it before but I should be able to. I have a test setup I can run it on.
Updated by Mateusz Guzik about 1 year ago
Apologies for the late reply; I somehow did not get a notification of your response.
I pushed the patch to pfSense. It should be available in the next snapshot, so any patching by hand can be avoided. I'll let you know in a day or two when it is available.
Updated by Mateusz Guzik about 1 year ago
This snapshot contains the fix: https://firmware-nyi.netgate.com/beta/snapshots/installer/pfSense-plus-22.01-BETA-amd64-20220111-0600.iso.gz
Updated by Harley Peters about 1 year ago
OK, I don't have Netgate hardware, so I won't be able to test any pfSense Plus versions.
Updated by Harley Peters about 1 year ago
I tested the latest community edition, 2.6.0.b.20220111.0600, on two different machines and everything is working well.
No problems so far anyway. :)
Updated by Mateusz Guzik about 1 year ago
- Status changed from Feedback to Resolved
- Assignee set to Mateusz Guzik
Thanks for testing. I'll assume the issue is resolved; please reopen if the crash pops up again.
Updated by Jim Pingle about 1 year ago
- Subject changed from When using the limiter scheduler fq_pie Pfsense page faults. to Kernel panic when using ``fq_pie`` limiter scheduler
- Target version set to 2.6.0
- Plus Target Version set to 22.01
Updated by Harley Peters about 1 year ago
I guess I should have checked a little better.
The fq_pie limiter scheduler is indeed fixed, but the pie limiter AQM (active queue management) needs to be patched as well. It is still causing a kernel panic.
Updated by Mateusz Guzik about 1 year ago
Can you attach a dump? Both dumps already attached only show the fq_pie crash.
Updated by Harley Peters about 1 year ago
- File info_pie_rules.1 added
- File textdump_pie_rules.tar.1 added
- File info_pie_reboot.2 added
- File textdump_pie_reboot.tar.2 added
OK, I'm uploading two textdumps.
The first one occurred right when I applied the floating firewall rules to pipe the traffic to the limiters.
The second occurred during a reboot.
Updated by Mateusz Guzik about 1 year ago
OK, that's the same kind of problem, but it will have to be fixed differently. I'll try to do it today.
Updated by Mateusz Guzik about 1 year ago
I pushed the fix; it should be available in the next snapshot.
Updated by Harley Peters about 1 year ago
I can confirm this is working in the latest 2.7.0 snapshot.
Updated by Scott Long about 1 year ago
- Status changed from In Progress to Feedback
Updated by Harley Peters about 1 year ago
I am currently running the latest 2.6.0 release candidate snapshot and everything is working well.
Updated by Jim Pingle about 1 year ago
- Status changed from Feedback to Resolved
Thanks for following up, let us know if you experience any additional instability.