Regression #12622
Kernel panic when using ``fq_pie`` limiter scheduler (Closed)
Description
Whenever I try to use the fq_pie limiter scheduler, pfSense crashes with a page fault.
I can recover by disabling the WAN port.
This affects the current 2.6.0 beta.
2.5.0 and 2.5.2 work fine.
The crash doesn't occur until you apply traffic to the limiter via a firewall rule.
This occurs both on a KVM virtual machine and on APU2C4 hardware.
I have attached a couple of textdumps.
0. Created by starting up pfSense and letting it boot until it crashed.
1. Created by disabling the firewall rules that send traffic to the limiter at boot time, then enabling them after boot-up.
Files
Updated by Viktor Gurov almost 3 years ago
Unable to reproduce on 2.6.0.b.20211220.0600
/tmp/rules.limiter:
pipe 1 config bw 10Mb pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
/tmp/rules.debug:
... pass in quick on $LAN inet proto tcp from any to 2.2.2.2 ridentifier 1638455473 flags S/SA keep state dnpipe ( 1) label "USER_RULE"
Updated by Viktor Gurov almost 3 years ago
- Status changed from New to Feedback
- Affected Version set to 2.6.0
Updated by Jim Pingle almost 3 years ago
- Tracker changed from Bug to Regression
- Target version deleted (2.6.0)
Updated by Anonymous almost 3 years ago
This is the configuration I have it set to.
I also tried with noecn; it still page faulted.
The very same configuration works fine if I just change the scheduler to fq_codel.
pfSense version 2.6.0.b.20211220.0600
rules.limiter:
pipe 1 config bw 4800Kb droptail
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 1 config pipe 1 droptail
pipe 2 config bw 48Mb droptail
sched 2 config pipe 2 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 2 config pipe 2 droptail
rules.debug:
pass quick on { vtnet0 } inet proto icmp from any to any icmp-type { echorep,echoreq } ridentifier 1623355920 keep state label "USER_RULE: (LIMITERS}Ipv4IcmpBugWorkAround"
match out quick on { vtnet0 } inet from any to any ridentifier 1621820692 dnqueue( 1,2) label "USER_RULE: (LIMITERS)Ipv4WanOut"
match in quick on { vtnet0 } inet from any to any ridentifier 1622208712 dnqueue( 2,1) label "USER_RULE: (LIMITERS)Ipv4WanIn"
match out quick on { vtnet0 } inet6 from any to ! fe80::/64 ridentifier 1621820766 dnqueue( 1,2) label "USER_RULE: (LIMITERS)Ipv6WanOut"
match in quick on { vtnet0 } inet6 from ! fe80::/64 to any ridentifier 1622208794 dnqueue( 2,1) label "USER_RULE: (LIMITERS)Ipv6WanIn"
Updated by Anonymous almost 3 years ago
OK, the reason Viktor Gurov's setup didn't page fault is that he didn't attach a child queue to the scheduler (tested).
The problem is that without a child queue, the fq_pie and fq_codel schedulers don't work.
Updated by Mateusz Guzik almost 3 years ago
I diagnosed the problem and wrote a patch for it, but I don't have an easy means to test it:
diff --git a/sys/netpfil/ipfw/dn_sched_fq_pie.c b/sys/netpfil/ipfw/dn_sched_fq_pie.c
index c3de665687a3..ae14152538bb 100644
--- a/sys/netpfil/ipfw/dn_sched_fq_pie.c
+++ b/sys/netpfil/ipfw/dn_sched_fq_pie.c
@@ -111,6 +111,9 @@ struct fq_pie_flow {
 	int active;	/* 1: flow is active (in a list) */
 	struct pie_status pst;	/* pie status variables */
 	struct fq_pie_si_extra *psi_extra;
+#ifdef VIMAGE
+	struct vnet *vnet;
+#endif
 	STAILQ_ENTRY(fq_pie_flow) flowchain;
 };
 
@@ -575,6 +578,7 @@ fqpie_callout_cleanup(void *x)
 	mtx_destroy(&pst->lock_mtx);
 	psi_extra = q->psi_extra;
 
+	CURVNET_SET(q->vnet);
 	DN_BH_WLOCK();
 	psi_extra->nr_active_q--;
 
@@ -585,6 +589,7 @@ fqpie_callout_cleanup(void *x)
 		fq_pie_desc.ref_count--;
 	}
 	DN_BH_WUNLOCK();
+	CURVNET_RESTORE();
 }
 
 /*
@@ -1052,6 +1057,9 @@ fq_pie_new_sched(struct dn_sch_inst *_si)
 	for (i = 0; i < schk->cfg.flows_cnt; i++) {
 		flows[i].pst.parms = &schk->cfg.pcfg;
 		flows[i].psi_extra = si->si_extra;
+#ifdef VIMAGE
+		flows[i].vnet = curvnet;
+#endif
 		pie_init(&flows[i], schk);
 	}
 
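The patch follows a save-and-restore pattern: each flow records the VNET that was current when fq_pie_new_sched() created it, and fqpie_callout_cleanup(), which runs later from a callout, re-enters that VNET with CURVNET_SET()/CURVNET_RESTORE() around the locked section. Presumably the callout otherwise runs with no current VNET, so per-VNET dummynet state touched under DN_BH_WLOCK() is not reachable. The userspace C sketch below only models that idea; struct vnet_ctx, cur_ctx, CTX_SET, CTX_RESTORE, run_callout and simulate_flow_lifecycle are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one virtualized network stack (a VNET). */
struct vnet_ctx {
	int active_flows;	/* per-instance state the cleanup must update */
};

/*
 * Hypothetical stand-in for the kernel's "current VNET" pointer.
 * In this simulation, timer callbacks run with it set to NULL.
 */
static struct vnet_ctx *cur_ctx = NULL;

/* Modeled on CURVNET_SET()/CURVNET_RESTORE(): save, install, restore. */
#define CTX_SET(c)	struct vnet_ctx *saved_ctx_ = cur_ctx; cur_ctx = (c)
#define CTX_RESTORE()	(cur_ctx = saved_ctx_)

/* Hypothetical analogue of struct fq_pie_flow with the patch's new field. */
struct flow {
	struct vnet_ctx *ctx;	/* like flows[i].vnet = curvnet */
};

/* Creation runs where cur_ctx is valid, so capture it for later. */
static void flow_init(struct flow *f)
{
	f->ctx = cur_ctx;
	cur_ctx->active_flows++;
}

/* Deferred cleanup, loosely analogous to fqpie_callout_cleanup(). */
static void flow_cleanup(void *arg)
{
	struct flow *f = arg;

	CTX_SET(f->ctx);	/* without this, cur_ctx is NULL below */
	assert(cur_ctx != NULL);
	cur_ctx->active_flows--;
	CTX_RESTORE();
}

/* Simulated timer subsystem: fires the callback with no context installed. */
static void run_callout(void (*fn)(void *), void *arg)
{
	struct vnet_ctx *prev = cur_ctx;

	cur_ctx = NULL;
	fn(arg);
	cur_ctx = prev;
}

/* Creates one flow, fires its cleanup as a callout, returns flows left. */
static int simulate_flow_lifecycle(void)
{
	struct vnet_ctx v = { 0 };
	struct flow f;

	cur_ctx = &v;
	flow_init(&f);
	cur_ctx = NULL;		/* the callout runs "later", outside any context */
	run_callout(flow_cleanup, &f);
	return v.active_flows;
}
```

Storing the context in each flow, rather than in a global, lets flows created in different VNETs each clean up against their own instance when their callouts fire.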
Would you be able to drop a test kernel into /boot/kernel if I provide you with a .tgz of the directory (and, of course, test afterwards)?
Updated by Anonymous almost 3 years ago
I've never done it before, but I should be able to. I have a test setup I can run it on.
Updated by Mateusz Guzik almost 3 years ago
Apologies for late reply, I somehow did not get notification of your response.
I pushed the patch to pfSense; it should be available in the next snapshot, so no patching by hand will be needed. I'll let you know in a day or two when it is available.
Updated by Mateusz Guzik almost 3 years ago
This snapshot contains the fix: https://firmware-nyi.netgate.com/beta/snapshots/installer/pfSense-plus-22.01-BETA-amd64-20220111-0600.iso.gz
Updated by Anonymous almost 3 years ago
OK, I don't have Netgate hardware, so I won't be able to test any pfSense Plus versions.
Updated by Anonymous almost 3 years ago
I tested the latest Community Edition, 2.6.0.b.20220111.0600, on two different machines and everything is working well.
No problems so far, anyway. :)
Updated by Mateusz Guzik almost 3 years ago
- Status changed from Feedback to Resolved
- Assignee set to Mateusz Guzik
Thanks for testing. I'll assume the issue is resolved, please reopen if the crash pops up again.
Updated by Jim Pingle almost 3 years ago
- Subject changed from When using the limiter scheduler fq_pie Pfsense page faults. to Kernel panic when using ``fq_pie`` limiter scheduler
- Target version set to 2.6.0
- Plus Target Version set to 22.01
Updated by Anonymous almost 3 years ago
I guess I should have checked a little more carefully.
The fq_pie limiter scheduler is indeed fixed, but the limiter AQM (Active Queue Management) option pie needs to be patched as well: it still causes a kernel panic.
Updated by Mateusz Guzik almost 3 years ago
Can you attach a dump? Both of the dumps already attached only show the fq_pie crash.
Updated by Anonymous almost 3 years ago
- File info_pie_rules.1 added
- File textdump_pie_rules.tar.1 added
- File info_pie_reboot.2 added
- File textdump_pie_reboot.tar.2 added
OK, I'm uploading two textdumps.
The first one occurred right when I applied the floating firewall rules that send traffic to the limiters.
The second one occurred during a reboot.
Updated by Mateusz Guzik almost 3 years ago
OK, that's the same kind of problem, but it will have to be fixed differently. I'll try to do it today.
Updated by Mateusz Guzik almost 3 years ago
I pushed the fix; it should be available in the next snapshot.
Updated by Anonymous almost 3 years ago
I can confirm this is working in the latest 2.7.0 snapshot.
Updated by Scott Long almost 3 years ago
- Status changed from New to In Progress
Updated by Scott Long almost 3 years ago
- Status changed from In Progress to Feedback
Updated by Anonymous almost 3 years ago
I am currently running the latest 2.6.0 release candidate snapshot and everything is working well.
Updated by Jim Pingle almost 3 years ago
- Status changed from Feedback to Resolved
Thanks for following up, let us know if you experience any additional instability.