Regression #12622

closed

Kernel panic when using ``fq_pie`` limiter scheduler

Added by Harley Peters 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Traffic Shaper (Limiters)
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 22.01
Release Notes: Default
Affected Version: 2.6.0
Affected Architecture:

Description

Whenever I try to use the fq_pie limiter scheduler, pfSense crashes with a page fault.
I can recover by disabling the WAN port.
This affects the current 2.6.0 beta.
2.5.0 and 2.5.2 work fine.
The crash doesn't occur until you apply traffic to the limiter via a firewall rule.

This occurs on both a KVM virtual machine and apu2c4 hardware.

I have attached a couple of textdumps.
0. Was created by starting up pfSense and letting it boot until it crashed.
1. Was created by having the firewall rules that apply the traffic to the limiter disabled at boot time and then enabling them after boot up.


Files

info.0 (401 Bytes), Harley Peters, 12/20/2021 04:34 PM
info.1 (402 Bytes), Harley Peters, 12/20/2021 04:34 PM
textdump.tar.0 (72.5 KB), Harley Peters, 12/20/2021 04:34 PM
textdump.tar.1 (103 KB), Harley Peters, 12/20/2021 04:34 PM
info_pie_rules.1 (402 Bytes), Harley Peters, 01/14/2022 01:43 PM
textdump_pie_rules.tar.1 (105 KB), Harley Peters, 01/14/2022 01:44 PM
info_pie_reboot.2 (402 Bytes), Harley Peters, 01/14/2022 01:44 PM
textdump_pie_reboot.tar.2 (120 KB), Harley Peters, 01/14/2022 01:44 PM

Actions #1

Updated by Viktor Gurov 5 months ago

Unable to reproduce on 2.6.0.b.20211220.0600

/tmp/rules.limiter:

pipe 1 config  bw 10Mb pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 noecn

/tmp/rules.debug:

...
pass  in  quick  on $LAN inet proto tcp  from any to 2.2.2.2 ridentifier 1638455473 flags S/SA keep state  dnpipe ( 1)  label "USER_RULE" 

Actions #2

Updated by Viktor Gurov 5 months ago

  • Status changed from New to Feedback
  • Affected Version set to 2.6.0

Actions #3

Updated by Jim Pingle 5 months ago

  • Tracker changed from Bug to Regression
  • Target version deleted (2.6.0)

Actions #4

Updated by Harley Peters 5 months ago

This is the configuration I have it set to.
I also tried with noecn; it still page faulted.
The very same configuration works fine if I just change the scheduler to fq_codel.
pfSense version 2.6.0.b.20211220.0600

rules.limiter:

pipe 1 config bw 4800Kb droptail
sched 1 config pipe 1 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 1 config pipe 1 droptail

pipe 2 config bw 48Mb droptail
sched 2 config pipe 2 type fq_pie target 15ms tupdate 15ms alpha 125 beta 1250 max_burst 150000 max_ecnth 99 ecn
queue 2 config pipe 2 droptail

rules.debug:

pass quick on { vtnet0 } inet proto icmp from any to any icmp-type { echorep,echoreq } ridentifier 1623355920 keep state label "USER_RULE: (LIMITERS}Ipv4IcmpBugWorkAround"
match out quick on { vtnet0 } inet from any to any ridentifier 1621820692 dnqueue( 1,2) label "USER_RULE: (LIMITERS)Ipv4WanOut"
match in quick on { vtnet0 } inet from any to any ridentifier 1622208712 dnqueue( 2,1) label "USER_RULE: (LIMITERS)Ipv4WanIn"
match out quick on { vtnet0 } inet6 from any to ! fe80::/64 ridentifier 1621820766 dnqueue( 1,2) label "USER_RULE: (LIMITERS)Ipv6WanOut"
match in quick on { vtnet0 } inet6 from ! fe80::/64 to any ridentifier 1622208794 dnqueue( 2,1) label "USER_RULE: (LIMITERS)Ipv6WanIn"

Actions #5

Updated by Harley Peters 5 months ago

OK, the reason Viktor Gurov's setup didn't page fault is that he didn't add a child queue to the scheduler (tested).
The problem is that without a child queue the fq_pie and fq_codel schedulers don't work.

Actions #6

Updated by Mateusz Guzik 5 months ago

I diagnosed the problem and wrote a patch for it, but I don't have an easy means to test it:

diff --git a/sys/netpfil/ipfw/dn_sched_fq_pie.c b/sys/netpfil/ipfw/dn_sched_fq_pie.c
index c3de665687a3..ae14152538bb 100644
--- a/sys/netpfil/ipfw/dn_sched_fq_pie.c
+++ b/sys/netpfil/ipfw/dn_sched_fq_pie.c
@@ -111,6 +111,9 @@ struct fq_pie_flow {
        int active;             /* 1: flow is active (in a list) */
        struct pie_status pst;  /* pie status variables */
        struct fq_pie_si_extra *psi_extra;
+#ifdef VIMAGE
+       struct vnet *vnet;
+#endif
        STAILQ_ENTRY(fq_pie_flow) flowchain;
 };

@@ -575,6 +578,7 @@ fqpie_callout_cleanup(void *x)
        mtx_destroy(&pst->lock_mtx);
        psi_extra = q->psi_extra;

+       CURVNET_SET(q->vnet);
        DN_BH_WLOCK();
        psi_extra->nr_active_q--;

@@ -585,6 +589,7 @@ fqpie_callout_cleanup(void *x)
                fq_pie_desc.ref_count--;
        }
        DN_BH_WUNLOCK();
+       CURVNET_RESTORE();
 }

 /* 
@@ -1052,6 +1057,9 @@ fq_pie_new_sched(struct dn_sch_inst *_si)
        for (i = 0; i < schk->cfg.flows_cnt; i++) {
                flows[i].pst.parms = &schk->cfg.pcfg;
                flows[i].psi_extra = si->si_extra;
+#ifdef VIMAGE
+               flows[i].vnet = curvnet;
+#endif
                pie_init(&flows[i], schk);
        }

Will you be able to plop a test kernel into /boot/kernel if I provide you a .tgz with the directory? (and of course test afterwards)
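
For readers following the patch in this comment, here is a minimal userspace sketch of the pattern it appears to apply: remember the vnet at flow creation, then set it around the cleanup callback and restore the previous value afterwards. Everything here is a hypothetical stand-in, not kernel code: `struct vnet`, the plain `curvnet` variable, `flow_init`, `cleanup_unpatched`, `cleanup_patched`, and `run_demo` are names made up for illustration, and the assumption that the callout runs with no current vnet is inferred from the patch rather than stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-vnet state touched during cleanup. */
struct vnet {
    int nr_active_q;
};

/* Models the "current vnet" pointer; assumed NULL when a callout fires. */
static struct vnet *curvnet = NULL;

/* Models a flow that remembers the vnet it was created in. */
struct flow {
    struct vnet *vnet;
};

/* Runs while a vnet context is set, like scheduler instance creation. */
static void flow_init(struct flow *f)
{
    f->vnet = curvnet;
}

/* Without the saved pointer: relies on curvnet being valid.
 * Returns -1 where real code would dereference an invalid pointer. */
static int cleanup_unpatched(struct flow *f)
{
    (void)f;
    if (curvnet == NULL)
        return -1;
    curvnet->nr_active_q--;
    return 0;
}

/* With the saved pointer: set the context, do the work, restore it. */
static int cleanup_patched(struct flow *f)
{
    struct vnet *saved = curvnet;   /* models CURVNET_SET(f->vnet) */
    int rc = -1;

    curvnet = f->vnet;
    if (curvnet != NULL) {
        curvnet->nr_active_q--;
        rc = 0;
    }
    curvnet = saved;                /* models CURVNET_RESTORE() */
    return rc;
}

/* Returns 0 if the sketch behaves as described, nonzero otherwise. */
static int run_demo(void)
{
    struct vnet v = { .nr_active_q = 1 };
    struct flow f;

    curvnet = &v;       /* setup happens with a vnet context */
    flow_init(&f);
    curvnet = NULL;     /* the callback fires later without one */

    if (cleanup_unpatched(&f) != -1)
        return 1;
    if (cleanup_patched(&f) != 0)
        return 2;
    if (v.nr_active_q != 0)
        return 3;
    if (curvnet != NULL)
        return 4;
    return 0;
}
```

Calling `run_demo()` exercises both variants: the unpatched cleanup reports failure because no context is set, while the patched one uses the pointer saved at creation and leaves `curvnet` as it found it.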

Actions #7

Updated by Harley Peters 5 months ago

Never done it before but I should be able to. I have a test setup I can run it on.

Actions #8

Updated by Mateusz Guzik 4 months ago

Apologies for the late reply; I somehow did not get a notification of your response.

I pushed the patch to pfSense. It should be available in the next snapshot, so any patching by hand can be avoided. I'll let you know in a day or two when it is available.

Actions #10

Updated by Harley Peters 4 months ago

OK, I don't have Netgate hardware, so I won't be able to test any pfSense Plus versions.

Actions #11

Updated by Harley Peters 4 months ago

I tested the latest community edition, 2.6.0.b.20220111.0600, on two different machines and everything is working well.
No problems so far anyway. :)

Actions #12

Updated by Mateusz Guzik 4 months ago

  • Status changed from Feedback to Resolved
  • Assignee set to Mateusz Guzik

Thanks for testing. I'll assume the issue is resolved; please reopen if the crash pops up again.

Actions #13

Updated by Jim Pingle 4 months ago

  • Subject changed from When using the limiter scheduler fq_pie Pfsense page faults. to Kernel panic when using ``fq_pie`` limiter scheduler
  • Target version set to 2.6.0
  • Plus Target Version set to 22.01

Actions #14

Updated by Harley Peters 4 months ago

I guess I should have checked a little better.
The limiter scheduler fq_pie is indeed fixed, but the limiter AQM (active queue management) pie needs to be patched as well. It is still causing a kernel panic.

Actions #15

Updated by Jim Pingle 4 months ago

  • Status changed from Resolved to New

Actions #16

Updated by Mateusz Guzik 4 months ago

Can you attach a dump? Both dumps already attached only show the fq_pie crash.

Actions #17

Updated by Harley Peters 4 months ago

OK, I'm uploading two textdumps.
The first one occurred right when I applied the floating firewall rules to pipe the traffic to the limiters.
The second occurred during a reboot.

Actions #18

Updated by Mateusz Guzik 4 months ago

OK, that's the same kind of problem, but it will have to be fixed differently. I'll try to do it today.

Actions #19

Updated by Mateusz Guzik 4 months ago

I pushed the fix; it should be available in the next snapshot.

Actions #20

Updated by Harley Peters 4 months ago

I can confirm this is working in the latest 2.7.0 snapshot.

Actions #21

Updated by Scott Long 4 months ago

  • Status changed from New to In Progress

Actions #22

Updated by Scott Long 4 months ago

  • Status changed from In Progress to Feedback

Actions #23

Updated by Harley Peters 4 months ago

I am currently running the latest 2.6.0 release candidate snapshot and everything is working well.

Actions #24

Updated by Jim Pingle 4 months ago

  • Status changed from Feedback to Resolved

Thanks for following up. Let us know if you experience any additional instability.
