Regression #11550

closed

Segmentation fault when loading ALTQ traffic shaping rules using FAIRQ

Added by Thorsten Zitterell about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Category: Traffic Shaper (ALTQ)
Target version:
Start date: 02/26/2021
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 21.05
Release Notes: Default
Affected Version: 2.5.0
Affected Architecture: amd64

Description

I have upgraded from 2.4.5p1 to 21.02/21.02p1 on my SG-4860.

The following traffic shaper rule causes a segmentation fault:


[21.02-RELEASE][admin@firewall]/root: pfctl -vf /tmp/rules.debug
[...]
altq on igb0 fairq bandwidth 1Gb tbrsize 36000 queue { qLink qAck qOthersHigh qVoIP qOthersLow }
Segmentation fault (core dumped)

As a result other rules are not loaded and NAT does not work.


Files

shaper.xml (3.73 KB), Thorsten Zitterell, 02/26/2021 08:08 AM
shaper-config-pfsense-20210905043817.xml (894 Bytes), Brett Keller, 09/05/2021 04:59 AM

Actions #1

Updated by Jim Pingle about 3 years ago

  • Tracker changed from Bug to Regression
  • Project changed from pfSense Plus to pfSense
  • Subject changed from Segfault when Traffic Shaper is active to Segmentation fault when loading ALTQ traffic shaping rules
  • Category changed from Traffic Shaper (ALTQ) to Traffic Shaper (ALTQ)
  • Status changed from New to Feedback
  • Target version set to CE-Next
  • Affected Version set to 2.5.0

Unlikely that this is specific to Plus.

Can you attach the config.xml entries for the shaper? It would help to see the queue settings and so on to reproduce the issue locally.
Or at the very least, post the specific settings you put into the shaper wizard if that's what you used to create the queues.

Actions #2

Updated by Thorsten Zitterell about 3 years ago

Jim Pingle wrote:

Can you attach the config.xml entries for the shaper? It would help to see the queue settings and so on to reproduce the issue locally.

<shaper> from config.xml attached.

Actions #3

Updated by Jim Pingle about 3 years ago

Not that it should cause a segfault, but why are you mixing FAIRQ, PRIQ, and HFSC?

Does the crash happen if all your interfaces are using the same scheduler?

Actions #4

Updated by Thorsten Zitterell about 3 years ago

Jim Pingle wrote:

Not that it should cause a segfault, but why are you mixing FAIRQ, PRIQ, and HFSC?

I used PRIQ for outgoing WAN interfaces, and FAIRQ for LAN interfaces because I wanted balanced rates to the internal hosts. The interface with HFSC was not enabled.

Does the crash happen if all your interfaces are using the same scheduler?

The crash does not happen if I use PRIQ for all interfaces. So it seems to be related to FAIRQ.

Actions #5

Updated by Jim Pingle about 3 years ago

Have you tried only using FAIRQ instead of only using PRIQ? It's not clear from the symptom behavior if the problem is from FAIRQ alone or from mixing the two schedulers.

Actions #6

Updated by Thorsten Zitterell about 3 years ago

Jim Pingle wrote:

Have you tried only using FAIRQ instead of only using PRIQ? It's not clear from the symptom behavior if the problem is from FAIRQ alone or from mixing the two schedulers.

When I use FAIRQ for all the interfaces, the segfault comes with the first rule.

The last lines of the trace are:


23153 pfctl CALL mmap(0,0x3000,0x3<PROT_READ|PROT_WRITE>,0x1002<MAP_PRIVATE|MAP_ANON>,0xffffffff,0)
23153 pfctl RET mmap 34367680512/0x800793000
23153 pfctl CALL write(0x1,0x800738000,0x6d)
23153 pfctl GIO fd 1 wrote 109 bytes
"altq on pppoe0 fairq bandwidth 1Mb tbrsize 1492 queue { qACK qLink qVoIP qOthersHigh qOthersMid qOthersLow }
"
23153 pfctl RET write 109/0x6d
23153 pfctl PSIG SIGSEGV SIG_DFL code=SEGV_MAPERR
23153 pfctl NAMI "/root/pfctl.core"

Actions #7

Updated by Jim Pingle about 3 years ago

  • Subject changed from Segmentation fault when loading ALTQ traffic shaping rules to Segmentation fault when loading ALTQ traffic shaping rules using FAIRQ
  • Status changed from Feedback to New

OK, thanks for checking on that. I've updated the subject to reflect that it's specific to FAIRQ.

Actions #8

Updated by Jim Pingle almost 3 years ago

  • Plus Target Version set to 21.05

Would be nice to fix soon if we can, but not a blocker at the moment.

Actions #9

Updated by Kristof Provost almost 3 years ago

This is an upstream FreeBSD bug, and is reproducible with the following pf.conf on a recent FreeBSD/main:

altq on mvneta0 fairq bandwidth 1Gb tbrsize 36000 queue { qLink qAck qOthersHigh qVoIP qOthersLow }
queue qLink fairq(default)

Actions #10

Updated by Jim Pingle almost 3 years ago

  • Status changed from New to Feedback
  • Assignee set to Kristof Provost
  • Target version changed from CE-Next to 2.6.0

Kristof committed a potential fix for this; it needs testing. If it's still an issue, move the target ahead to 21.09.

Actions #11

Updated by Jim Pingle almost 3 years ago

  • Target version changed from 2.6.0 to 2.5.2

Actions #12

Updated by Viktor Gurov almost 3 years ago

  • Status changed from Feedback to Resolved

pfSense 2.5.1 test:

# pfctl -vf /tmp/rules.debug
...
set loginterface vtnet0
set skip on { pfsync0 }
altq on vtnet0 fairq bandwidth 10Mb tbrsize 36000 queue { q1 qq2 }
Segmentation fault (core dumped)

pfSense 2.5.2.b.20210601.0300 test:

# pfctl -vf /tmp/rules.debug
...
set loginterface vtnet0
set skip on { pfsync0 }
altq on vtnet0 fairq bandwidth 10Mb tbrsize 36000 queue { q1 qq2 }
queue q1 on vtnet0 bandwidth 5Mb fairq( default ) 
queue qq2 on vtnet0 bandwidth 3Mb priority 2
...
(no segfault)

Actions #13

Updated by Roman Nik over 2 years ago

It looks like a regression in the 2.5.2 release, because everything worked fine in the 2.5.2 beta.

Actions #14

Updated by Jim Pingle over 2 years ago

Roman Nik wrote:

It looks like a regression in the 2.5.2 release, because everything worked fine in the 2.5.2 beta.

Are the symptoms identical?

Actions #15

Updated by Brett Keller over 2 years ago

I'm afraid I have to agree with Roman Nik that this bug is still around in 2.5.2-RELEASE.

I just upgraded from 2.4.5_1 to 2.5.2, and I got bitten by this same bug. After the upgrade, the firewall came back up essentially rule-less because pfctl segfaulted while parsing the ruleset. I first noticed while inspecting the system logs post-upgrade, and I freaked out when I saw internet bots banging on my pfSense SSH port, which is normally completely firewalled off from the WAN interface! After disabling the WAN for safety, I was able to debug the problem and found that the shaper rules were the cause of the segfault.

# cat /etc/version*
2.5.2-RELEASE
Fri Jul 02 15:33:00 EDT 2021
fd0f54f44b5ceb91c4579ed9536de58b8925836d
0

# pfctl -vf /tmp/rules.debug
[...snip...]
set loginterface igb1
set skip on { pfsync0 }
altq on igb0 fairq bandwidth 6.25Mb tbrsize 6000 queue { WAN_main }
Segmentation fault (core dumped)

Note that the symptoms are identical to those originally reported here: ALTQ using FAIRQ causes a segfault on rule parse.

Once I disabled both of my shapers in pfSense and reloaded the config, the firewall came up normally, and a pfctl -vf /tmp/rules.debug would return without error.

I've attached the shaper section of my pfSense config for reference.
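
Since every crash reported in this thread happens right after an "altq on <interface> fairq ..." line is printed, it may help to check a generated ruleset for such lines before upgrading or re-enabling the shaper. The following is only a minimal sketch: the function name find_fairq_altq is hypothetical, and it assumes the altq line layout shown in the outputs above. It reads the file and never calls pfctl.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): list the altq lines in a pf
# ruleset that select the FAIRQ scheduler.
# Assumption: lines follow the "altq on <interface> fairq ..." layout
# seen in the pfctl output quoted in this issue.
find_fairq_altq() {
    rules_file="$1"
    # -n prefixes each match with its line number; -E enables extended regex.
    grep -nE '^altq on [^ ]+ fairq ' "$rules_file"
}

# Usage (path taken from this issue):
# find_fairq_altq /tmp/rules.debug
```

If the function prints any lines, those interfaces use FAIRQ and are candidates for the segfault described here. Empty output (grep exit status 1) means no FAIRQ altq line was found in that file.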
