Bug #4856

closed

Traffic Shaper blocks traffic when the config is otherwise changed

Added by Michael Knowles over 8 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Traffic Shaper (ALTQ)
Target version: -
Start date: 07/20/2015
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: All
Affected Architecture: All

Description

When changing a firewall or NAT rule, or converting a NAT rule to a load balancer configuration (or potentially making other firewall-related changes), I have seen the traffic shaper block the relevant traffic.

The workaround is to delete the traffic shaper from the interfaces and re-create it with the same settings. Only once this has occurred will the relevant traffic be allowed to pass (with absolutely everything else config-wise remaining identical).

I have about 20 pfSense instances at customer sites and have seen this on many instances and versions, all v2.x. I last experienced it yesterday on a v2.2.3 instance when changing a NAT rule for SMTP into a rule for the load balancer.

One thing all the instances have in common is that they are all virtualised on VMware (various versions from 4.1 to 5.5U2).


Files

rules.debug (21.2 KB), added by Wayne Huang, 10/19/2015 10:51 AM
Actions #1

Updated by Chris Buechler over 8 years ago

  • Status changed from New to Feedback
  • Target version deleted (2.2.4)

Going to need more to go on here; the case as described isn't replicable. We need a specific set of steps: start with this config file, make X change(s), and you'll see the issue.

Actions #2

Updated by Michael Knowles over 8 years ago

  • File config-officepfsense.pentangle-connect.com-20150720170007.xml added

I can give a config if required (see attached file). The problem is that the issue appears often enough for me to know how to work around it, but not often enough for me to conclusively say "you do XYZ and it fails".

What I do remember doing yesterday was converting a NAT port redirect for an internal SMTP server to a load balancer config that passes email traffic to two internal SMTP servers.

I started with a working instance that passed port 25 through to a single internal server. I created the load balancer config for two servers, deleted the NAT config for the single server, and made sure the firewall rules allowed port 25 inbound to both servers. Testing this failed. Resetting the firewall states did not help, and neither did rebooting the pfSense instance. Removing the traffic shaper from the interfaces made it work instantly, after which I went through the traffic shaper wizard again to return the shaper to operation.

As for the procedure itself, I know I was doing the right thing with no extraneous clicks, as I'd just finished doing the same task at our hosting site, which doesn't use traffic shaping.

Anyway, the VMware host this instance was running on is ESXi v5.5.0 build 1623387 (which equates to 5.5.0U1), on a Dell T410 server, and the NICs in use are a pair of Broadcom BCM5716 (built into the server) and an Intel add-in 82576 PCIe dual NIC card.

As I say though, the config can be considerably more basic than this and still give issues. A simple single LAN/single WAN config with an SBS server VM on the LAN and ports NATted to the internet can exhibit this issue when changing firewall configs. The one common feature, aside from all my instances being virtualised on VMware, is that removing the traffic shaper and re-adding it fixes the issue.

Actions #3

Updated by Chris Buechler over 8 years ago

  • File deleted (config-officepfsense.pentangle-connect.com-20150720170007.xml)
Actions #4

Updated by Chris Buechler over 8 years ago

  • Assignee set to Chris Buechler

Thanks for the config. I deleted it from here since there are potentially sensitive things in it and added it to a private internal repo: projects/ticket-configs/redmine-4856.xml

Assigned to me for review.

Actions #5

Updated by Wayne Huang over 8 years ago

I've seen similar symptoms using 2.2.4 on AMD64 and ADI/Embedded architectures. I can reproduce this by configuring WAN to use DHCP, then setting up traffic shaping using the Multi LAN to WAN wizard. Everything works fine once the shaper is applied. However, when the pfSense box is rebooted, connectivity is lost until the traffic shaper is removed. This is reliably reproducible in my lab using this setup and following these steps. Please let me know if there are any logs or config files that would be useful to help troubleshoot or determine if the issue I see is even related to this.

Actions #6

Updated by Wayne Huang over 8 years ago

In my case, it appears the traffic shaper config as written by pfSense wizard has a problem:

Diagnostics > Command prompt > pfctl -f /tmp/rules.debug
$ pfctl -f /tmp/rules.debug
bandwidth for qInternet higher than interface
/tmp/rules.debug:63: errors in queue definition
parent qInternet not found for qACK
/tmp/rules.debug:64: errors in queue definition
parent qInternet not found for qP2P
/tmp/rules.debug:65: errors in queue definition
parent qInternet not found for qVoIP
/tmp/rules.debug:66: errors in queue definition
parent qInternet not found for qOthersHigh
/tmp/rules.debug:67: errors in queue definition
parent qInternet not found for qOthersLow
/tmp/rules.debug:68: errors in queue definition
bandwidth for qInternet higher than interface
/tmp/rules.debug:73: errors in queue definition
parent qInternet not found for qACK
/tmp/rules.debug:74: errors in queue definition
parent qInternet not found for qP2P
/tmp/rules.debug:75: errors in queue definition
parent qInternet not found for qVoIP
/tmp/rules.debug:76: errors in queue definition
parent qInternet not found for qOthersHigh
/tmp/rules.debug:77: errors in queue definition
parent qInternet not found for qOthersLow
/tmp/rules.debug:78: errors in queue definition
pfctl: Syntax error in config file: pf rules not loaded
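
The output above repeats two message shapes: one "bandwidth ... higher than interface" error per parent queue, followed by "parent ... not found" errors for each child. A minimal Python sketch for condensing such output into root causes and knock-on failures follows. The function name and the returned dictionary layout are hypothetical, and the patterns only cover the message shapes quoted in this comment, not every message pfctl can emit.

```python
import re

# Patterns copied from the message shapes in the pfctl output above.
OVER_RE = re.compile(r"^bandwidth for (\S+) higher than interface$")
ORPHAN_RE = re.compile(r"^parent (\S+) not found for (\S+)$")


def summarize_pfctl_errors(output):
    """Group pfctl queue errors into root causes and orphaned child queues."""
    over_bandwidth = []   # queues rejected for exceeding the interface bandwidth
    orphaned = {}         # parent queue -> child queues that lost their parent
    rules_loaded = True
    for raw in output.splitlines():
        line = raw.strip()
        match = OVER_RE.match(line)
        if match:
            if match.group(1) not in over_bandwidth:
                over_bandwidth.append(match.group(1))
            continue
        match = ORPHAN_RE.match(line)
        if match:
            parent, child = match.groups()
            children = orphaned.setdefault(parent, [])
            if child not in children:
                children.append(child)
            continue
        # The final line of the quoted output signals that nothing was loaded.
        if line.endswith("pf rules not loaded"):
            rules_loaded = False
    return {
        "over_bandwidth": over_bandwidth,
        "orphaned": orphaned,
        "rules_loaded": rules_loaded,
    }
```

Fed the output above, this would report qInternet as the only over-bandwidth queue, with the five child queues listed as orphaned under it.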

Actions #7

Updated by Jim Pingle over 8 years ago

Wayne Huang wrote:

In my case, it appears the traffic shaper config as written by pfSense wizard has a problem:
bandwidth for qInternet higher than interface

You'll see that when the link speed has dropped. For example, if you set up the wizard while linked at 1 Gbit/s and the link speed on the NIC later drops to 100 Mbit/s, it will complain.
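
One way to check for this condition is to compare the configured queue bandwidth with the speed reported on the interface's "media:" line in ifconfig output. The following Python sketch does that under stated assumptions: the helper names are hypothetical, the regular expression only handles media strings of the "1000baseT" or "10Gbase-T" style, and it says nothing about how pf itself determines the interface bandwidth.

```python
import re

# Matches the speed part of media strings such as "1000baseT" or "10Gbase-T".
MEDIA_RE = re.compile(r"(\d+)(G?)base", re.IGNORECASE)


def link_speed_mbit(media_line):
    """Return the negotiated link speed in Mbit/s, or None if none is shown."""
    match = MEDIA_RE.search(media_line)
    if match is None:
        return None  # e.g. "media: Ethernet autoselect" on a port with no link
    speed = int(match.group(1))
    # A "G" marker means the number is in Gbit/s.
    return speed * 1000 if match.group(2) else speed


def queue_fits_link(queue_mbit, media_line):
    """True only if a link speed is known and the queue does not exceed it."""
    speed = link_speed_mbit(media_line)
    return speed is not None and queue_mbit <= speed
```

A queue sized for 1000 Mbit/s would pass against "(1000baseT <full-duplex>)" and fail against "(100baseTX <full-duplex>)" or against a media line with no negotiated speed.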

Actions #8

Updated by Wayne Huang over 8 years ago

Yes, I know - the issue is that the link speed has not dropped. The interface is 1Gbps and has not changed.

igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1492
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:26:75:eb:a2:6f
inet6 fe80::226:75ff:feeb:a26f%igb1 prefixlen 64 scopeid 0x2
inet 10.0.3.224 netmask 0xfffffc00 broadcast 10.0.3.255
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active

Actions #9

Updated by Jim Pingle over 8 years ago

Then you'll need to attach a copy of config.xml (or at least the shaper section and shaper wizard section) along with a copy of /tmp/rules.debug, and the output of "ifconfig -a"

Actions #10

Updated by Wayne Huang over 8 years ago

I'll need to extract out the Traffic Shaper sections for config.xml, but here's the rest.

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:08:a2:09:49:5c
inet6 fe80::208:a2ff:fe09:495c%igb0 prefixlen 64 scopeid 0x1
inet 10.0.8.1 netmask 0xfffffc00 broadcast 10.0.11.255
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1492
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:26:75:eb:a2:6f
inet6 fe80::226:75ff:feeb:a26f%igb1 prefixlen 64 scopeid 0x2
inet 10.0.3.224 netmask 0xfffffc00 broadcast 10.0.3.255
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO>
ether 00:08:a2:09:49:58
inet6 fe80::208:a2ff:fe09:4958%igb2 prefixlen 64 scopeid 0x3
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:08:a2:09:49:59
inet6 fe80::208:a2ff:fe09:4959%igb3 prefixlen 64 scopeid 0x4
inet 192.168.16.1 netmask 0xffffff00 broadcast 192.168.16.255
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb4: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:08:a2:09:49:5a
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb5: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether 00:08:a2:09:49:5b
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
pflog0: flags=100<PROMISC> metric 0 mtu 33144
pfsync0: flags=0<> metric 0 mtu 1500
syncpeer: 224.0.0.240 maxupd: 128 defer: on
syncok: 1
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x9
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igb2_vlan12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=3<RXCSUM,TXCSUM>
ether 00:08:a2:09:49:58
inet6 fe80::208:a2ff:fe09:4958%igb2_vlan12 prefixlen 64 scopeid 0xb
inet 192.168.20.1 netmask 0xfffffc00 broadcast 192.168.23.255
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
vlan: 12 vlanpcp: 0 parent interface: igb2
ovpns1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
options=80000<LINKSTATE>
inet6 fe80::208:a2ff:fe09:495c%ovpns1 prefixlen 64 scopeid 0xc
inet 172.16.12.1 --> 172.16.12.2 netmask 0xffffffff
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
Opened by PID 18913

Actions #11

Updated by Wayne Huang over 8 years ago

shaper and ezshaper sections of config.xml: https://gist.github.com/wayne530/beb4da84ecaa3b19bf2d

Actions #12

Updated by Wayne Huang over 8 years ago

Does it make sense that 100 Mb becomes 104857.6 Kb? That calculation seems incorrect - if I take the latter value and multiply by 1024, I get 107374182.4 b vs. 100 Mb == 104857600 b. Perhaps it should be 102400 Kb instead? Basically, there seems to be an inconsistent use of 1024 vs 1000 for "K".
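
The arithmetic in this comment can be reproduced with a short Python sketch. It only illustrates the unit mismatch described here; the function names are hypothetical, and whether pfSense actually performs the conversion this way is an assumption, not something confirmed in this ticket.

```python
def mbit_to_bits_binary(mbit):
    """Interpret "M" as 1024 * 1024."""
    return mbit * 1024 * 1024


def mixed_kbit(mbit):
    """Binary megabits divided by a decimal "K": the suspected inconsistency."""
    return mbit_to_bits_binary(mbit) / 1000


def binary_kbit(mbit):
    """Binary megabits divided by a binary "K": the consistent conversion."""
    return mbit_to_bits_binary(mbit) / 1024


if __name__ == "__main__":
    print(mbit_to_bits_binary(100))   # bits in 100 Mb with M = 1024 * 1024
    print(mixed_kbit(100))            # the value seen in the generated config
    print(mixed_kbit(100) * 1024)     # what pf would get if it treats K as 1024
    print(binary_kbit(100))           # the value proposed in this comment
```

With a binary "M" and a decimal "K", 100 Mb comes out as 104857.6 Kb. Reading that figure back with a binary "K" gives roughly 2.4% more than the original 104857600 bits, which could be enough to exceed the interface bandwidth.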

Actions #13

Updated by Wayne Huang over 8 years ago

Is it because some of the configured interfaces do not have a link at the time of bootup?

Actions #14

Updated by Michael Knowles over 8 years ago

Just to add, as I've been watching this conversation go on today: I've never seen an interface speed change be related to this, and since all my instances are under VMware, the hypervisor presents a standard 10 Gbit/s link speed irrespective of the underlying hardware link speed anyway.

Actions #15

Updated by Wayne Huang over 8 years ago

In my case, it is exactly due to some interfaces being down. It must receive an interface bandwidth of 0 when the interface isn't up. I plugged in all the ports and the same pfctl command now works correctly. In theory it makes sense, but in practice that would not be the behavior I'd expect (it perhaps should ignore the queues associated with a down interface). Network configuration can change - a downstream switch for a non-critical part of the network could die, but I would not expect the pfSense to come up in a broken state. By the way, due to the pfctl erroring out, all of the default "block all" rules on the WAN are not implemented, so in addition to NAT being broken, you also get a wide open WAN. I'd revisit the order in which shaper rules are located in the pf rules file and move them to the bottom if possible.

Actions #16

Updated by Wayne Huang over 8 years ago

@Michael, I'd be curious to see the output of running "pfctl -f /tmp/rules.debug" when you experience the problem to rule out any sort of rule error.

Actions #17

Updated by Wayne Huang over 8 years ago

Additionally, it seems your web UI does not properly surface these errors, assuming "Reload Filters" is doing something similar to the pfctl command. This UI feature appears to work fine and does not report any errors.

Actions #18

Updated by Michael Knowles over 8 years ago

@wayne - will certainly do that when I come up against the issue again, but like I said, it doesn't occur that often. I guess I'm the one reporting it because almost all my pfSense instances use the traffic shaper: I invariably have a VoIP PBX behind them, so I need the QoS.

I think we've probably demonstrated enough of an issue there (especially in light of your wide open WAN finding) to warrant at least an investigation of what's going on. Even if we just end up tying the apinger to a shaper rewrite as a kludge! (non-serious bugfix suggestion)

Actions #19

Updated by Chris Buechler almost 8 years ago

  • Assignee deleted (Chris Buechler)
Actions #20

Updated by Jim Pingle over 4 years ago

  • Status changed from Feedback to Closed