Feature #4821: PPPoE WANs do not take full advantage of NIC driver queues for receiving traffic - pfSense - pfSense bugtracker

Feature #4821

closed

PPPoE WANs do not take full advantage of NIC driver queues for receiving traffic

Added by Jim Pingle over 10 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Low
Assignee: Jim Thompson
Category: Interfaces
Target version:
Start date: 07/08/2015
Due date:
% Done:
Estimated time:
Plus Target Version:
Release Notes:

Description

On PPPoE WANs, packets are received on only one NIC driver queue (queue0), while packets are transmitted from all queues (queue0 and queue1). This has been observed on multiple systems with PPPoE-based WANs and igb(4) NICs, though it may also affect L2TP and PPTP WANs, since they all use mpd.

On my WAN (igb1 with PPPoE on top):

dev.igb.1.queue0.tx_packets: 2535085
dev.igb.1.queue0.rx_packets: 5365670
dev.igb.1.queue1.tx_packets: 2711996
dev.igb.1.queue1.rx_packets: 0

Other non-PPPoE interfaces on the same unit, including additional WANs, show activity in all driver queues.

On a different firewall with igb(4) NICs and PPPoE, the same condition is present:

dev.igb.0.queue0.tx_packets: 8504662
dev.igb.0.queue0.rx_packets: 29342831
dev.igb.0.queue1.tx_packets: 3543617
dev.igb.0.queue1.rx_packets: 0

On lower-speed WANs there are no negative effects, but on higher-speed PPPoE WANs such as gigabit links it can cause a disparity where traffic is transmitted at the expected rate but not received at the expected rate.

How to check (on systems with igb NICs):

sysctl -a | grep '\.igb\..*x_pack'

Look for the PPPoE WAN physical interface and check whether it has activity on both tx and rx for queue1 (or higher queues), or only on tx.
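The check above can be automated with a small filter. This is only a sketch: idle_rx_queues is a hypothetical helper name (not a pfSense or FreeBSD tool), and it assumes the counters are named dev.<driver>.<unit>.queue<N>.rx_packets as in the output shown in this ticket.

```sh
#!/bin/sh
# Sketch: list NIC queues that have received no packets at all.
# Reads sysctl-style "name: value" lines on stdin.
# idle_rx_queues is a hypothetical helper name used for illustration.
idle_rx_queues() {
  awk -F': *' '
    # Match counters such as dev.igb.1.queue1.rx_packets with a value of 0
    $1 ~ /^dev\.[a-z]+\.[0-9]+\.queue[0-9]+\.rx_packets$/ && $2 == 0 {
      sub(/\.rx_packets$/, "", $1)   # keep only dev.<driver>.<unit>.queue<N>
      print $1
    }
  '
}
```

On a firewall it could be fed with `sysctl -a | grep '\.igb\..*x_pack' | idle_rx_queues`. A PPPoE parent interface showing every queue except queue0 in the result matches the symptom described here.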

Currently only tested on pfSense 2.2.3-RELEASE amd64 images, needs more feedback/testing to confirm if it happens on additional versions/architectures.

Updated by Steve Wheeler over 10 years ago

Seems likely to be this:
"Unfortunately, RSS is usually capable of hashing IPv4 and IPv6 traffic (L3+L4). All other traffic like PPPoE or MPLS or .. is usually received by queue 0."
https://wiki.freebsd.org/NetworkPerformanceTuning

There is a patch suggested there.

Updated by Jim Thompson about 10 years ago

Tracker changed from Bug to Feature

Updated by Jim Thompson about 10 years ago

Assignee set to Jim Thompson
Priority changed from Normal to Low
Target version changed from 2.3 to Future

Fixing this likely requires an in-kernel RSS (Toeplitz) implementation. Such a thing is coming for FreeBSD (Adrian is working on it for the upper layers of the stack), but it's going to be a while before it's ready to interface to netisr.

Priority dropped to "low". Will review when we're based on 11.

Updated by Julien REVERT over 9 years ago

Any news on this task? I'm deploying fiber internet on many pfSense APU2C4 units, and bandwidth from the WAN is limited to 350 Mbit/s as soon as PPPoE is used (one of the four CPU cores is at 100% from the PPPoE process). Otherwise (DHCP), fiber tops out at 950 Mbit/s.

Updated by Travis Erdmann about 9 years ago

Now that FreeBSD 11 is out and PPPoE Gig internet is becoming more available, can we take another look at this?

Updated by Sebastian Foss about 9 years ago

Travis Erdmann wrote:

Now that FreeBSD 11 is out and PPPoE Gig internet is becoming more available, can we take another look at this?

This also happens on the latest 2.4 development builds, even though FreeBSD 11 now includes the correct RSS awareness in the igb driver.

Updated by Jim Pingle almost 9 years ago

Target version changed from Future to 2.4.0

It still happens on 2.4. It's actually a little worse, since it no longer appears to transmit on the additional queues like it did previously:

dev.igb.1.queue1.rx_packets: 0
dev.igb.1.queue1.tx_packets: 0
dev.igb.1.queue0.rx_packets: 1978785
dev.igb.1.queue0.tx_packets: 1959503

Earlier note on the ticket said to review once we're on FreeBSD 11, so I'll set the target to 2.4 but it may need to be pushed again depending on what we find.

Updated by Chris Allen almost 9 years ago

I would like to add that I am also experiencing this issue. I would love to see this fixed in pfSense 2.4 if possible. Jim Thompson do you have any idea what might be involved to fix this now that the FreeBSD 11 driver for igb has RSS awareness?

Updated by Vladimir Suhhanov almost 9 years ago

According to this thread:
https://lists.freebsd.org/pipermail/freebsd-net/2013-May/035564.html
the following script can solve the single-core CPU overload problem, but not the default queue problem.


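#!/bin/sh

# PROVIDE: cpuset-igb
# REQUIRE: FILESYSTEMS
# BEFORE:  netif
# KEYWORD: nojail

case "$1" in
*start)
  echo "Binding igb(4) IRQs to CPUs"
  # Number of CPU cores available for binding
  cpus=$(sysctl -n kern.smp.cpus)
  # Reduce each igb queue interrupt line to "<irq> <igb unit> <queue>"
  vmstat -ai | sed -E '/^irq.*que/!d; s/^irq([0-9]+): igb([0-9]+):que ([0-9]+).*/\1 \2 \3/' |
  while read -r irq igb que
  do
    # Spread queue IRQs across cores: core = (unit + queue) mod cpus
    cpuset -l $(( ($igb + $que) % $cpus )) -x "$irq"
  done
  ;;
esac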
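
The script's mapping, core = (igb unit + queue) mod CPU count, can be previewed before binding anything. The sketch below is a dry run only: plan_igb_irq_binding is a hypothetical name, and it assumes `vmstat -ai` prints interrupt lines in the `irq264: igb0:que 0 ...` form that the script's sed expression expects.

```sh
#!/bin/sh
# Sketch: print the cpuset commands the rc script would run, without running them.
# Usage: vmstat -ai | plan_igb_irq_binding <number of CPUs>
# plan_igb_irq_binding is a hypothetical helper name used for illustration.
plan_igb_irq_binding() {
  cpus=$1
  # Reduce each igb queue interrupt line to "<irq> <igb unit> <queue>"
  sed -E '/^irq.*que/!d; s/^irq([0-9]+): igb([0-9]+):que ([0-9]+).*/\1 \2 \3/' |
  while read -r irq igb que
  do
    # Same round-robin rule as the script: core = (unit + queue) mod cpus
    echo "cpuset -l $(( ($igb + $que) % $cpus )) -x $irq"
  done
}
```

For example, `vmstat -ai | plan_igb_irq_binding "$(sysctl -n kern.smp.cpus)"` lists the bindings for review. Note that this only spreads interrupt handling across cores; as stated above, it does not change which queue PPPoE traffic is received on.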

#10

Updated by Chris Allen almost 9 years ago

Hi Jim, just wondering if this is still something that might make it into pfSense 2.4.0? I would love to use the maximum speed of my Gigabit PPPoE Fibre connection on my APU2C4 :)

#11

Updated by Jim Thompson almost 9 years ago

Unlikely

#12

Updated by J P over 8 years ago

This should definitely be marked as a bug and not a feature.
Has anybody tried the igb driver patch from https://wiki.freebsd.org/NetworkPerformanceTuning ?
The link is dead and I can't find any archived copies. That, in combination with the script above, should at least fix this temporarily for users of NICs that use the igb driver.

I found a forum post (https://forum.pfsense.org/index.php?topic=114123.0) that shows someone using the igb driver that is reporting all queues are used with PPPoE. Can someone test if this actually still happens?

Here is the FreeBSD bug tracker pointing out this issue https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856
The issue is over a year old (October 2015) and hasn't seen much progress.

#13

Updated by David Wood over 8 years ago

The problem with the patch mentioned in comment 12 was that it was a kludge for igb(4) only, not a fix for the underlying issue which, so far as I remember, affects PPPoE on all multi-queue NICs. I'm not even sure the kludge will apply cleanly on FreeBSD 11.x, let alone work, as there are new in-kernel RSS features and various fixes/updates to the igb(4) driver. I suspect the patch has disappeared from a combination of failing to be a universal solution, lack of support and code rot.

I'm fairly certain that the pfSense team will take the approach of fixing this issue properly for all scenarios or doing nothing. An unsupported patch for a single NIC driver is unlikely to qualify for inclusion in a production firewall distribution.

Annoying as this issue undoubtedly is, I expect it will miss the cut for pfSense 2.4-RELEASE. There comes a point where developers have to decide an upcoming release is feature complete in order to close out the remaining bugs and ship.

#14

Updated by Chris Allen over 8 years ago

Could we please have this changed from "Feature" to "Bug"?

#15

Updated by Jim Pingle over 8 years ago

It isn't a bug, it's a missing feature.

#16

Updated by Scott Baugher over 8 years ago

I'm using the nightly builds (2.4.0.b.20170522.1522 as of right now). I also use gigabit fiber over PPPoE, so I'm happy to test and report back once a fix is pushed.

#17

Updated by Julien REVERT over 8 years ago

Scott Baugher wrote:

I'm using the nightly builds (2.4.0.b.20170522.1522 as of right now). I also use gigabit fiber over PPPoE, so I'm happy to test and report back once a fix is pushed.

Is it fixed in the nightly builds?

#18

Updated by Scott Baugher over 8 years ago

As of the June 2, 2017 build, it does not look like it. Receiving over PPPoE is still limited to one queue.

#19

Updated by Jim Thompson over 8 years ago

Target version changed from 2.4.0 to Future

#20

Updated by Yorick Gersie over 7 years ago

It took me a while to land on this issue. I'm facing a similar problem and can't use my full PPPoE WAN speed: upload reaches 500+ Mbit/s, while download is "capped" at 350 Mbit/s. Are there any plans to address this issue?

#21

Updated by Scott Baugher over 7 years ago

Not as of a couple of months ago. I contacted pfSense tech support (since I was using their hardware) and was basically told that they had no additional information on when it might be fixed other than what is posted here. I asked them to bring this issue up with the pfSense developers to attempt to raise its priority and get it resolved, and was basically told no. I've since experienced a failure of my pfSense hardware. That, coupled with no movement on this bug, caused me to switch away from pfSense entirely.

#22

Updated by Benoit Lelievre over 7 years ago

Now that the Spectre and Meltdown patches are coming out on various OSes this becomes even more critical to fix because it's no longer possible to run a gigabit PPPoE connection off of a single processor core, no matter what speed that processor runs at.

#23

Updated by Jim Pingle over 7 years ago

Since this is a missing feature in FreeBSD networking, you should lobby there for it to be addressed, not here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856

#24

Updated by Benoit Lelievre over 7 years ago

I will but I was hoping that pfSense people would also push FreeBSD on it, since I'm sure they have a much stronger and direct influence than some Internet rando.

#25

Updated by Matthew Staver over 7 years ago

There has been a flurry of activity on this FreeBSD bug report. It sounds like the issue is a hardware limitation in the Intel NIC, and the proposed workaround is changing net.isr.dispatch to "deferred".

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856#c11

It looks like at some point that parameter was set in a prior release: https://redmine.pfsense.org/issues/4754

As of 2.4.2 it is set to "direct".

Could the pfSense team weigh in on this in light of the comments from the FreeBSD guys?

#26

Updated by Jim Pingle over 7 years ago

We have set it to deferred in the past on i386 to avoid a crash it otherwise encountered, but we do not explicitly set that tunable to any value these days.

Given the history there I would be cautious about changing it yourself to test. Certainly not something I would do by default for everyone without a lot of testing. If I recall correctly, changing that might lead to network performance issues of other kinds or potentially problems with ALTQ/Limiters, IPsec traffic handling, and so on.

I'm not sure any solid conclusion has been reached on that FreeBSD bug either, there is still some question about how it behaves on other operating systems and what the differences are there.

If you'd like to configure that, you can set it as a system tunable under System > Advanced, System Tunables tab. If you choose to do that, however, I would remove or disable any active ALTQ and limiter configurations.
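
For a quick, non-persistent test from a shell, the same tunable can be set with sysctl. This is a minimal sketch, assuming net.isr.dispatch is writable at runtime on the installed FreeBSD version. The function name set_isr_dispatch and the SYSCTL override variable are hypothetical and exist only so the command can be dry-run; the accepted values are the three netisr dispatch policies (direct, hybrid, deferred).

```sh
#!/bin/sh
# Sketch: validate a netisr dispatch policy and apply it with sysctl.
# set_isr_dispatch and the SYSCTL variable are hypothetical names for illustration.
set_isr_dispatch() {
  mode=$1
  case "$mode" in
    direct|hybrid|deferred) ;;   # the netisr dispatch policies
    *)
      echo "unsupported net.isr.dispatch mode: $mode" >&2
      return 1
      ;;
  esac
  # SYSCTL can be set to "echo" to print the assignment instead of applying it
  ${SYSCTL:-sysctl} "net.isr.dispatch=$mode"
}
```

Running `set_isr_dispatch deferred` as root applies the change until the next reboot, and `sysctl net.isr.dispatch` shows the current value. A setting made this way is not saved in the pfSense configuration, so the System Tunables entry described above is still the way to keep it.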

#27

Updated by Matthew Staver over 7 years ago

Jim Pingle wrote:

If you'd like to configure that, you can set it as a system tunable under System > Advanced, System Tunables tab. If you choose to do that, however, I would remove or disable any active ALTQ and limiter configurations.

I'm not using any traffic shaping at the moment, so I went ahead and switched to deferred. With the setting at direct, I get about 680 Mbit/s down and 920 Mbit/s up. Switching to deferred gained about 100 Mbit/s of downstream speed (770 Mbit/s) with no change in upstream.

If I understand the discussion over at FreeBSD, it still isn't using the other queues and that is reflected in my testing:

dev.igb.0.queue7.rx_packets: 0
dev.igb.0.queue6.rx_packets: 0
dev.igb.0.queue5.rx_packets: 0
dev.igb.0.queue4.rx_packets: 0
dev.igb.0.queue3.rx_packets: 0
dev.igb.0.queue2.rx_packets: 0
dev.igb.0.queue1.rx_packets: 0
dev.igb.0.queue0.rx_packets: 672066401

I'll leave the setting in place and see if any other problems present themselves. The FreeBSD guys seem pretty resistant to addressing this. Is it possible to be addressed by the pfSense team and if so how likely would that be?

Thanks,

Matt

#28

Updated by Jim Pingle over 7 years ago

It is highly unlikely we'd be able to dedicate any resources toward adding this feature internally.

#29

Updated by Alexandre Paradis about 7 years ago

The ix driver seems to be affected as well:

dev.ix.0.queue1.rx_packets: 67
dev.ix.0.queue1.tx_packets: 4107155
dev.ix.0.queue0.rx_packets: 6872678
dev.ix.0.queue0.tx_packets: 3345036

#30

Updated by Sebastian Foss about 7 years ago

As I understand it from the Intel specifications for various chipsets, non-IP traffic such as PPPoE can't be hashed for RSS, so it always goes to queue 0.

There are Intel 700 Series NICs that can apply RSS to PPPoE traffic using a DDP configuration file loaded from DPDK. The PPPoE pkgo profile files have been added to the Intel website <-- In case any of you need to handle 40 Gbit/s of PPPoE traffic...

#31

Updated by Alexandre Paradis about 7 years ago

You were probably kidding, but my ISP will offer exactly this (40 gig) in 2-3 years... at least that's the plan.

Does DDP work in pfSense with a capable Intel or Chelsio card (I think T4 and up)? If yes, this would solve the current issue we have.

#32

Updated by L H about 7 years ago

Adding the tunable: net.isr.dispatch=deferred fixed it for me to reach the full rated speed of my link.

Read the FreeBSD bug comment for background:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856#c11

Haven't tested any side-effects of this yet, but so far it seems benign.

#33

Updated by Valentin N about 7 years ago

Testing net.isr.dispatch on the NetGate SG-4860 on a 1 Gbps PPPoE connection (each result is averaged across 10 runs):

Test         Direct#1    Direct#2  Deferred#1  Deferred#2
Avg (Mbps)     530.98      494.19      620.68      708.50
Max (Mbps)     582.13      565.41      759.45      790.44

In my case the improvement was not quite line rate, but it is noticeable.
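
The size of that improvement can be worked out from the table. The sketch below is only an illustration of the arithmetic; pct_gain is a hypothetical helper name, and averaging the two runs of each mode is my own simplification, not something done in the original test.

```sh
#!/bin/sh
# Sketch: percentage change from a baseline throughput to a new throughput.
# pct_gain is a hypothetical helper name used for illustration.
pct_gain() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# Means of the two runs per mode, taken from the "Avg (Mbps)" row above
direct_avg=$(awk 'BEGIN { print (530.98 + 494.19) / 2 }')
deferred_avg=$(awk 'BEGIN { print (620.68 + 708.50) / 2 }')
```

Calling `pct_gain "$direct_avg" "$deferred_avg"` with these figures gives a gain of roughly 30 percent for deferred over direct on this unit.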

#34

Updated by Jim Pingle about 7 years ago

Status changed from New to Closed

Added info to the docs about using the sysctl tunable to work around this. There doesn't appear to be anything more we can do.

https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html#pppoe-with-multi-queue-nics

#35

Updated by Jim Pingle about 6 years ago

Target version deleted (Future)

#36

Updated by Chris Collins about 6 years ago

Interestingly, I appear to have RSS working on PPPoE using the igb driver.

The tx is very unbalanced, about 10:1, but rx is almost 50:50.

#37

Updated by luckman212 almost 5 years ago

Can we get some kind of CAPTCHA on here to rid ourselves of this polluting junk??
