Feature #4821

PPPoE WANs do not take full advantage of NIC driver queues for receiving traffic

Added by Jim Pingle about 2 years ago. Updated 27 days ago.

Status: New
Priority: Low
Assignee:
Category: Interfaces
Target version:
Start date: 07/08/2015
Due date:
% Done: 0%


Description

On PPPoE WANs packets are only received on one NIC driver queue (queue0) while packets are transmitted from all queues (queue0 and queue1). This has been observed on multiple systems with PPPoE-based WANs and igb(4) NICs, though it may also affect L2TP and PPTP type WANs since they all use mpd.

On my WAN (igb1 with PPPoE on top):

dev.igb.1.queue0.tx_packets: 2535085
dev.igb.1.queue0.rx_packets: 5365670
dev.igb.1.queue1.tx_packets: 2711996
dev.igb.1.queue1.rx_packets: 0

Other non-PPPoE interfaces on the same unit, including additional WANs, show activity in all driver queues.

On a different firewall with igb(4) NICs and PPPoE, the same condition is present:

dev.igb.0.queue0.tx_packets: 8504662
dev.igb.0.queue0.rx_packets: 29342831
dev.igb.0.queue1.tx_packets: 3543617
dev.igb.0.queue1.rx_packets: 0

On lower speed WANs there are no negative effects, but on higher-speed PPPoE WANs such as gigabit links it can cause some disparity where traffic is transmitted at the expected rate but not received at the expected rate.

How to check (on systems with igb NICs):

sysctl -a | grep '\.igb\..*x_pack'

Look for the PPPoE WAN physical interface and check whether it has activity in both tx queue1 and rx queue1, or only in tx queue1 (the same applies to any higher-numbered queues).
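
The check above can also be scripted. The following is a minimal sketch, assuming the sysctl names follow the dev.igb.N.queueM.rx_packets / tx_packets pattern shown in this ticket. The function name find_idle_rx_queues is made up for illustration and is not part of pfSense or FreeBSD. It reads sysctl output on standard input and prints every queue other than queue0 that has received no packets.

```shell
#!/bin/sh
# find_idle_rx_queues: hypothetical helper name (an assumption for this sketch).
# Reads "dev.igb.N.queueM.{rx,tx}_packets: COUNT" lines on stdin and prints
# each queue other than queue0 whose rx_packets counter is zero.
find_idle_rx_queues() {
  awk -F'[.: ]+' '
    # Fields after splitting: dev igb N queueM rx_packets|tx_packets COUNT
    $1 == "dev" && $2 == "igb" && $4 ~ /^queue[0-9]+$/ && $5 ~ /^[rt]x_packets$/ {
      key = "igb" $3 " " $4
      if ($5 == "rx_packets") rx[key] = $6
      else tx[key] = $6
    }
    END {
      for (key in rx) {
        if (key !~ / queue0$/ && rx[key] + 0 == 0) {
          t = (key in tx) ? tx[key] : "?"
          printf "%s rx=0 tx=%s\n", key, t
        }
      }
    }
  ' | sort
}
```

On a live system it could be fed with `sysctl -a | find_idle_rx_queues`. An idle queue is only suspicious on an interface that is actually carrying traffic, so compare the result against queue0 of the same interface.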

Currently only tested on pfSense 2.2.3-RELEASE amd64 images, needs more feedback/testing to confirm if it happens on additional versions/architectures.

History

#1 Updated by Steve Wheeler about 2 years ago

Seems likely to be this:
"Unfortunately, RSS is usually capable of hashing IPv4 and IPv6 traffic (L3+L4). All other traffic like PPPoE or MPLS or .. is usually received by queue 0."
https://wiki.freebsd.org/NetworkPerformanceTuning

There is a patch suggested there.
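
As a rough illustration of the quoted behaviour, here is a simplified model, not driver code. It assumes the NIC decides purely on the frame's EtherType, and the function name rx_queue_class is hypothetical. IPv4 (0800) and IPv6 (86dd) frames can be hashed across queues, while everything else, including PPPoE discovery (8863) and session (8864) frames, falls back to queue 0.

```shell
#!/bin/sh
# rx_queue_class: hypothetical name for a simplified model of the quoted
# RSS behaviour; real hardware and drivers are more involved.
# $1 = EtherType of a received frame as four hex digits (e.g. 0800).
rx_queue_class() {
  case "$1" in
    0800|86[dD][dD]) echo "hashed" ;;  # IPv4 / IPv6: RSS can hash L3+L4 fields
    *)               echo "queue0" ;;  # PPPoE (8863/8864), MPLS (8847), others
  esac
}
```

In this model `rx_queue_class 8864` yields queue0, which matches the rx counters reported in the description: the IP packets are wrapped inside PPPoE session frames, so the NIC never sees an IP header to hash.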

#2 Updated by Jim Thompson almost 2 years ago

  • Tracker changed from Bug to Feature

#3 Updated by Jim Thompson over 1 year ago

  • Assignee set to Jim Thompson
  • Priority changed from Normal to Low
  • Target version changed from 2.3 to Future

Fixing this likely requires an in-kernel RSS (Toeplitz) implementation. Such a thing is coming for FreeBSD (Adrian is working on it for the upper layers of the stack), but it's going to be a while before it's ready to interface to netisr.

Priority dropped to "low". Will review when we're based on 11.

#4 Updated by Julien REVERT about 1 year ago

Any news on this task? I'm deploying fiber internet on many pfSense APU2C4 units, and bandwidth from the WAN is limited to 350 Mbit/s as soon as PPPoE is used (1 of the 4 CPUs is at 100% from the PPPoE process). Otherwise (with DHCP), the fiber link reaches its full 950 Mbit/s.

#5 Updated by Travis Erdmann 10 months ago

Now that FreeBSD 11 is out and PPPoE Gig internet is becoming more available, can we take another look at this?

#6 Updated by Sebastian Foss 10 months ago

Travis Erdmann wrote:

Now that FreeBSD 11 is out and PPPoE Gig internet is becoming more available, can we take another look at this?

This also happens on the latest 2.4 dev builds, even though FreeBSD 11 now includes the correct RSS awareness in the igb driver.

#7 Updated by Jim Pingle 7 months ago

  • Target version changed from Future to 2.4.0

It still happens on 2.4. It's actually a little worse, since it doesn't appear to transmit on the additional queues like it did previously:

dev.igb.1.queue1.rx_packets: 0
dev.igb.1.queue1.tx_packets: 0
dev.igb.1.queue0.rx_packets: 1978785
dev.igb.1.queue0.tx_packets: 1959503

Earlier note on the ticket said to review once we're on FreeBSD 11, so I'll set the target to 2.4 but it may need to be pushed again depending on what we find.

#8 Updated by Chris Allen 7 months ago

I would like to add that I am also experiencing this issue. I would love to see this fixed in pfSense 2.4 if possible. Jim Thompson, do you have any idea what might be involved in fixing this now that the FreeBSD 11 igb driver has RSS awareness?

#9 Updated by Vladimir Putin 7 months ago

According to this post:
https://lists.freebsd.org/pipermail/freebsd-net/2013-May/035564.html
the following script can solve the problem of a single CPU core being overloaded, but not the default queue problem.


#!/bin/sh

# PROVIDE: cpuset-igb
# REQUIRE: FILESYSTEMS
# BEFORE:  netif
# KEYWORD: nojail

case "$1" in
*start)
  echo "Binding igb(4) IRQs to CPUs"
  cpus=$(sysctl -n kern.smp.cpus)
  # Reduce each igb queue interrupt line to "irq igb-unit queue".
  vmstat -ai | sed -E '/^irq.*que/!d; s/^irq([0-9]+): igb([0-9]+):que ([0-9]+).*/\1 \2 \3/' |
  while read -r irq igb que
  do
    # Pin the IRQ to CPU number (igb unit + queue) modulo the CPU count.
    cpuset -l $(( ($igb + $que) % $cpus )) -x "$irq"
  done
  ;;
esac
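
To preview which CPU each queue interrupt would be bound to before changing anything, the same parsing and arithmetic can be run as a dry run. This is a minimal sketch: the function name plan_igb_irq_cpus is hypothetical, the CPU count is passed in as an argument instead of being read from sysctl, and it assumes input lines in the "irqN: igbM:que Q" form that the script above matches.

```shell
#!/bin/sh
# plan_igb_irq_cpus: hypothetical dry-run helper; it only prints the plan
# and never calls cpuset.
# $1 = number of CPUs; stdin = lines in the "irqN: igbM:que Q ..." form
# that the script above looks for in `vmstat -ai` output.
plan_igb_irq_cpus() {
  cpus=$1
  sed -E '/^irq.*que/!d; s/^irq([0-9]+): igb([0-9]+):que ([0-9]+).*/\1 \2 \3/' |
  while read -r irq igb que
  do
    # Same mapping as the rc script: (igb unit + queue) modulo CPU count.
    echo "irq $irq (igb$igb que $que) -> cpu $(( ($igb + $que) % $cpus ))"
  done
}
```

For example, `vmstat -ai | plan_igb_irq_cpus "$(sysctl -n kern.smp.cpus)"` prints one line per queue interrupt. With 2 CPUs, igb1 queue 1 lands on CPU 0, because (1 + 1) % 2 = 0. Adding the unit number to the queue number offsets each NIC, so that queue 0 of different NICs does not always share the same CPU.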

#10 Updated by Chris Allen 6 months ago

Hi Jim, just wondering if this is still something that might make it into pfSense 2.4.0? I would love to use the maximum speed of my Gigabit PPPoE Fibre connection on my APU2C4 :)

#11 Updated by Jim Thompson 5 months ago

Unlikely

#12 Updated by J P 5 months ago

This should definitely be marked as a bug and not a feature.
Has anybody tried the igb driver patch linked from https://wiki.freebsd.org/NetworkPerformanceTuning ?
The patch link is dead and I can't find any archived copies. That patch, in combination with the script above, should at least fix this temporarily for users of NICs that use the igb driver.

I found a forum post (https://forum.pfsense.org/index.php?topic=114123.0) in which someone using the igb driver reports that all queues are used with PPPoE. Can someone test whether this actually still happens?

Here is the FreeBSD bug tracker entry for this issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856
That report is over a year old (October 2015) and hasn't seen much progress.

#13 Updated by David Wood 5 months ago

The problem with the patch mentioned in comment 12 was that it was a kludge for igb(4) only, not a fix for the underlying issue which, so far as I remember, affects PPPoE on all multi-queue NICs. I'm not even sure the kludge will apply cleanly on FreeBSD 11.x, let alone work, as there are new in-kernel RSS features and various fixes/updates to the igb(4) driver. I suspect the patch has disappeared from a combination of failing to be a universal solution, lack of support and code rot.

I'm fairly certain that the pfSense team will take the approach of fixing this issue properly for all scenarios or doing nothing. An unsupported patch for a single NIC driver is unlikely to qualify for inclusion in a production firewall distribution.

Annoying as this issue undoubtedly is, I expect it will miss the cut for pfSense 2.4-RELEASE. There comes a point where developers have to decide an upcoming release is feature complete in order to close out the remaining bugs and ship.

#14 Updated by Chris Allen 3 months ago

Could we please have this changed from "Feature" to "Bug"?

#15 Updated by Jim Pingle 3 months ago

It isn't a bug, it's a missing feature.

#16 Updated by Scott Baugher 2 months ago

I'm using the nightly builds (2.4.0.b.20170522.1522 as of right now). I also use gigabit fiber over PPPoE, so I'm happy to test and report back once a fix is pushed.

#17 Updated by Julien REVERT 29 days ago

Scott Baugher wrote:

I'm using the nightly builds (2.4.0.b.20170522.1522 as of right now). I also use gigabit fiber over PPPoE, so I'm happy to test and report back once a fix is pushed.

Is it fixed in the nightly builds?

#18 Updated by Scott Baugher 27 days ago

As of the June 2, 2017 build, it does not look like it. Receiving over PPPoE is still limited to one queue.

#19 Updated by Jim Thompson 27 days ago

  • Target version changed from 2.4.0 to Future
