Bug #6512

closed

Upgrade to 2.3.1 causes network performance degradation (with High CPU usage by NIC kernel tasks)

Added by Juan Gallego over 8 years ago. Updated over 8 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
Operating System
Target version:
-
Start date:
06/21/2016
Due date:
% Done:
0%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
Affected Architecture:

Description

Hi,

We currently run a two-node pfSense cluster in active/passive HA. The cluster was running pfSense v2.2.3; we recently upgraded the slave node to v2.3.1-1 and forced a failover from master to slave to test it.

This system runs a mix of normal pf filtering, some IPsec tunnels, and an HAProxy instance exposing a few (roughly 10) frontends and backends.

However, the upgraded node, when running as master, shows a clear network performance degradation: while node-1 (the one still running v2.2.3) can easily forward traffic at more than 250 Mb/s, the alternate node (the one running v2.3) tops out at roughly 80 Mb/s.

While diagnosing the issue we found that the node running pfSense v2.3 has a high load under such ‘low’ traffic (i.e. 80 Mb/s), with high CPU usage by the network drivers, as shown below:

[2.3.1-RELEASE][root@]/root: top -nCHSIzs1
last pid: 28317;  load averages:  4.07,  4.23,  4.37  up 2+11:40:04    16:22:50
311 processes: 9 running, 282 sleeping, 20 waiting

Mem: 31M Active, 502M Inact, 385M Wired, 883M Buf, 5020M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
    0 root       -92    -     0K   240K CPU1    1  21.8H  99.37% kernel{nfe0 taskq}
    0 root       -92    -     0K   240K CPU2    2  29.4H  73.29% kernel{em0 taskq}
    0 root       -92    -     0K   240K CPU0    0  18.6H  44.78% kernel{em1 taskq}
   12 root       -72    -     0K   336K WAIT    0  65:15  14.60% intr{swi1: netisr 0}
  438 nobody      22    0 30184K  4404K select  3 121:18   4.79% dnsmasq
28430 root        21    0 43756K 17440K kqread  3  51:40   1.46% haproxy
   12 root       -72    -     0K   336K WAIT    0  31:51   1.37% intr{swi1: pfsync}
90479 root        20    0 25720K  7176K select  2  23:43   0.59% openvpn
49607 root        20    0 14516K  2320K select  0  28:31   0.29% syslogd
30713 root        20    0 16676K  2736K bpf     0  18:55   0.10% filterlog
28317 root        21    0 21856K  2992K CPU2    2   0:00   0.10% top
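
A minimal sketch, assuming the `top` output above has been saved as text, of how the CPU share of the per-NIC `kernel{<nic> taskq}` threads could be added up. The function name `taskq_cpu` and the `SAMPLE` constant are illustrative only and are not part of pfSense or FreeBSD; the rows are copied from the listing above.

```python
import re

# Rows copied from the top output above (only a few are needed for the sketch).
SAMPLE = """\
    0 root       -92    -     0K   240K CPU1    1  21.8H  99.37% kernel{nfe0 taskq}
    0 root       -92    -     0K   240K CPU2    2  29.4H  73.29% kernel{em0 taskq}
    0 root       -92    -     0K   240K CPU0    0  18.6H  44.78% kernel{em1 taskq}
   12 root       -72    -     0K   336K WAIT    0  65:15  14.60% intr{swi1: netisr 0}
  438 nobody      22    0 30184K  4404K select  3 121:18   4.79% dnsmasq
"""


def taskq_cpu(text):
    """Return {nic: cpu_percent} for every kernel{<nic> taskq} thread."""
    usage = {}
    for line in text.splitlines():
        # 11 columns: PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND.
        # COMMAND may contain spaces, so split at most 10 times.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        cpu, command = fields[9], fields[10].strip()
        match = re.fullmatch(r"kernel\{(\S+) taskq\}", command)
        if match:
            usage[match.group(1)] = float(cpu.rstrip("%"))
    return usage


usage = taskq_cpu(SAMPLE)
total = sum(usage.values())
for nic, cpu in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{nic}: {cpu:.2f}%")
print(f"NIC taskq total: {total:.2f}% (out of 400% on a 4-CPU system)")
```

On the sample rows this reports a combined 217.44% for the three taskq threads, i.e. more than two of the four cores are spent in NIC driver task queues.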

Obviously, firewall rules, service configuration, IPsec tunnels, etc. are configured the same on both nodes. We have also compared system values (tunables, /boot/loader.conf*, and runtime sysctl values) across both nodes. So it looks like an issue related to kernel code changes/enhancements in pfSense 2.3.

Both nodes have identical hardware, consisting of:

FreeBSD 10.3-RELEASE-p3 #2 1988fec(RELENG_2_3_1): Wed May 25 14:14:46 CDT 2016
    root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Dual-Core AMD Opteron(tm) Processor 2216 (2393.69-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f12  Family=0xf  Model=0x41  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 6442450944 (6144 MB)
avail memory = 6194679808 (5907 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <SUN    X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3

The network interface cards available at both nodes are as follows:

nfe0: NVIDIA nForce4 CK804 MCP9 Networking Adapter
nfe1: NVIDIA nForce4 CK804 MCP9 Networking Adapter
em0: Intel(R) PRO/1000 (82546EB)
em1: Intel(R) PRO/1000 (82546EB)
lagg0: LACP lagg with em0, em1 & nfe0 attached.

The pfSense version running at node-1 (the one not yet upgraded) is:

[2.2.3-RELEASE][root@]/root: uname -a
FreeBSD  10.1-RELEASE-p13 FreeBSD 10.1-RELEASE-p13 #0 c77d1b2(releng/10.1)-dirty: Tue Jun 23 17:00:47 CDT 2015     root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10  amd64

The pfSense version running at node-2 (the upgraded one) is:

[2.3.1-RELEASE][root@]/root: uname -a
FreeBSD xxxx.aaa.com 10.3-RELEASE-p3 FreeBSD 10.3-RELEASE-p3 #2 1988fec(RELENG_2_3_1): Wed May 25 14:14:46 CDT 2016     root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense  amd64

Here are some additional statistics we’ve collected, in case they help diagnose the issue:

  • Packets with errors at em0 and em1 interfaces
    [2.3.1-RELEASE][root@]/root: sysctl dev.em.0.mac_stats. | grep 'buff\|missed'
    dev.em.0.mac_stats.recv_no_buff: 28924720
    dev.em.0.mac_stats.missed_packets: 1109472
    [2.3.1-RELEASE][root@]/root: sysctl dev.em.1.mac_stats. | grep 'buff\|missed'
    dev.em.1.mac_stats.recv_no_buff: 2873803
    dev.em.1.mac_stats.missed_packets: 79003
    
  • Network errors
    [2.3.1-RELEASE][root@]/root: netstat -ihw 1
                input        (Total)           output
       packets  errs idrops      bytes    packets  errs      bytes colls
           52k    30     0        32M        55k     0        36M     0
           46k     0     0        24M        48k     0        30M     0
           50k    64     0        31M        54k     0        35M     0
           45k    35     0        26M        48k     0        31M     0
           48k    19     0        28M        52k     0        33M     0
           45k     2     0        29M        48k     0        33M     0
           50k     0     0        30M        53k     0        35M     0
           50k     0     0        33M        53k     0        37M     0
           43k     9     0        28M        45k     0        32M     0
           53k    12     0        34M        56k     0        39M     0
           50k     0     0        30M        53k     0        34M     0
           44k     0     0        26M        47k     0        30M     0
    
    
  • The number of interrupts is very high for the NICs
    [2.3.1-RELEASE][root@]/root: vmstat -i
    interrupt                          total       rate
    irq44: nfe1                     68575583        318
    irq4: uart0                         2298          0
    irq14: ata0                       143914          0
    irq20: ohci0                          26          0
    irq21: ehci0                           2          0
    irq22: nfe0                    225192527       1044
    irq56: em0                     121230546        562
    irq57: em1                     305005131       1414
    cpu0:timer                     242940061       1126
    irq256: mpt0                      763114          3
    cpu1:timer                      95960989        445
    cpu2:timer                     135271696        627
    cpu3:timer                     133771488        620
    Total                         1328857375       6164
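
A minimal sketch, assuming the `vmstat -i` output above has been saved as text in the same three-column layout, of how the NIC interrupt rates could be extracted and ranked. The helper name `nic_interrupt_rates` and the `NIC_PREFIXES` tuple are assumptions made for this hardware (em and nfe devices); they are not part of any FreeBSD tool.

```python
import re

# Rows copied from the vmstat -i output above.
SAMPLE = """\
interrupt                          total       rate
irq44: nfe1                     68575583        318
irq4: uart0                         2298          0
irq22: nfe0                    225192527       1044
irq56: em0                     121230546        562
irq57: em1                     305005131       1414
cpu0:timer                     242940061       1126
Total                         1328857375       6164
"""

# Assumption: NIC device names on these nodes start with "em" or "nfe".
NIC_PREFIXES = ("em", "nfe")

ROW = re.compile(r"^\s*irq\d+:\s+(\S+)\s+(\d+)\s+(\d+)\s*$")


def nic_interrupt_rates(text):
    """Return {device: interrupts_per_second} for NIC rows only."""
    rates = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group(1).startswith(NIC_PREFIXES):
            rates[match.group(1)] = int(match.group(3))
    return rates


rates = nic_interrupt_rates(SAMPLE)
for dev, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dev}: {rate} interrupts/s")
print(f"All NICs: {sum(rates.values())} interrupts/s")
```

Running the same helper on output captured from both nodes under comparable traffic would make it easier to see whether the interrupt rate itself differs between 2.2.3 and 2.3.1.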
    
    

We have been investigating whether a package in pfSense 2.3.1 or FreeBSD 10.3-RELEASE-p3 could be causing this, but we have been unable to determine the cause.

Actions #1

Updated by Chris Buechler over 8 years ago

  • Category changed from Configuration Upgrade to Operating System
  • Status changed from New to Rejected
  • Priority changed from Very High to Normal
  • Target version deleted (2.3.1-p2)
  • Affected Version deleted (2.3.1)

That's something to do with your combination of hardware, which isn't anything we support. There aren't any general performance regressions between 2.2.x and 2.3.x; in fact, the introduction of tryforward in 2.3.x makes it faster than any prior version.

Actions #2

Updated by Juan Gallego over 8 years ago

Obviously there's a performance regression. It may have to do with our hardware, but:
  1. This wasn't present on previous pfSense versions.
  2. As we are using two different brands of network cards, simply pointing to a FreeBSD network card driver issue sounds improbable.

Actions #3

Updated by Juan Gallego over 8 years ago

Perhaps this post describes the same problem:

https://forum.pfsense.org/index.php?topic=113529.0

Actions #4

Updated by Chris Buechler over 8 years ago

It's likely not one NIC driver that's in question, as there definitely aren't any issues with e1000, and probably not with nfe either though that one isn't very widely used. It's definitely something specific to your combination of hardware, which we can't support. The linked thread has no relation. Continue on your forum thread, that's the best bet.

Actions #5

Updated by Rene Plattner over 8 years ago

Hi,

I also have this performance degradation problem!
We have a setup of a small hardware box (N3150 Mini-ITX board) and a virtual host (KVM).
After updating the virtual VPN server from 2.2.6 to 2.3.1, the performance dropped by a factor of 10.
The interesting fact was that it was asymmetric: the download was slow but the upload speed was normal.
We downgraded to 2.2.6 and it was OK again.

Interfaces and disk are VirtIO devices.
