Bug #6512

closed

Upgrade to 2.3.1 causes network performance degradation (with high CPU usage by NIC kernel tasks)

Added by Juan Gallego almost 8 years ago. Updated almost 8 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: Operating System
Target version: -
Start date: 06/21/2016
% Done: 0%

Description

Hi,

Currently we have a 2-node pfSense system working in active/passive HA. This cluster was running pfSense v2.2.3; we recently upgraded the slave node to v2.3.1-1 and forced a failover from master to slave in order to test it.

This system runs a mix of normal pf filtering, some IPsec tunnels, and an HAProxy instance exposing about 10 frontend/backend pairs.

However, the upgraded node (when running as master) shows a clear network performance degradation: while node-1 (the one still running v2.2.3) easily forwards traffic at over 250 Mb/s, the other node (the one running v2.3) tops out at about 80 Mb/s.

While diagnosing the issue, we found that the node running pfSense v2.3 has a high load at such a ‘low’ traffic level (i.e. 80 Mb/s), with high CPU usage by the network driver threads, as shown below:

[2.3.1-RELEASE][root@]/root: top -nCHSIzs1
last pid: 28317;  load averages:  4.07,  4.23,  4.37  up 2+11:40:04    16:22:50
311 processes: 9 running, 282 sleeping, 20 waiting

Mem: 31M Active, 502M Inact, 385M Wired, 883M Buf, 5020M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
    0 root       -92    -     0K   240K CPU1    1  21.8H  99.37% kernel{nfe0 taskq}
    0 root       -92    -     0K   240K CPU2    2  29.4H  73.29% kernel{em0 taskq}
    0 root       -92    -     0K   240K CPU0    0  18.6H  44.78% kernel{em1 taskq}
   12 root       -72    -     0K   336K WAIT    0  65:15  14.60% intr{swi1: netisr 0}
  438 nobody      22    0 30184K  4404K select  3 121:18   4.79% dnsmasq
28430 root        21    0 43756K 17440K kqread  3  51:40   1.46% haproxy
   12 root       -72    -     0K   336K WAIT    0  31:51   1.37% intr{swi1: pfsync}
90479 root        20    0 25720K  7176K select  2  23:43   0.59% openvpn
49607 root        20    0 14516K  2320K select  0  28:31   0.29% syslogd
30713 root        20    0 16676K  2736K bpf     0  18:55   0.10% filterlog
28317 root        21    0 21856K  2992K CPU2    2   0:00   0.10% top
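
To put the `top` figures in proportion, here is a minimal Python sketch that parses the NIC taskqueue thread lines pasted from the output above and sums their CPU share. The regular expression assumes the exact `top -H` column layout shown here, and the CPU count of 4 is taken from the boot messages of this machine; it is not a general-purpose `top` parser.

```python
import re

# NIC-related thread lines pasted from the `top -nCHSIzs1` output above,
# plus one unrelated line to show that it is ignored.
TOP_LINES = """\
    0 root       -92    -     0K   240K CPU1    1  21.8H  99.37% kernel{nfe0 taskq}
    0 root       -92    -     0K   240K CPU2    2  29.4H  73.29% kernel{em0 taskq}
    0 root       -92    -     0K   240K CPU0    0  18.6H  44.78% kernel{em1 taskq}
   12 root       -72    -     0K   336K WAIT    0  65:15  14.60% intr{swi1: netisr 0}
"""

# Captures the CPU percentage and NIC name from "...  99.37% kernel{nfe0 taskq}".
TASKQ_RE = re.compile(r"(\d+\.\d+)% kernel\{(\w+) taskq\}$")

NCPU = 4  # "FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs"


def nic_taskq_cpu(text: str) -> dict[str, float]:
    """Return {nic: cpu_percent} for each kernel NIC taskqueue thread line."""
    usage = {}
    for line in text.splitlines():
        m = TASKQ_RE.search(line.rstrip())
        if m:
            usage[m.group(2)] = float(m.group(1))
    return usage


usage = nic_taskq_cpu(TOP_LINES)
total = sum(usage.values())
print(usage)
print(f"NIC taskq threads: {total:.2f}% of {NCPU * 100}% "
      f"({total / NCPU:.1f}% of total CPU capacity)")
```

On this snapshot the three `kernel{... taskq}` threads together account for about 217% of the 400% available, i.e. more than half of the machine's CPU time.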

Firewall rules, service configuration, IPsec tunnels, etc. are configured identically on both nodes, and we have compared system values (tunables, /boot/loader.conf*, and runtime sysctl values) across both nodes. So this looks like an issue related to kernel code changes in pfSense 2.3.
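
The comparison of runtime sysctl values mentioned above can be scripted. A minimal sketch in Python, assuming each node's values were dumped to a text file with `sysctl -a` (one `name: value` per line); `parse_sysctl` and `diff_sysctl` are hypothetical helpers, not part of pfSense or FreeBSD, and multi-line values are only partially handled.

```python
from typing import Optional


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse `sysctl -a` style "name: value" lines into {name: value}.

    Lines without a ": " separator, or whose name part contains spaces
    (e.g. continuation lines of multi-line values), are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name and " " not in name:
            values[name] = value.strip()
    return values


def diff_sysctl(a: dict[str, str],
                b: dict[str, str]) -> list[tuple[str, Optional[str], Optional[str]]]:
    """List (name, value_on_a, value_on_b) for every OID that differs.

    An OID present on only one node shows None for the other side.
    """
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]
```

Usage would look like `diff_sysctl(parse_sysctl(open("node1.txt").read()), parse_sysctl(open("node2.txt").read()))`, where both file names are hypothetical. Between FreeBSD 10.1 and 10.3 many OIDs exist on only one side, and statistics counters such as `dev.em.*.mac_stats` will always differ, so the result still needs filtering by eye.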

Both nodes run on identical hardware:

FreeBSD 10.3-RELEASE-p3 #2 1988fec(RELENG_2_3_1): Wed May 25 14:14:46 CDT 2016
    root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Dual-Core AMD Opteron(tm) Processor 2216 (2393.69-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f12  Family=0xf  Model=0x41  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 6442450944 (6144 MB)
avail memory = 6194679808 (5907 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <SUN    X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3

The network interface cards available on both nodes are as follows:

nfe0: NVIDIA nForce4 CK804 MCP9 Networking Adapter
nfe1: NVIDIA nForce4 CK804 MCP9 Networking Adapter
em0: Intel(R) PRO/1000 (82546EB)
em1: Intel(R) PRO/1000 (82546EB)
lagg0: LACP lagg with em0, em1 & nfe0 attached.

The pfSense version running on node-1 (the one not yet upgraded) is:

[2.2.3-RELEASE][root@]/root: uname -a
FreeBSD  10.1-RELEASE-p13 FreeBSD 10.1-RELEASE-p13 #0 c77d1b2(releng/10.1)-dirty: Tue Jun 23 17:00:47 CDT 2015     root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10  amd64

The pfSense version running on node-2 (the upgraded one) is:

[2.3.1-RELEASE][root@]/root: uname -a
FreeBSD xxxx.aaa.com 10.3-RELEASE-p3 FreeBSD 10.3-RELEASE-p3 #2 1988fec(RELENG_2_3_1): Wed May 25 14:14:46 CDT 2016     root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense  amd64

Here are some additional statistics we’ve collected, in case they help with the diagnosis:

  • Missed packets and receive no-buffer counts on the em0 and em1 interfaces
    [2.3.1-RELEASE][root@]/root: sysctl dev.em.0.mac_stats. | grep 'buff\|missed'
    dev.em.0.mac_stats.recv_no_buff: 28924720
    dev.em.0.mac_stats.missed_packets: 1109472
    [2.3.1-RELEASE][root@]/root: sysctl dev.em.1.mac_stats. | grep 'buff\|missed'
    dev.em.1.mac_stats.recv_no_buff: 2873803
    dev.em.1.mac_stats.missed_packets: 79003
    
  • Network errors
    [2.3.1-RELEASE][root@]/root: netstat -ihw 1
                input        (Total)           output
       packets  errs idrops      bytes    packets  errs      bytes colls
           52k    30     0        32M        55k     0        36M     0
           46k     0     0        24M        48k     0        30M     0
           50k    64     0        31M        54k     0        35M     0
           45k    35     0        26M        48k     0        31M     0
           48k    19     0        28M        52k     0        33M     0
           45k     2     0        29M        48k     0        33M     0
           50k     0     0        30M        53k     0        35M     0
           50k     0     0        33M        53k     0        37M     0
           43k     9     0        28M        45k     0        32M     0
           53k    12     0        34M        56k     0        39M     0
           50k     0     0        30M        53k     0        34M     0
           44k     0     0        26M        47k     0        30M     0
    
    
  • The number of interrupts is very high for the NICs
    [2.3.1-RELEASE][root@]/root: vmstat -i
    interrupt                          total       rate
    irq44: nfe1                     68575583        318
    irq4: uart0                         2298          0
    irq14: ata0                       143914          0
    irq20: ohci0                          26          0
    irq21: ehci0                           2          0
    irq22: nfe0                    225192527       1044
    irq56: em0                     121230546        562
    irq57: em1                     305005131       1414
    cpu0:timer                     242940061       1126
    irq256: mpt0                      763114          3
    cpu1:timer                      95960989        445
    cpu2:timer                     135271696        627
    cpu3:timer                     133771488        620
    Total                         1328857375       6164
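
The `rate` column printed by FreeBSD's `vmstat -i` is the cumulative count divided by the system uptime, so it is an average since boot rather than the current interrupt rate. The minimal Python sketch below parses the output above, adds up the NIC sources (the `em*` and `nfe*` drivers present on these nodes; that prefix list is an assumption specific to this hardware), and recovers the implied uptime from the `Total` line as a sanity check.

```python
# Output of `vmstat -i` from the 2.3.1 node, as quoted above.
VMSTAT_I = """\
interrupt                          total       rate
irq44: nfe1                     68575583        318
irq4: uart0                         2298          0
irq14: ata0                       143914          0
irq20: ohci0                          26          0
irq21: ehci0                           2          0
irq22: nfe0                    225192527       1044
irq56: em0                     121230546        562
irq57: em1                     305005131       1414
cpu0:timer                     242940061       1126
irq256: mpt0                      763114          3
cpu1:timer                      95960989        445
cpu2:timer                     135271696        627
cpu3:timer                     133771488        620
Total                         1328857375       6164
"""

NIC_PREFIXES = ("em", "nfe")  # NIC drivers present on these nodes


def parse_vmstat_i(text: str) -> dict[str, tuple[int, int]]:
    """Map each interrupt source (last word before the counters) to (total, rate)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines end with two integers: cumulative count, then average rate.
        if len(fields) >= 3 and fields[-2].isdigit() and fields[-1].isdigit():
            stats[fields[-3]] = (int(fields[-2]), int(fields[-1]))
    return stats


stats = parse_vmstat_i(VMSTAT_I)
nic = {dev: rate for dev, (_, rate) in stats.items() if dev.startswith(NIC_PREFIXES)}
nic_total = sum(nic.values())
all_total, all_rate = stats["Total"]

print("NIC interrupts/s (average since boot):", nic, "sum:", nic_total)
print(f"share of all interrupts: {nic_total / all_rate:.0%}")
# rate = total / uptime, so total / rate approximates the uptime.
print(f"implied uptime: {all_total / all_rate / 86400:.2f} days")
```

On these numbers the four NICs account for 3338 of the 6164 interrupts per second (about 54%), and the implied uptime of roughly 2.5 days agrees with the `up 2+11:40:04` reported by `top`, which confirms these are long-run averages.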
    
    

We have been investigating whether a package in pfSense 2.3.1 or a change in FreeBSD 10.3-RELEASE-p3 could be causing this, but we have been unable to determine the cause of the problem.
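
The `mac_stats` values listed earlier (`recv_no_buff`, `missed_packets`) are running totals, so a single reading does not show whether frames are still being dropped at the current traffic level. A minimal Python sketch, with hypothetical helper names, that turns two readings of the same `sysctl dev.em.N.mac_stats.` output, taken a known number of seconds apart, into per-second rates:

```python
import re

# Matches lines such as "dev.em.0.mac_stats.missed_packets: 1109472";
# shell prompt lines and other text are ignored.
COUNTER_RE = re.compile(r"^(dev\.\S+):\s+(\d+)$")


def parse_counters(text: str) -> dict[str, int]:
    """Parse `sysctl dev.em.N.mac_stats.` output into {oid: integer value}."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def counter_rates(before: str, after: str, seconds: float) -> dict[str, float]:
    """Per-second increase of every counter present in both readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    a, b = parse_counters(before), parse_counters(after)
    return {oid: (b[oid] - a[oid]) / seconds for oid in a if oid in b}
```

For example, capture the `sysctl` output as a string, wait 10 seconds, capture it again, and call `counter_rates(first, second, 10)`. Doing this on each node under comparable load would make the em0/em1 drop rates of the 2.2.3 and 2.3.1 nodes directly comparable.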