Bug #6512
closed
Upgrade to 2.3.1 causes network performance degradation (with High CPU usage by NIC kernel tasks)
Description
Hi,
Currently we have a 2-node pfSense system working in active/passive HA. This cluster was running pfSense v2.2.3; we recently upgraded the slave node to v2.3.1-1 and forced a failover from master to slave in order to test it.
This system is running a mix of normal pf filtering, some IPSec tunnels, and also an HAProxy instance exposing a few (about 10) frontends and backends.
However, the upgraded node (when running as master) shows a clear network performance degradation: while node-1 (the one still running v2.2.3) can easily forward traffic at over 250 Mb/s, the other node (the one running v2.3) tops out at about 80 Mb/s.
While diagnosing the issue, we found that the node running pfSense v2.3 has a high load under such 'low' traffic (i.e. 80 Mb/s), with high CPU usage by the network drivers, as shown below:
[2.3.1-RELEASE][root@]/root: top -nCHSIzs1
last pid: 28317;  load averages:  4.07,  4.23,  4.37  up 2+11:40:04    16:22:50
311 processes: 9 running, 282 sleeping, 20 waiting
Mem: 31M Active, 502M Inact, 385M Wired, 883M Buf, 5020M Free
Swap:
  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
    0 root     -92    -     0K   240K CPU1    1  21.8H  99.37% kernel{nfe0 taskq}
    0 root     -92    -     0K   240K CPU2    2  29.4H  73.29% kernel{em0 taskq}
    0 root     -92    -     0K   240K CPU0    0  18.6H  44.78% kernel{em1 taskq}
   12 root     -72    -     0K   336K WAIT    0  65:15  14.60% intr{swi1: netisr 0}
  438 nobody    22    0 30184K  4404K select  3 121:18   4.79% dnsmasq
28430 root      21    0 43756K 17440K kqread  3  51:40   1.46% haproxy
   12 root     -72    -     0K   336K WAIT    0  31:51   1.37% intr{swi1: pfsync}
90479 root      20    0 25720K  7176K select  2  23:43   0.59% openvpn
49607 root      20    0 14516K  2320K select  0  28:31   0.29% syslogd
30713 root      20    0 16676K  2736K bpf     0  18:55   0.10% filterlog
28317 root      21    0 21856K  2992K CPU2    2   0:00   0.10% top
Firewall rules, service configuration, IPSec tunnels, etc. are configured the same on both nodes, and we have compared system values (tunables, /boot/loader.conf*, and runtime sysctl values) across both nodes. So it looks like an issue related to the pfSense 2.3 kernel code changes/enhancements.
Both nodes are identical hardware, consisting of:
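For anyone repeating the node-to-node comparison, here is a minimal Python sketch of diffing two saved `sysctl -a` dumps. The function names and the sample values are hypothetical, chosen for illustration only; they are not the actual values from either node.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (as printed by `sysctl -a`) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip continuation lines that carry no 'name: value' pair
            values[name.strip()] = value.strip()
    return values


def diff_sysctl(a, b):
    """Return {name: (value_a, value_b)} for keys that differ or are missing."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical excerpts standing in for the dumps taken on each node.
node1 = parse_sysctl("kern.ipc.nmbclusters: 262144\nnet.inet.ip.forwarding: 1\n")
node2 = parse_sysctl("kern.ipc.nmbclusters: 131072\nnet.inet.ip.forwarding: 1\n")

for name, (v1, v2) in diff_sysctl(node1, node2).items():
    print(f"{name}: node-1={v1} node-2={v2}")
```

Because the two nodes run different FreeBSD releases (10.1 and 10.3), some keys are expected to exist on only one side; those show up with `None` for the node that lacks them.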
FreeBSD 10.3-RELEASE-p3 #2 1988fec(RELENG_2_3_1): Wed May 25 14:14:46 CDT 2016
    root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Dual-Core AMD Opteron(tm) Processor 2216 (2393.69-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f12  Family=0xf  Model=0x41  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 6442450944 (6144 MB)
avail memory = 6194679808 (5907 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <SUN X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
The network interface cards available at both nodes are as follows:
nfe0: NVIDIA nForce4 CK804 MCP9 Networking Adapter
nfe1: NVIDIA nForce4 CK804 MCP9 Networking Adapter
em0: Intel(R) PRO/1000 (82546EB)
em1: Intel(R) PRO/1000 (82546EB)
lagg0: LACP lagg with em0, em1 & nfe0 attached.
The pfSense version running at node-1 (the one not yet upgraded) is:
[2.2.3-RELEASE][root@]/root: uname -a
FreeBSD 10.1-RELEASE-p13 FreeBSD 10.1-RELEASE-p13 #0 c77d1b2(releng/10.1)-dirty: Tue Jun 23 17:00:47 CDT 2015 root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10 amd64
The pfSense version running at node-2 (the upgraded one) is:
[2.3.1-RELEASE][root@]/root: uname -a
FreeBSD xxxx.aaa.com 10.3-RELEASE-p3 FreeBSD 10.3-RELEASE-p3 #2 1988fec(RELENG_2_3_1): Wed May 25 14:14:46 CDT 2016 root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense amd64
Here are some additional statistics we have collected, in case they help with the diagnosis:
- Packets with errors at em0 and em1 interfaces
[2.3.1-RELEASE][root@]/root: sysctl dev.em.0.mac_stats. | grep 'buff\|missed'
dev.em.0.mac_stats.recv_no_buff: 28924720
dev.em.0.mac_stats.missed_packets: 1109472
[2.3.1-RELEASE][root@]/root: sysctl dev.em.1.mac_stats. | grep 'buff\|missed'
dev.em.1.mac_stats.recv_no_buff: 2873803
dev.em.1.mac_stats.missed_packets: 79003
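The counters above are cumulative, so a single reading says little about the current drop rate. As a rough sketch (the helper names `parse_mac_stats` and `delta` are made up for illustration), the sysctl output could be parsed in Python and two snapshots taken some seconds apart could be subtracted:

```python
import re

# Output captured from the two sysctl commands on the upgraded node.
SAMPLE = """\
dev.em.0.mac_stats.recv_no_buff: 28924720
dev.em.0.mac_stats.missed_packets: 1109472
dev.em.1.mac_stats.recv_no_buff: 2873803
dev.em.1.mac_stats.missed_packets: 79003
"""

LINE = re.compile(r"^dev\.em\.(\d+)\.mac_stats\.(\w+):\s*(\d+)$")


def parse_mac_stats(text):
    """Return {'em0': {'recv_no_buff': ..., ...}, ...} from sysctl output."""
    stats = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            unit, counter, value = m.groups()
            stats.setdefault(f"em{unit}", {})[counter] = int(value)
    return stats


def delta(before, after):
    """Per-counter increase between two snapshots with the same keys."""
    return {
        nic: {name: after[nic][name] - value for name, value in counters.items()}
        for nic, counters in before.items()
    }


stats = parse_mac_stats(SAMPLE)
for nic, counters in sorted(stats.items()):
    print(nic, counters)
```

Calling `delta(first, second)` on two parsed snapshots and dividing by the interval between them would give drops per second, which is easier to compare between the two nodes than the raw totals.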
- Network errors
[2.3.1-RELEASE][root@]/root: netstat -ihw 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
       52k    30      0        32M        55k     0        36M     0
       46k     0      0        24M        48k     0        30M     0
       50k    64      0        31M        54k     0        35M     0
       45k    35      0        26M        48k     0        31M     0
       48k    19      0        28M        52k     0        33M     0
       45k     2      0        29M        48k     0        33M     0
       50k     0      0        30M        53k     0        35M     0
       50k     0      0        33M        53k     0        37M     0
       43k     9      0        28M        45k     0        32M     0
       53k    12      0        34M        56k     0        39M     0
       50k     0      0        30M        53k     0        34M     0
       44k     0      0        26M        47k     0        30M     0
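To put the input error column in proportion, the twelve one-second samples above can be summed. This is only a sketch: it assumes the `k` and `M` suffixes printed by `netstat -h` are decimal multipliers, and since the packet counts are already rounded by netstat, the result is approximate.

```python
# Assumption: netstat -h suffixes are treated as powers of 1000.
SUFFIX = {"k": 1_000, "M": 1_000_000, "G": 1_000_000_000}

# Data rows from the netstat output, in the order:
# in_packets in_errs idrops in_bytes out_packets out_errs out_bytes colls
ROWS = """\
52k 30 0 32M 55k 0 36M 0
46k 0 0 24M 48k 0 30M 0
50k 64 0 31M 54k 0 35M 0
45k 35 0 26M 48k 0 31M 0
48k 19 0 28M 52k 0 33M 0
45k 2 0 29M 48k 0 33M 0
50k 0 0 30M 53k 0 35M 0
50k 0 0 33M 53k 0 37M 0
43k 9 0 28M 45k 0 32M 0
53k 12 0 34M 56k 0 39M 0
50k 0 0 30M 53k 0 34M 0
44k 0 0 26M 47k 0 30M 0
"""


def to_number(field):
    """Convert '52k' or '32M' style fields to an integer."""
    if field[-1] in SUFFIX:
        return int(float(field[:-1]) * SUFFIX[field[-1]])
    return int(field)


def input_totals(rows):
    """Sum input packets and input errors over all well-formed rows."""
    packets = errors = 0
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip headers or blank lines
        packets += to_number(fields[0])
        errors += to_number(fields[1])
    return packets, errors


totals = input_totals(ROWS)
ratio = totals[1] / totals[0]
print(f"input packets={totals[0]} errors={totals[1]} ratio={ratio:.6f}")
```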
- The number of interrupts is very high for the NICs
[2.3.1-RELEASE][root@]/root: vmstat -i
interrupt                          total       rate
irq44: nfe1                     68575583        318
irq4: uart0                         2298          0
irq14: ata0                       143914          0
irq20: ohci0                          26          0
irq21: ehci0                           2          0
irq22: nfe0                    225192527       1044
irq56: em0                     121230546        562
irq57: em1                     305005131       1414
cpu0:timer                     242940061       1126
irq256: mpt0                      763114          3
cpu1:timer                      95960989        445
cpu2:timer                     135271696        627
cpu3:timer                     133771488        620
Total                         1328857375       6164
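As a quick way to compare the NIC interrupt load between nodes, the `rate` column can be summed per device class. This is a minimal sketch; the function name and the regular expression that picks out `em` and `nfe` devices are assumptions made for this example.

```python
import re

# Rows from the vmstat -i output (name, total, rate).
VMSTAT = """\
irq44: nfe1 68575583 318
irq4: uart0 2298 0
irq14: ata0 143914 0
irq20: ohci0 26 0
irq21: ehci0 2 0
irq22: nfe0 225192527 1044
irq56: em0 121230546 562
irq57: em1 305005131 1414
cpu0:timer 242940061 1126
irq256: mpt0 763114 3
cpu1:timer 95960989 445
cpu2:timer 135271696 627
cpu3:timer 133771488 620
Total 1328857375 6164
"""

NIC = re.compile(r"^(em|nfe)\d+$")


def interrupt_rates(text):
    """Return {device: rate} using the last two columns as total and rate."""
    rates = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, _total, rate = parts
        device = name.split(":")[-1].strip()  # 'irq44: nfe1' -> 'nfe1'
        rates[device] = int(rate)
    return rates


rates = interrupt_rates(VMSTAT)
nic_rate = sum(rate for dev, rate in rates.items() if NIC.match(dev))
share = nic_rate / rates["Total"]
print(f"NIC interrupts/s: {nic_rate} ({share:.1%} of {rates['Total']})")
```

Running the same script against `vmstat -i` output from node-1 under comparable traffic would show whether the interrupt rate itself differs between the two versions.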
We have been investigating whether a package in pfSense 2.3.1 or FreeBSD 10.3-RELEASE-p3 could be causing this, but we have been unable to determine the cause of the problem.
Updated by Chris Buechler over 8 years ago
- Category changed from Configuration Upgrade to Operating System
- Status changed from New to Rejected
- Priority changed from Very High to Normal
- Target version deleted (2.3.1-p2)
- Affected Version deleted (2.3.1)
That's something to do with your combination of hardware, which isn't anything we support. There aren't any general performance regressions between 2.2.x and 2.3.x; in fact, the introduction of tryforward in 2.3.x makes it faster than any prior version.
Updated by Juan Gallego over 8 years ago
- This wasn't present on previous pfSense versions.
- As we are using two different brands of network cards, simply pointing to a FreeBSD network card driver issue sounds improbable.
Updated by Juan Gallego over 8 years ago
And perhaps this post has the same problem
Updated by Chris Buechler over 8 years ago
It's likely not one NIC driver that's in question, as there definitely aren't any issues with e1000, and probably not with nfe either though that one isn't very widely used. It's definitely something specific to your combination of hardware, which we can't support. The linked thread has no relation. Continue on your forum thread, that's the best bet.
Updated by Rene Plattner over 8 years ago
Hi,
I also have this performance degradation problem!
We have a setup of a small hardware box (N3150 Mini-ITX board) and a virtual host (KVM).
After updating the virtual VPN server from 2.2.6 to 2.3.1, performance dropped by a factor of 10.
The interesting fact was that it was asymmetric: the download was slow but the upload speed was normal.
We downgraded to 2.2.6 and it was OK again.
Interfaces and disk are VirtIO devices.