Bug #3250

problems with ixgbe driver in pfsense 2.1 release

Added by Zeev Zalessky over 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Urgent
Assignee: -
Category: -
Target version:
Start date: 10/03/2013
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.1
Affected Architecture: amd64

Description

Hello,

Since the release of 2.1 we have had serious problems with Intel 10Gb cards. As you can see in the following forum topic (http://forum.pfsense.org/index.php/topic,66573.0.html), we have tried several workarounds but cannot fix the problem with the driver. We need urgent help. Next week I'll try to get payment for support approved in my company, but currently our datacenter is connected using a workaround with an ESXi server and VMXNET3 adapters.

config-WASLAB-FW1.EISLAB-IL.INTRA-20140128011706.xml (80.6 KB), configuration, Zeev Zalessky, 01/27/2014 05:15 PM
loader.conf.local (639 Bytes), loader.conf.local, Zeev Zalessky, 01/27/2014 05:15 PM

History

#1 Updated by Renato Botelho over 7 years ago

  • Target version set to 2.1.1

#2 Updated by Zeev Zalessky over 7 years ago

Hi,

Thanks for the update.
Is there a due date for the 2.1.1 release?

#3 Updated by Jim Pingle over 7 years ago

Since this ticket is light on detail, there are a few main issues with the current driver:

1. Error message from the ix driver: kernel: CRITICAL: ECC ERROR!! Please Reboot!!
2. Some NICs will not function (will not obtain link, or pass traffic)
3. Those that do work have a major mbuf leak under load, often leading to a panic+reboot due to mbuf exhaustion
4. Some NICs work with untagged traffic but will not properly pass VLAN traffic. Sometimes toggling vlanhwfilter will allow this to work for a period of time.

#4 Updated by Zeev Zalessky about 7 years ago

Firewall is updated to 2.1.1-PRERELEASE (amd64) built on Sun Jan 19 03:33:57 EST 2014. After boot, MBUF usage is 32% (165510/512000) on a server with 4 ixgbe NICs (2 LAGGs) and 2 igb NICs, which looks a little high for a freshly booted system. There is no traffic through this firewall, yet within 20 minutes the MBUF count rose to 166406, so it looks like the MBUF problem is not fixed.

1. Error message from the ix driver: kernel: CRITICAL: ECC ERROR!! Please Reboot!! FIXED
3. The MBUF leak exists even without load.
4. For my NICs (one dual-port 82599EB 10G SFP+, one dual-port 82599EB 10G TN), VLAN traffic is working.
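
For reference, the figures in this comment can be turned into a rough leak rate. The following is only a sketch: the function names are hypothetical, and the time-to-exhaustion estimate assumes the growth stays linear, which two samples on an idle system cannot confirm.

```python
def mbuf_usage_percent(in_use: int, limit: int) -> float:
    """Return mbuf usage as a percentage of the configured limit."""
    return 100.0 * in_use / limit


def growth_per_hour(first: int, second: int, minutes_apart: float) -> float:
    """Return the average mbuf growth per hour between two samples."""
    return (second - first) * 60.0 / minutes_apart


# Figures quoted in the comment above.
after_boot = 165510
after_20_min = 166406
limit = 512000

usage = mbuf_usage_percent(after_boot, limit)
rate = growth_per_hour(after_boot, after_20_min, 20)
# Hours until the limit is reached IF growth stayed linear (an assumption).
hours_left = (limit - after_20_min) / rate

print(f"usage after boot: {usage:.1f}%")
print(f"growth: {rate:.0f} mbufs/hour")
print(f"hours to exhaustion at this rate: {hours_left:.0f}")
```

Under that assumption, an idle system would take days to exhaust its mbufs, which is consistent with the panics being reported mainly under load.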

#5 Updated by Brenton Denman-Murray about 7 years ago

Hi guys,

If there is anything I can test to help, please let me know.

I can confirm that use case (1) no longer occurs. Previously, plugging in an Ethernet cable, in effect bringing the adapter (X540-T2) online, would cause [ECC/OVER TEMP] alarms to be piped to the console.

Regarding use case (4), I did experience this issue on build 2.1.0: after toggling vlanhwfilter, the adapter would pass traffic, but this would eventually end in a kernel panic followed by a reboot. As of build 2.1.1-2014-01-22, I have been able to pass traffic without changing any adapter flags. Successful operation was achieved by defining the VLANs against the ix0 interface during the install and configuring the IP addresses post-install.
Switch-side configuration is [switchport mode trunk] [switchport trunk encapsulation dot1q]; this was only tested at 1 Gbps line rate.
Testing comprised 90 MB/sec of ICMP traffic in 1400-byte frames for 3 days, with the ixgbe adapter successfully passing 28 TB during this time.

Regarding use case (3), I have not seen increasing mbufs after [In/out packets (pass) (28381.37 GB/28675.67 GB)]. The netstat mbuf report is as follows:
8195/2440/10635 mbufs in use (current/cache/total)
8192/2166/10358/25600 mbuf clusters in use (current/cache/total/max)
8191/1537 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
18432K/5358K/23790K bytes allocated to network (current/cache/total)

Again, please advise if there is anything I can do to help.
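
For anyone comparing reports like the one above across builds, the counters can be pulled out programmatically. This is a minimal sketch that assumes the line layout shown in this comment; `parse_netstat_m` and the regular expression are hypothetical helpers, not part of any pfSense or FreeBSD tool, and lines with unit suffixes (such as the "bytes allocated to network" line) are deliberately skipped.

```python
import re

# Matches lines such as
# "8192/2166/10358/25600 mbuf clusters in use (current/cache/total/max)".
LINE_RE = re.compile(r"^(\d+(?:/\d+)+)\s+(.+?)\s+\(([a-z/]+)\)$")

SAMPLE = """
8195/2440/10635 mbufs in use (current/cache/total)
8192/2166/10358/25600 mbuf clusters in use (current/cache/total/max)
8191/1537 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
18432K/5358K/23790K bytes allocated to network (current/cache/total)
"""


def parse_netstat_m(text: str) -> dict:
    """Map each counter description to a {field: value} dict."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines and lines with unit suffixes are skipped
        values = [int(v) for v in match.group(1).split("/")]
        fields = match.group(3).split("/")
        if len(values) == len(fields):
            result[match.group(2)] = dict(zip(fields, values))
    return result


report = parse_netstat_m(SAMPLE)
clusters = report["mbuf clusters in use"]
# Share of the cluster limit currently in use.
cluster_pct = 100.0 * clusters["current"] / clusters["max"]
print(f"mbuf clusters in use: {cluster_pct:.1f}% of max")
```

Running the same parse on reports taken some time apart gives a simple way to check whether the "current" counters keep climbing.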

#6 Updated by Zeev Zalessky about 7 years ago

Hi,

I am now testing the firewall under my production load.
The MBUF increase shows up under heavy ARP load; I have more than 3000 servers in 30+ VLANs.
There was a kernel panic after 1 hour of iperf load (4.8 Gb/s). I'll add my configuration.

#7 Updated by Zeev Zalessky about 7 years ago

It looks like the kernel panic is caused by concurrency in the ixgbe driver. I found some patches on the FreeBSD list: http://article.gmane.org/gmane.os.freebsd.devel.net/38439/match=ixgbe. A dump has been sent to the developers.

#8 Updated by Zeev Zalessky about 7 years ago

Kernel panic again, even without load:
Tracing pid 12 tid 100075 td 0xffffff000b502460
m_copy_nbufs() at m_copy_nbufs+0x40
ip_fragment() at ip_fragment+0x134
ip_output() at ip_output+0xf1d
ip_forward() at ip_forward+0x19a
ip_input() at ip_input+0x65d
netisr_dispatch_src() at netisr_dispatch_src+0x7b
ether_demux() at ether_demux+0x169
ether_input() at ether_input+0x191
ether_demux() at ether_demux+0x72
ether_input() at ether_input+0x191
ixgbe_rxeof() at ixgbe_rxeof+0x29b
ixgbe_msix_que() at ixgbe_msix_que+0xb1
intr_event_execute_handlers() at intr_event_execute_handlers+0x104
ithread_loop() at ithread_loop+0x95
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe

#9 Updated by Ermal Luçi about 7 years ago

  • Status changed from New to Feedback

The next build will use the previous drivers, which are more stable.

#10 Updated by Zeev Zalessky about 7 years ago

Hi,

I received the following link, which may fix the ixgbe driver problem: http://christopher-technicalmusings.blogspot.com.au/2013/03/network-mbuf-leak-exhaustion-in-freebsd.html . Can we check whether this fixes the remaining ixgbe problem in pfSense?

#11 Updated by Chris Buechler about 7 years ago

  • Status changed from Feedback to Closed

Reverted back to the 2.1-REL driver, which will suffice for 2.1.1.

#12 Updated by Zeev Zalessky about 7 years ago

Reverting to the 2.1-REL driver doesn't fix the MBUF problem, which is present in the 2.1-REL driver as well. Please reopen.

#13 Updated by Chris Buechler about 7 years ago

That's something to be revisited in 10.x-based releases, as the situation will be different there.
