Bug #5913

closed

Link cycling issue with some ix NICs

Added by Chris Buechler about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Very High
Assignee:
Category: Operating System
Target version:
Start date: 02/19/2016
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: All
Affected Architecture: amd64

Description

There's an issue with the XG-1540's 10 Gb NICs where they cycle link when certain settings, such as rxcsum and txcsum among others, are applied to the interface. Some of those settings are applied in rc.linkup or in things called from it, which leaves a variety of circumstances where the NIC gets stuck in a link down/up/down/up cycle. For instance, it happens if the NIC loses link at any point after bootup. One of a variety of reports:
https://www.reddit.com/r/PFSENSE/comments/45bcuq/10_gig_woes/

It also has a general issue where the link cycles on its own any time it comes up; "2 link states coalesced" is always logged.

After several hours of figuring out everything that causes it to lose link, I came to the conclusion that there are far too many different triggers for us to work around; the root issue in the driver needs to be fixed.

It doesn't happen on any Intel 10G cards with SFP+ that we have around. The XG-C2758 has no such problem.

The problem does not exist in the latest v3.1.14 driver from Intel.

Actions #1

Updated by Jim Thompson about 8 years ago

  • Assignee set to Luiz Souza
  • Affected Architecture amd64 added
  • Affected Architecture deleted ()

> It doesn't happen on any Intel 10G cards with SFP+ that we have around. The XG-C2758 has no such problem.

XG-C2758 uses a 82599.

I don't know what other "intel 10G cards with SFP+" you tried, so I can't comment.

The device on the 1540-D identifies as:
[2.3-BETA][]/root: pciconf -l | grep ix
ix0@pci0:3:0:0: class=0x020000 card=0x15ad15d9 chip=0x15ad8086 rev=0x00 hdr=0x00
ix1@pci0:3:0:1: class=0x020000 card=0x15ad15d9 chip=0x15ad8086 rev=0x00 hdr=0x00

This is known to the driver as: IXGBE_DEV_ID_X550EM_X_10G_T

ixgbe_type.h:#define IXGBE_DEV_ID_X550EM_X_10G_T 0x15AD

(this is on the machine that you apparently upgraded to the 3.1.14 driver).

The driver in -HEAD is 3.1.13-k (FreeBSD-10.2 apparently uses the same base driver from Intel).

https://github.com/pfsense/FreeBSD-src/blob/devel/sys/dev/ixgbe/if_ix.c#L51 says:
char ixgbe_driver_version[] = "3.1.13-k";

There are a number of differences specific to IXGBE_DEV_ID_X550EM_X_10G_T between the two revisions of the base driver, especially this:

        if (hw->device_id == IXGBE_DEV_ID_X550EM_X_10G_T) {
                /* Config MDIO clock speed before the first MDIO PHY access */
                hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
                hlreg0 &= ~IXGBE_HLREG0_MDCSPD;
                IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0);
        }

This is found in ixgbe_reset_hw_X550em() in the 3.1.14 driver, but not in the 3.1.13-k driver. There are also a lot of minor (but possibly significant) changes to the way link speeds are negotiated.

Assigned to Luiz.

Actions #2

Updated by Luiz Souza about 8 years ago

  • Priority changed from Normal to Very High

Actions #3

Updated by Luiz Souza about 8 years ago

Updated the ix driver to Intel version 3.1.14.

Actions #4

Updated by Chris Buechler about 8 years ago

  • Status changed from Confirmed to Resolved

All the circumstances that failed before are good on the latest 2.3 snapshot.
