Bug #13003

open

Malicious Driver Detection event on ixl driver

Added by Marcos Mendoza 3 months ago. Updated 3 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Hardware / Drivers
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
Release Notes:
Default
Affected Version:
Affected Architecture:

Description

There have been a handful of reports of Malicious Driver Detection (MDD) events occurring with the Intel X710 NIC. The system logs show the following:

ixl10: Malicious Driver Detection event 2 on TX queue 7, pf number 0
ixl10: MDD TX event is for this function!
ixl10: WARNING: queue 7 appears to be hung!
ixl10: Malicious Driver Detection event 2 on TX queue 4, pf number 0
ixl10: WARNING: queue 4 appears to be hung!

and

Oct 29 09:47:08 kernel ixl1: Malicious Driver Detection event 2 on TX queue 769, pf number 1 (PF-1)
Oct 29 09:37:28 kernel ixl1: Malicious Driver Detection event 2 on TX queue 773, pf number 1 (PF-1)

and https://forum.netgate.com/topic/158415/issues-with-an-intel-x710-and-pfsense-2-4-5-p1
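For anyone collecting these reports, the log lines above can be tallied per interface and queue with a small script. This is only a sketch: the regular expression is modeled solely on the lines quoted in this report, the function names (parse_mdd_events, count_by_interface) are made up for illustration, and accepting "RX" alongside "TX" is an assumption, since only TX events appear here.

```python
import re
from collections import Counter

# Pattern modeled only on the log lines quoted in this report; other
# driver versions may format the message differently. Matching "RX" is
# an assumption, since only TX events were reported.
MDD_RE = re.compile(
    r"(?P<ifname>ixl\d+): Malicious Driver Detection event (?P<event>\d+) "
    r"on (?P<direction>TX|RX) queue (?P<queue>\d+), pf number (?P<pf>\d+)"
)


def parse_mdd_events(lines):
    """Return one dict per MDD event line found in `lines`."""
    events = []
    for line in lines:
        m = MDD_RE.search(line)
        if m:
            events.append({
                "ifname": m.group("ifname"),
                "event": int(m.group("event")),
                "direction": m.group("direction"),
                "queue": int(m.group("queue")),
                "pf": int(m.group("pf")),
            })
    return events


def count_by_interface(events):
    """Count MDD events per interface name (hypothetical helper)."""
    return Counter(e["ifname"] for e in events)


# Sample input copied from the log excerpts in this report.
sample = [
    "ixl10: Malicious Driver Detection event 2 on TX queue 7, pf number 0",
    "ixl10: MDD TX event is for this function!",
    "ixl10: WARNING: queue 7 appears to be hung!",
    "ixl10: Malicious Driver Detection event 2 on TX queue 4, pf number 0",
    "ixl10: WARNING: queue 4 appears to be hung!",
    "Oct 29 09:47:08 kernel ixl1: Malicious Driver Detection event 2 on TX queue 769, pf number 1 (PF-1)",
    "Oct 29 09:37:28 kernel ixl1: Malicious Driver Detection event 2 on TX queue 773, pf number 1 (PF-1)",
]

events = parse_mdd_events(sample)
per_interface = count_by_interface(events)
```

Feeding it the full system log instead of the sample list would give a rough picture of which interfaces and queues are hit most often.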

Some info gathered from various reports and troubleshooting:
  • Occurs anywhere from once a day to once a month.
  • Occurs on pfSense 2.4.5p1 and 22.01.
  • Occurs with PF traffic (SR-IOV not required to be enabled).
  • Occurs with TSO/LRO disabled.
  • Occurs with copper (RJ-45) and optical transceivers.
  • Most of the issue reports have been from those running a bridge interface with ixl0 and ixl1. However, there have been multiple reports without using bridges as well.
    Increasing the buffer size on the bridge reduced the frequency of the events (went from once a day to taking 5 days before it reoccurred).
Actions #1

Updated by Kris Phillips 3 months ago

I saw this occur on a 7100 running 21.05.2 with two bridged ixl interfaces on an add-in card, so it potentially affects everything from 2.4.5p1 to 22.01.

Actions #2

Updated by Christoph Vieten 2 months ago

The same happened on 2.6.0 with an Intel X710-T4, multiple times now.
Updating the NVM from 8.15 to the latest 8.60 didn't fix the issue. Replacing the card with another X710 didn't help either.

sysctl -a | grep dev.ixl.0 | grep fw
dev.ixl.0.fw_lldp: 1
dev.ixl.0.fw_version: fw 8.6.68629 api 1.15 nvm 8.60 etid 8000bd5a oem 1.268.0

sysctl -a | grep dev.ixl.0.%desc
dev.ixl.0.%desc: Intel(R) Ethernet Controller X710/X557-AT 10GBASE-T - 2.3.1-k

It seems to affect only one of the four ports, apparently the one with the most traffic.

TSO is disabled via the checkbox, and net.inet.tcp.tso is set to 0 under System => Advanced => Tunable.
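
When comparing firmware levels across several affected systems, the fw_version line shown above can be split into its fields. The following is a rough sketch that assumes the value always consists of alternating key/value tokens, as in the output quoted in this note; parse_fw_version is a hypothetical helper, not part of any existing tool.

```python
def parse_fw_version(line):
    """Split a `dev.ixl.N.fw_version` sysctl line into key/value fields.

    Assumes the value is a flat sequence of "key value" pairs, as in the
    sysctl output quoted in this report.
    """
    _, _, value = line.partition(": ")
    tokens = value.split()
    # Even positions hold the keys, odd positions hold the values.
    return dict(zip(tokens[0::2], tokens[1::2]))


# Line copied from the sysctl output in this report.
line = "dev.ixl.0.fw_version: fw 8.6.68629 api 1.15 nvm 8.60 etid 8000bd5a oem 1.268.0"
info = parse_fw_version(line)
```

The resulting dictionary makes it easy to check, for example, whether every affected host reports the same nvm value.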

Actions #3

Updated by Kris Phillips 2 months ago

Christoph,

Were you running a bridge in your configuration? The original bug report seems to suggest that as the root cause.

Actions #4

Updated by Marcos Mendoza 6 days ago

  • Description updated (diff)
Actions #5

Updated by Christoph Vieten 6 days ago

Hi Kris,

No, we aren't running a bridge at all, but we are running approx. 20 VLAN interfaces on the affected port.
It looks like when the issue occurs, you cannot switch to other physical ports on any other adapter either (we have three of those X710 quad-port cards in use).
However, the other ports already in use (e.g. some 10G ports configured without VLAN assignments or with fewer VLANs) aren't affected by the driver/firmware hang and can still be used.

The last time the issue occurred, we migrated the highest-traffic VLAN interfaces to separate ports, which resulted in a longer uptime, until yesterday.

Has anyone tried the latest FreeBSD driver yet?
https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/intel-ix-kmod-3.3.24.pkg

Actions #6

Updated by Kris Phillips 3 days ago

Hello Christoph,

I don't see any notes that it's been tested for this particular issue. However, the Intel ix driver was updated in 22.05. Have you tested whether this issue is gone in the latest RC? We expect 22.05 to be released very soon, so it might be worth a re-test on the latest build.
