Regression #14180

open

ConnectX-4 LX MCX4121A-ACAT - VT-d passthrough of both ports, virtualized pfSense fails to boot due to mlx5 driver errors

Added by name name about 1 year ago. Updated 5 months ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: Hardware / Drivers
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Release Notes: Default
Affected Plus Version: 23.01
Affected Architecture: amd64

Description

I've been running the following configuration for months now:

Hypervisor:
Linux Kernel 5.15
libvirt/qemu/kvm

pfSense VM:
Machine type i440fx
VT-d passthrough of both ports of the MCX4121A-ACAT
IOMMU and ACS work correctly on the Supermicro X11SPL-F server mainboard

After updating from CE 2.6.0 to Plus 23.01, it no longer works.

Libvirt successfully starts the VM with the PCI devices (both ports of the network adapter) passed through (VT-d).

During kernel boot I see these error messages:

mlx5_core1: WARN: wait_func:967:(pid 0): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: give_pages:354:(pid 0): func_id 0x0, npages 1241, err -60
mlx5_core1: WARN: wait_func:967:(pid 0): CREATE_EQ(0x301) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: wait_func:967:(pid 0): DESTROY_EQ(0x302) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: mlx5_destroy_unmap_eq:523:(pid 0): failed to destroy a previously created eq: eqn 7
mlx5_core1: WARN: wait_func:967:(pid 0): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: give_pages:375:(pid 0): page notify failed
mlx5_core1: WARN: free_comp_eqs:671:(pid 0): failed to destroy EQ 0x7
mlx5_core1: WARN: pages_work_handler:475:(pid 0): give fail -60
mlx5_core1: WARN: wait_func:967:(pid 0): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
mlx5_core1: ERR: reclaim_pages:444:(pid 0): failed reclaiming pages
mlx5_core1: WARN: pages_work_handler:475:(pid 0): reclaim fail -60
mlx5_core1: WARN: wait_func:967:(pid 0): DESTROY_EQ(0x302) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: mlx5_destroy_unmap_eq:523:(pid 0): failed to destroy a previously created eq: eqn 8
mlx5_core1: WARN: free_comp_eqs:671:(pid 0): failed to destroy EQ 0x8
mlx5_core1: WARN: wait_func:967:(pid 0): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
mlx5_core1: ERR: reclaim_pages:444:(pid 0): failed reclaiming pages
mlx5_core1: WARN: pages_work_handler:475:(pid 0): reclaim fail -60
mlx5_core1: WARN: wait_func:967:(pid 0): DESTROY_EQ(0x302) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: mlx5_destroy_unmap_eq:523:(pid 0): failed to destroy a previously created eq: eqn 9
mlx5_core1: WARN: free_comp_eqs:671:(pid 0): failed to destroy EQ 0x9
mlx5_core1: WARN: wait_func:967:(pid 0): DEALLOC_UAR(0x803) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: up_rel_func:89:(pid 0): failed to free uar index 16
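To make it easier to compare a failing boot against a good one, the warnings above can be tallied by firmware command. The following is only a rough sketch: the function name `summarize_timeouts` and the `SAMPLE` text are made up for illustration, and the regular expressions assume the exact message layout shown in this report. For reference, the `-60` in the log matches the value of ETIMEDOUT on FreeBSD.

```python
import re
from collections import Counter

# Assumed line layout, taken from the messages in this report:
#   mlx5_coreN: LEVEL: function:line:(pid N): message
LINE_RE = re.compile(
    r"^(?P<dev>mlx5_core\d+): (?P<level>WARN|ERR): "
    r"(?P<func>\w+):(?P<line>\d+):\(pid (?P<pid>\d+)\): (?P<msg>.*)$"
)
# Timeout messages look like "MANAGE_PAGES(0x108) timeout. ..."
TIMEOUT_RE = re.compile(r"(?P<cmd>[A-Z_]+)\((?P<opcode>0x[0-9a-fA-F]+)\) timeout")


def summarize_timeouts(log_text):
    """Count timed-out firmware commands per (device, command) pair."""
    counts = Counter()
    for raw in log_text.splitlines():
        line_match = LINE_RE.match(raw.strip())
        if line_match is None:
            continue  # not an mlx5 message in the assumed layout
        timeout_match = TIMEOUT_RE.search(line_match.group("msg"))
        if timeout_match is not None:
            counts[(line_match.group("dev"), timeout_match.group("cmd"))] += 1
    return counts


# Hypothetical sample input: the first three lines of the log above.
SAMPLE = """\
mlx5_core1: WARN: wait_func:967:(pid 0): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
mlx5_core1: WARN: give_pages:354:(pid 0): func_id 0x0, npages 1241, err -60
mlx5_core1: WARN: wait_func:967:(pid 0): CREATE_EQ(0x301) timeout. Will cause a leak of a command resource
"""

if __name__ == "__main__":
    for (dev, cmd), n in summarize_timeouts(SAMPLE).most_common():
        print(f"{dev} {cmd}: {n} timeout(s)")
```

Feeding it the saved boot messages from several boots would show whether the same commands time out each time and whether only one of the two ports is affected.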

The behavior is inconsistent. Sometimes the VM boots fine. Sometimes the error messages appear and the boot never progresses to the point where the actual OS starts. And sometimes it reaches the point where pfSense starts, but the network interfaces aren't available and it asks me to manually reassign the configured interfaces.

This makes it unusable for me at the moment.
