Bug #9123

open

Adding/configuring vlan on ixl-devices causes aq_add_macvlan err -53, aq_error 14

Added by Sebastian Deuerling about 6 years ago. Updated over 2 years ago.

Status: Feedback
Priority: Very High
Assignee:
Category: Interfaces
Target version: -
Start date: 11/15/2018
Due date:
% Done: 100%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.4.x
Affected Architecture: amd64

Description

Adding or configuring a VLAN on ixl devices triggers the error "aq_add_macvlan err -53, aq_error 14".
The VLANs appear to work nevertheless, but saving an interface configuration that includes VLANs takes a very long time.
In our setup (two igb interfaces, two ix interfaces, two ixl interfaces; 25 VLANs on a failover lagg of ixl0 and igb0), saving changes to the interface configuration takes about 20 to 30 minutes. After that, pfSense seems to freeze. After a reboot, all VLANs work.
Booting also takes a lot of time: around 5 minutes in the "Configuring VLANS..." step.
Our hardware: SYS-5018D-FN4T (Supermicro Intel Xeon D-1541 system) and X710DA2BLK (Intel X710-DA2 Dual-SFP+-PCIe-Addon-cards).
Further information here: https://forum.netgate.com/topic/136201/new-version-2-4-4-interface-error-aq_add_macvlan-err-53-aq_error-14/14

Actions #1

Updated by Steve Wheeler almost 6 years ago

  • Category set to Interfaces
  • Affected Version set to 2.4.x
  • Affected Architecture amd64 added
  • Affected Architecture deleted ()
Actions #2

Updated by Alex Rosenberg almost 6 years ago

Not sure about the similarity of conditions yet, but I'm seeing this message being logged on my FreeNAS box with the ixl driver on top of the SuperMicro X11DPH-Tq motherboard. I'm beginning to suspect a problem either with the ixl driver or the NVM fw image.

Actions #3

Updated by Alexander Meckelein almost 6 years ago

Same problem here.

Hardware: Dell PowerEdge 330 with Intel(R) 10GbE 2P X710 Adapter

Done so far:
  • installed pfSense 2.4.4 from the "pfSense-CE-memstick-2.4.4-RELEASE-p1-amd64" image, using the ZFS mirror option on 2 SSDs
  • on first boot of the installed pfSense, I configured the "ixl0" interface with a VLAN tag and immediately got the error "aq_add_macvlan err -53, aq_error 14" in the terminal
  • reset the installation to factory defaults and rebooted
  • tested the same on the other interface, "ixl1", and got the same error
  • neither interface was able to get an IP address via DHCP

I have not yet tested the pfSense 2.4.3 image on this system; that is planned for Friday this week.

Actions #4

Updated by Alexander Meckelein almost 6 years ago

I have tested FreeBSD 11.1, 11.2 and 12.0 on this hardware and got the following results.

FreeBSD 11.1 (FreeBSD-11.1-RELEASE-amd64-memstick) (Driver 1.7.12-k)
ifconfig ixl0.1 create vlan 1 vlandev ixl0 inet 192.168.23.223/24
ifconfig ixl0 up

-> Ping OK

FreeBSD 11.2 (FreeBSD-11.2-RELEASE-amd64-memstick) (Driver 1.9.9-k)
ifconfig ixl0.1 create vlan 1 vlandev ixl0 inet 192.168.23.223/24
terminal message: ixl0: aq_add_macvlan err 53, aq_error 14
ifconfig ixl0 up
terminal message: ixl0: aq_add_macvlan err -53, aq_error 14
-> Ping OK (448 packets, 0% lost)

FreeBSD 12 (FreeBSD-12.0-RELEASE-amd64-memstick)
ifconfig ixl0.1 create vlan 1 vlandev ixl0 inet 192.168.23.223/24
ifconfig ixl0 up

-> Ping OK
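
For anyone repeating this comparison, here is a minimal Python sketch for counting the error lines in captured console output. The function name and the sample text are illustrative only. The pattern is derived solely from the message format quoted in this thread, and it accepts both the "err -53" and "err 53" spellings seen above.

```python
import re
from collections import Counter

# Pattern based on the messages quoted in this thread, e.g.
#   ixl0: aq_add_macvlan err -53, aq_error 14
AQ_ERR = re.compile(r"(ixl\d+): aq_add_macvlan err (-?\d+), aq_error (\d+)")


def count_macvlan_errors(console_text):
    """Return a Counter keyed by (interface, err, aq_error)."""
    hits = Counter()
    for line in console_text.splitlines():
        m = AQ_ERR.search(line)
        if m:
            key = (m.group(1), int(m.group(2)), int(m.group(3)))
            hits[key] += 1
    return hits


# Illustrative sample, not a real capture.
sample = """\
ixl0: aq_add_macvlan err -53, aq_error 14
<6>lagg0.101: promiscuous mode enabled
ixl0: aq_add_macvlan err -53, aq_error 14
"""

for key, n in count_macvlan_errors(sample).items():
    print(key, n)
```

Running it against the console output captured for each FreeBSD release gives a quick yes/no per release without reading the whole log.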

Actions #5

Updated by Eric Machabert about 5 years ago

Hi,

As I explained in the forum, this is my currently working solution while running 2.4.4-p3:
- Using lagg in failover mode rather than LACP (per Intel documentation, using LACP requires disabling the firmware-based LLDP agent; we did not test it)
- Using the 1.11.20 driver (September 2019) from the Intel website

We use a lot of VLANs and have seen no issues after 15 days of uptime. We tested CARP failover and interface assignment on newly declared VLANs, and had no "Queue appears to be hung" issue.

information regarding the NICs (HPE NC562SFP+):

ixl0: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.11.20> mem 0xe4000000-0xe4ffffff,0xe5008000-0xe500ffff at device 0.0 numa-domain 0 on pci4
ixl0: using 1024 tx descriptors and 1024 rx descriptors
ixl0: fw 6.71.49427 api 1.7 nvm 6.80 etid 80004004 oem 1.263.0
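
When comparing reports from different systems, it can help to pull the driver, firmware and NVM versions out of boot messages like the ones above. The following is a rough Python sketch. The regular expressions assume exactly the line layout shown in this comment, and the function name is made up for illustration; other driver versions may print these lines differently.

```python
import re

# Both patterns assume the attach-message layout quoted in this comment.
DRIVER_RE = re.compile(r"^(ixl\d+): <.*Version - ([\d.]+)>")
FW_RE = re.compile(r"^(ixl\d+): fw ([\d.]+) api ([\d.]+) nvm ([\d.]+)")


def parse_ixl_dmesg(text):
    """Map interface name -> dict with driver/fw/api/nvm version strings."""
    info = {}
    for line in text.splitlines():
        m = DRIVER_RE.match(line)
        if m:
            info.setdefault(m.group(1), {})["driver"] = m.group(2)
            continue
        m = FW_RE.match(line)
        if m:
            info.setdefault(m.group(1), {}).update(
                fw=m.group(2), api=m.group(3), nvm=m.group(4)
            )
    return info


sample = """\
ixl0: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.11.20> mem
ixl0: using 1024 tx descriptors and 1024 rx descriptors
ixl0: fw 6.71.49427 api 1.7 nvm 6.80 etid 80004004 oem 1.263.0
"""

print(parse_ixl_dmesg(sample))
```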

Actions #6

Updated by Luiz Souza over 4 years ago

  • Assignee set to Luiz Souza
  • Priority changed from Normal to Very High
Actions #7

Updated by Marc L over 4 years ago

Some more data/observations:
  • NIC: Intel X710-DA4 (Quad Port 10Gb)
  • pfSense version 2.4.4-p3
  • One LAGG group (lagg0), protocol set to LACP
  • 31 VLANs (all lagg0.x)
  • Adding more VLANs on the fly works for us, though we rarely do it. We used to have the LAGG protocol set to "NONE", which consistently produced crashes when adding a new VLAN
  • Applying Interface settings is pretty slow but works
  • However, I did encounter a crash when changing the description of a VLAN (on the "VLANs" tab)

The logs collected from the crash make it look like changing the description and saving caused all VLANs to be taken down and reconfigured. That shouldn't be necessary in the first place, because the description is just web interface cosmetics, but that's a different topic.

I followed this procedure twice (edit VLAN -> set new description -> hit save). I was physically present and watched the console. The first time, the web interface hung for a while and the error was logged to the console a few times, but the save eventually succeeded. The second time, the web interface hung again, a few of the error messages appeared on the console, and ultimately the system crashed and rebooted. Below is the relevant tail end of the logs collected from the crash report.

msgbuf.txt:

ixl0: aq_add_macvlan err -53, aq_error 14
<6>carp: 9@lagg0.101: BACKUP -> INIT (hardware interface down)
<6>carp: demoted by 240 to 240 (interface down)
<7>ifa_maintain_loopback_route: deletion failed for interface lagg0.101: 3
<7>ifa_maintain_loopback_route: deletion failed for interface lagg0.101: 3
<7>ifa_maintain_loopback_route: deletion failed for interface lagg0.101: 3
<6>carp: demoted by -240 to 0 (vhid removed)
<6>lagg0.101: promiscuous mode disabled
<6>vlan5: changing name to 'lagg0.101'
ixl0: aq_add_macvlan err -53, aq_error 14
ixl0: aq_add_macvlan err -53, aq_error 14
ixl0: aq_add_macvlan err -53, aq_error 14
ixl0: aq_add_macvlan err -53, aq_error 14
<6>lagg0.101: promiscuous mode enabled
<6>carp: 9@lagg0.101: INIT -> BACKUP (initialization complete)
<6>carp: 19@lagg0.205: BACKUP -> MASTER (master timed out)
<6>carp: 10@lagg0.102: BACKUP -> MASTER (master timed out)
<6>carp: 13@lagg0.112: BACKUP -> MASTER (master timed out)
<6>carp: 34@lagg0.82: BACKUP -> MASTER (master timed out)
<6>carp: 8@lagg0.100: BACKUP -> MASTER (master timed out)

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer    = 0x20:0xffffffff80e38d40
stack pointer            = 0x28:0xfffffe04549b48b0
frame pointer            = 0x28:0xfffffe04549b48f0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 12 (swi4: clock (0))

version.txt

FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019
    root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense

Backtrace

db:0:kdb.enter.default>  show pcpu
cpuid        = 0
dynamic pcpu = 0x898380
curthread    = 0xfffff80008438620: pid 12 "swi4: clock (0)" 
curpcb       = 0xfffffe04549b4b80
fpcurthread  = none
idlethread   = 0xfffff800083d5000: tid 100003 "idle: cpu0" 
curpmap      = 0xffffffff82b85998
tssp         = 0xffffffff82bb6810
commontssp   = 0xffffffff82bb6810
rsp0         = 0xfffffe04549b4b80
gs32p        = 0xffffffff82bbd068
ldt          = 0xffffffff82bbd0a8
tss          = 0xffffffff82bbd098
db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100026 td 0xfffff80008438620
carp_master_down_locked() at carp_master_down_locked+0xf0/frame 0xfffffe04549b48f0
carp_master_down() at carp_master_down+0x21/frame 0xfffffe04549b4910
softclock_call_cc() at softclock_call_cc+0x13a/frame 0xfffffe04549b49c0
softclock() at softclock+0x79/frame 0xfffffe04549b49e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe04549b4a20
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe04549b4a70
fork_exit() at fork_exit+0x83/frame 0xfffffe04549b4ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04549b4ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
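
To make the CARP activity in the msgbuf excerpt easier to scan, here is a small Python sketch that extracts the state transitions and tallies them. The regular expression is based only on the "carp: VHID@interface: OLD -> NEW (reason)" lines quoted above, and the function name is illustrative. It only summarises the log and says nothing about the cause of the trap.

```python
import re
from collections import Counter

# Matches lines such as:
#   <6>carp: 19@lagg0.205: BACKUP -> MASTER (master timed out)
CARP_RE = re.compile(r"carp: (\d+)@(\S+): (\w+) -> (\w+) \(([^)]*)\)")


def carp_transitions(msgbuf_text):
    """Return (vhid, interface, old_state, new_state, reason) in log order."""
    events = []
    for line in msgbuf_text.splitlines():
        m = CARP_RE.search(line)
        if m:
            vhid, ifname, old, new, reason = m.groups()
            events.append((int(vhid), ifname, old, new, reason))
    return events


# Shortened sample taken from the msgbuf.txt excerpt in this comment.
sample = """\
<6>carp: 9@lagg0.101: BACKUP -> INIT (hardware interface down)
<6>carp: demoted by 240 to 240 (interface down)
ixl0: aq_add_macvlan err -53, aq_error 14
<6>carp: 9@lagg0.101: INIT -> BACKUP (initialization complete)
<6>carp: 19@lagg0.205: BACKUP -> MASTER (master timed out)
<6>carp: 10@lagg0.102: BACKUP -> MASTER (master timed out)
"""

events = carp_transitions(sample)
summary = Counter((old, new) for _, _, old, new, _ in events)
for (old, new), n in summary.items():
    print(f"{old} -> {new}: {n}")
```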

Actions #8

Updated by Steve Wheeler about 4 years ago

  • Status changed from New to Feedback

This particular problem, where adding VLANs generates an error, appears to be solved in the ixl 1.11.9 driver in pfSense 2.4.5.

The 'queue <num> appears to be hung!' issue appears unrelated and should be a separate bug if it persists.
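
When checking a system against the driver version named above, compare versions numerically rather than as strings, since "1.11.9" sorts after "1.11.20" in a plain string comparison. A minimal Python sketch follows; the helper names are hypothetical, and the 1.11.9 threshold is taken from this note and has not been independently verified. A version below the threshold does not imply the system is affected: FreeBSD 11.1 with 1.7.12-k was reported above as error-free.

```python
def version_tuple(version):
    """'1.9.9-k' -> (1, 9, 9); anything after the first '-' is dropped."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


# Threshold taken from the note above (assumption, not verified here).
FIXED_IN = version_tuple("1.11.9")


def at_least_fixed(version):
    """True if the driver version is 1.11.9 or newer."""
    return version_tuple(version) >= FIXED_IN


# Driver versions mentioned elsewhere in this thread.
for v in ("1.7.12-k", "1.9.9-k", "1.11.9", "1.11.20"):
    print(v, at_least_fixed(v))
```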

Actions #9

Updated by Steve Wheeler about 4 years ago

  • % Done changed from 0 to 100
Actions #10

Updated by → luckman212 over 2 years ago

I was just looking at Open issues marked "very high" and this still comes up -- should it be closed?

Actions #11

Updated by Jim Pingle over 2 years ago

→ luckman212 wrote in #note-10:

I was just looking at Open issues marked "very high" and this still comes up -- should it be closed?

It needs to be re-tested by someone with the affected hardware and problem config.
