Bug #9123
Adding/configuring vlan on ixl-devices causes aq_add_macvlan err -53, aq_error 14
Added by Sebastian Deuerling about 6 years ago. Updated over 2 years ago.
100%
Description
The actual vlan addition/configuring process is triggering error "aq_add_macvlan err -53, aq_error 14" on ixl-devices.
Configuring vlans seems to work nevertheless, but saving interface configurations with vlans takes a lot of time.
In our setup (two igb-interfaces, two ix-interfaces, two ixl-interfaces; 25 vlans on a failover-lagg of ixl0 and igb0) saving changes to the interface configuration takes about 20 to 30 minutes. After that pfSense seems to freeze. After a reboot all vlans are working.
But booting also takes a lot of time: around 5 minutes in the step "Configuring VLANS...".
Our hardware: SYS-5018D-FN4T (Supermicro Intel Xeon D-1541 system) and X710DA2BLK (Intel X710-DA2 Dual-SFP+-PCIe-Addon-cards).
Further information here: https://forum.netgate.com/topic/136201/new-version-2-4-4-interface-error-aq_add_macvlan-err-53-aq_error-14/14
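For anyone trying to reproduce the setup described above on plain FreeBSD, a minimal dry-run sketch follows. It only prints the commands and does not execute them. The function name gen_lagg_vlan_cmds and the VLAN IDs 100 to 124 are assumptions for illustration; the report does not list the actual tags, and pfSense itself configures VLANs through its own scripts rather than a loop like this.

```shell
#!/bin/sh
# Dry run: print (do not execute) ifconfig commands for a failover lagg
# of two ports carrying a range of VLANs.
# gen_lagg_vlan_cmds is a hypothetical helper; VLAN IDs are assumptions.
gen_lagg_vlan_cmds() {
    lagg=$1     # lagg interface name, e.g. lagg0
    port_a=$2   # first member port, e.g. ixl0
    port_b=$3   # second member port, e.g. igb0
    first=$4    # first VLAN ID (assumed, not from the report)
    count=$5    # number of VLANs, 25 in the report

    echo "ifconfig ${lagg} create"
    echo "ifconfig ${lagg} laggproto failover laggport ${port_a} laggport ${port_b} up"

    i=0
    while [ "$i" -lt "$count" ]; do
        tag=$((first + i))
        # Same form as the vlan commands used later in this thread.
        echo "ifconfig ${lagg}.${tag} create vlan ${tag} vlandev ${lagg}"
        i=$((i + 1))
    done
}

gen_lagg_vlan_cmds lagg0 ixl0 igb0 100 25
```

Review the printed commands before piping them to sh on a test machine; watching the console while they run should show whether each VLAN creation logs the aq_add_macvlan message.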
Updated by Steve Wheeler almost 6 years ago
- Category set to Interfaces
- Affected Version set to 2.4.x
- Affected Architecture amd64 added
- Affected Architecture deleted ()
Updated by Alex Rosenberg almost 6 years ago
Not sure about the similarity of conditions yet, but I'm seeing this message being logged on my FreeNAS box with the ixl driver on top of the SuperMicro X11DPH-Tq motherboard. I'm beginning to suspect a problem either with the ixl driver or the NVM fw image.
Updated by Alexander Meckelein almost 6 years ago
Same problem here.
Hardware: Dell PowerEdge 330 with Intel(R) 10GbE 2P X710 Adapter
Done so far:
- installation of PFSense 2.4.4 with "pfSense-CE-memstick-2.4.4-RELEASE-p1-amd64" Image, used option ZFS mirror on 2 SSDs
- on first boot of installed PFSense I configured the "ixl0" interface with VLAN Tag, immediately got the error "aq_add_macvlan err -53, aq_error 14" in the terminal.
- reset installation to factory default, reboot
- tested same on the other interface "ixl1", got the same error
- both interfaces were not able to get IP by DHCP
I have not yet tested the PFSense 2.4.3 image on this system; this is planned for Friday of this week.
Updated by Alexander Meckelein almost 6 years ago
I have tested FreeBSD versions 11.1, 11.2 and 12.0 on the hardware and got the following results.
FreeBSD 11.1 (FreeBSD-11.1-RELEASE-amd64-memstick) (Driver 1.7.12-k)
ifconfig ixl0.1 create vlan 1 vlandev ixl0 inet 192.168.23.223/24
ifconfig ixl0 up
-> Ping OK
FreeBSD 11.2 (FreeBSD-11.2-RELEASE-amd64-memstick) (Driver 1.9.9-k)
ifconfig ixl0.1 create vlan 1 vlandev ixl0 inet 192.168.23.223/24
terminal message: ixl0: aq_add_macvlan err -53, aq_error 14
-> Ping OK (448 Packets, 0% lost)
ifconfig ixl0 up
terminal message: ixl0: aq_add_macvlan err -53, aq_error 14
FreeBSD 12 (FreeBSD-12.0-RELEASE-amd64-memstick)
ifconfig ixl0.1 create vlan 1 vlandev ixl0 inet 192.168.23.223/24
ifconfig ixl0 up
-> Ping OK
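To compare releases like this more systematically, one option is to count how often the message shows up in the kernel log after each test. The sketch below is only an illustration; count_aq_errors is a hypothetical helper that reads log text on stdin and counts matching lines.

```shell
#!/bin/sh
# Count lines containing the aq_add_macvlan error in text read from stdin.
# count_aq_errors is a hypothetical helper name, not part of any tool.
count_aq_errors() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'aq_add_macvlan err' || true
}
```

A typical use would be `dmesg | count_aq_errors` before and after running the ifconfig commands above, then comparing the two numbers.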
Updated by Eric Machabert almost 5 years ago
Hi,
As I explained in the forum, this is my currently working solution while running 2.4.4p3:
- Using lagg in failover mode and not LACP (per Intel documentation, using LACP requires disabling the firmware-based LLDP agent; we did not test it)
- Using the 1.11.20 driver (September 2019) from the Intel website
We use lots of vlans and have not seen any issue after 15 days of uptime. We tested CARP failover and interface assignment on newly declared vlans and had no "Queue appears to be hung" issue.
information regarding the NICs (HPE NC562SFP+):
ixl0: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.11.20> mem 0xe4000000-0xe4ffffff,0xe5008000-0xe500ffff at device 0.0 numa-domain 0 on pci4
ixl0: using 1024 tx descriptors and 1024 rx descriptors
ixl0: fw 6.71.49427 api 1.7 nvm 6.80 etid 80004004 oem 1.263.0
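When collecting reports from several machines, it can help to pull the driver version out of boot messages like the ones above. This is a rough sketch that assumes the probe line has the "PF Driver, Version - x.y.z" form shown here; other driver releases may print it differently. ixl_driver_version is a hypothetical helper name.

```shell
#!/bin/sh
# Print the first ixl driver version found in boot messages read from stdin.
# Assumes the "PF Driver, Version - x.y.z" wording seen in this thread.
ixl_driver_version() {
    sed -n 's/.*PF Driver, Version - \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

For example, `dmesg | ixl_driver_version` prints nothing if no line in that form is present.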
Updated by Luiz Souza over 4 years ago
- Assignee set to Luiz Souza
- Priority changed from Normal to Very High
Updated by Marc L over 4 years ago
- NIC: Intel X710-DA4 (Quad Port 10Gb)
- pfSense version 2.4.4-p3
- One LAGG group (lagg0) Protocol set to LACP
- 31 VLANs (all lagg0.x)
- Adding more VLANs on-the-fly works for us, though we do it rarely. We used to have the LAGG Protocol set to "NONE" which would consistently produce crashes when adding a new VLAN
- Applying Interface settings is pretty slow but works
- However I did encounter a crash when changing the description of a VLAN (on the "VLANs" tab)
The logs collected from the crash make it look like changing the description and saving caused all VLANs to be taken down and reconfigured. That shouldn't be necessary in the first place, because the description is just webinterface cosmetics, but that's a different topic.
I followed this procedure twice (edit VLAN -> set new description -> hit save). Was physically present and watched the console. The first time around, the webinterface hung for a while, and the mentioned error was logged to the console a few times, but it succeeded eventually. The second time the webinterface hung again, saw a few of the error messages on the console and ultimately the system crashed and rebooted. Below is the relevant tail end of the collected logs from the crash report.
msgbuf.txt:
ixl0: aq_add_macvlan err -53, aq_error 14
<6>carp: 9@lagg0.101: BACKUP -> INIT (hardware interface down)
<6>carp: demoted by 240 to 240 (interface down)
<7>ifa_maintain_loopback_route: deletion failed for interface lagg0.101: 3
<7>ifa_maintain_loopback_route: deletion failed for interface lagg0.101: 3
<7>ifa_maintain_loopback_route: deletion failed for interface lagg0.101: 3
<6>carp: demoted by -240 to 0 (vhid removed)
<6>lagg0.101: promiscuous mode disabled
<6>vlan5: changing name to 'lagg0.101'
ixl0: aq_add_macvlan err -53, aq_error 14
ixl0: aq_add_macvlan err -53, aq_error 14
ixl0: aq_add_macvlan err -53, aq_error 14
ixl0: aq_add_macvlan err -53, aq_error 14
<6>lagg0.101: promiscuous mode enabled
<6>carp: 9@lagg0.101: INIT -> BACKUP (initialization complete)
<6>carp: 19@lagg0.205: BACKUP -> MASTER (master timed out)
<6>carp: 10@lagg0.102: BACKUP -> MASTER (master timed out)
<6>carp: 13@lagg0.112: BACKUP -> MASTER (master timed out)
<6>carp: 34@lagg0.82: BACKUP -> MASTER (master timed out)
<6>carp: 8@lagg0.100: BACKUP -> MASTER (master timed out)
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80e38d40
stack pointer = 0x28:0xfffffe04549b48b0
frame pointer = 0x28:0xfffffe04549b48f0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
version.txt
FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019 root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense
Backtrace
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x898380
curthread = 0xfffff80008438620: pid 12 "swi4: clock (0)"
curpcb = 0xfffffe04549b4b80
fpcurthread = none
idlethread = 0xfffff800083d5000: tid 100003 "idle: cpu0"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb6810
commontssp = 0xffffffff82bb6810
rsp0 = 0xfffffe04549b4b80
gs32p = 0xffffffff82bbd068
ldt = 0xffffffff82bbd0a8
tss = 0xffffffff82bbd098
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100026 td 0xfffff80008438620
carp_master_down_locked() at carp_master_down_locked+0xf0/frame 0xfffffe04549b48f0
carp_master_down() at carp_master_down+0x21/frame 0xfffffe04549b4910
softclock_call_cc() at softclock_call_cc+0x13a/frame 0xfffffe04549b49c0
softclock() at softclock+0x79/frame 0xfffffe04549b49e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe04549b4a20
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe04549b4a70
fork_exit() at fork_exit+0x83/frame 0xfffffe04549b4ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04549b4ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Updated by Steve Wheeler almost 4 years ago
- Status changed from New to Feedback
This particular problem, where adding VLANs generates an error, appears to be solved in the ixl 1.11.9 driver in pfSense 2.4.5.
The 'queue <num> appears to be hung!' issue appears unrelated and should be a separate bug if it persists.
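For anyone re-testing, a quick way to check whether a box already runs a driver at or above the version mentioned here is a dotted-version comparison. The sketch below is an assumption-laden illustration: version_ge is a hypothetical helper, and the 1.11.9 threshold is simply taken from the comment above, not independently verified.

```shell
#!/bin/sh
# Return success if dotted version $1 is greater than or equal to $2.
# version_ge is a hypothetical helper. Non-numeric suffixes such as "-k"
# are ignored because awk converts "9-k" to the number 9.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            p = x[i] + 0      # missing components count as 0
            q = y[i] + 0
            if (p > q) exit 0
            if (p < q) exit 1
        }
        exit 0                # all components equal
    }'
}

# Example: threshold taken from the comment above (an assumption).
if version_ge "1.11.20" "1.11.9"; then
    echo "driver is at or above 1.11.9"
fi
```

Comparing component by component avoids the trap of a plain string comparison, which would rank 1.11.20 below 1.11.9.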
Updated by → luckman212 over 2 years ago
I was just looking at Open issues marked "very high" and this still comes up -- should it be closed?
Updated by Jim Pingle over 2 years ago
→ luckman212 wrote in #note-10:
I was just looking at Open issues marked "very high" and this still comes up -- should it be closed?
It needs re-tested by someone with the affected hardware + problem config.