Regression #13381

closed

Software VLAN tagging does not work on ``ixgbe(4)`` interfaces

Added by Steve Wheeler over 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Hardware / Drivers
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
23.01
Release Notes:
Default
Affected Version:
2.7.0
Affected Architecture:
amd64

Description

VLAN tagged traffic fails on an ix NIC if hardware VLAN tagging is disabled.
For example:

[22.05-RELEASE][admin@4100.stevew.lan]/root: ping 10.101.0.12
PING 10.101.0.12 (10.101.0.12): 56 data bytes
64 bytes from 10.101.0.12: icmp_seq=0 ttl=64 time=0.435 ms
64 bytes from 10.101.0.12: icmp_seq=1 ttl=64 time=0.351 ms
64 bytes from 10.101.0.12: icmp_seq=2 ttl=64 time=0.359 ms
64 bytes from 10.101.0.12: icmp_seq=3 ttl=64 time=0.378 ms
^C
--- 10.101.0.12 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.351/0.381/0.435/0.033 ms
[22.05-RELEASE][admin@4100.stevew.lan]/root: ifconfig ix3
ix3: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
    description: WAN
    options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
    ether 90:ec:77:1f:8a:5f
    inet6 fe80::92ec:77ff:fe1f:8a5f%ix3 prefixlen 64 scopeid 0x8
    inet 172.21.16.232 netmask 0xffffff00 broadcast 172.21.16.255
    inet 45.65.87.21 netmask 0xffffffc0 broadcast 45.65.87.63 vhid 1
    carp: MASTER vhid 1 advbase 1 advskew 0
    media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
[22.05-RELEASE][admin@4100.stevew.lan]/root: ifconfig ix3 -vlanhwtag
[22.05-RELEASE][admin@4100.stevew.lan]/root: ifconfig ix3
ix3: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
    description: WAN
    options=8138a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
    ether 90:ec:77:1f:8a:5f
    inet6 fe80::92ec:77ff:fe1f:8a5f%ix3 prefixlen 64 scopeid 0x8
    inet 172.21.16.232 netmask 0xffffff00 broadcast 172.21.16.255
    inet 45.65.87.21 netmask 0xffffffc0 broadcast 45.65.87.63 vhid 1
    carp: BACKUP vhid 1 advbase 1 advskew 0
    media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
[22.05-RELEASE][admin@4100.stevew.lan]/root: ping 10.101.0.12
PING 10.101.0.12 (10.101.0.12): 56 data bytes
^C
--- 10.101.0.12 ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss

VLAN hardware tagging is enabled by default so this is not easy to hit.
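To confirm that `-vlanhwtag` changed only the one capability, the two `options=` lines from the output above can be compared. This is a minimal sketch: the helper name `parse_options` is hypothetical, and it reads only the text that ifconfig printed rather than querying the interface.

```python
import re

def parse_options(line):
    """Return (numeric value, set of capability names) from an ifconfig options line."""
    match = re.search(r"options=([0-9a-f]+)<([^>]*)>", line)
    if match is None:
        raise ValueError("no options field found")
    return int(match.group(1), 16), set(match.group(2).split(","))

# The two options lines copied from the ifconfig output in this report.
before = "options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>"
after = "options=8138a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>"

value_before, names_before = parse_options(before)
value_after, names_after = parse_options(after)

changed_bits = value_before ^ value_after   # bits that differ between the two states
removed_names = names_before - names_after  # capabilities that disappeared
added_names = names_after - names_before    # capabilities that appeared

print(hex(changed_bits), sorted(removed_names), sorted(added_names))
# prints: 0x10 ['VLAN_HWTAGGING'] []
```

Only a single bit differs, and the only name that disappears is VLAN_HWTAGGING, so nothing else on the parent interface was altered between the working and failing pings.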

It produces some unexpected behaviour. In a packet capture on the parent interface there is no outbound VLAN traffic shown at all.
Inbound VLAN traffic appears double tagged, with VLAN0 as the outer tag:

20:30:18.349238 00:51:82:11:22:02 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 68: vlan 0, p 0, ethertype 802.1Q, vlan 1001, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.101.0.1 tell 10.101.0.12, length 46

VLAN0 is expected to be dropped.
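For reference, the tag stack in the capture above can be illustrated at the byte level. The sketch below rebuilds only the header of that frame (MAC addresses, two 802.1Q tags and the ARP EtherType) by hand, so the bytes are a reconstruction for illustration and not a capture; the ARP payload is omitted and the function name `parse_vlan_tags` is hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag

def parse_vlan_tags(frame):
    """Walk the 802.1Q tags that follow the MAC addresses.

    Returns ([(vlan_id, priority), ...] outermost first, final EtherType).
    """
    offset = 12  # skip destination and source MAC addresses (6 bytes each)
    tags = []
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype == TPID_8021Q:
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        # Low 12 bits of the TCI are the VLAN ID, top 3 bits are the priority.
        tags.append((tci & 0x0FFF, tci >> 13))
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return tags, ethertype

# Hand-built header matching the tcpdump line above (payload omitted).
frame = (
    bytes.fromhex("ffffffffffff")             # destination MAC (broadcast)
    + bytes.fromhex("005182112202")           # source MAC
    + struct.pack("!HH", TPID_8021Q, 0)       # outer tag: VLAN 0, priority 0
    + struct.pack("!HH", TPID_8021Q, 1001)    # inner tag: VLAN 1001, priority 0
    + struct.pack("!H", 0x0806)               # EtherType ARP
)

tags, ethertype = parse_vlan_tags(frame)
print(tags, hex(ethertype))
# prints: [(0, 0), (1001, 0)] 0x806
```

The real VLAN ID (1001) sits behind an extra outer tag with VLAN ID 0, which is the double tagging described above.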

This behaviour appears to have been introduced by this commit:
https://github.com/pfsense/FreeBSD-src/commit/9c762cc125c0c2dae9fbf49cc526bb97c14b54a4
All snapshots after 20220314-1916 exhibit it.

Tested on a 4100 on 22.05 and in 2.7. The user who hit this initially is also using a C3K SoC device with the same on-board NICs.

See: https://forum.netgate.com/topic/173149/pfsense-22-05-breaks-vlans-restoring-pfsense-22-01-fixes-the-issue

Actions #1

Updated by Steve Wheeler over 2 years ago

It looks like this issue still happens in FreeBSD Head. Unlike in pfSense (FreeBSD 12), though, outbound traffic is visible in packet captures. Replies still come back unexpectedly double tagged:

17:29:30.370205 90:ec:77:1f:8a:5f > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1001, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.101.0.12 tell 10.101.0.1, length 28
17:29:30.370787 00:51:82:11:22:02 > 90:ec:77:1f:8a:5f, ethertype 802.1Q (0x8100), length 64: vlan 0, p 0, ethertype 802.1Q, vlan 1001, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.101.0.12 is-at 00:51:82:11:22:02, length 42

Hence packets are still dropped and the connection fails.
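When checking a longer capture for this symptom, the affected frames can be picked out from the tcpdump text itself. The following is a rough sketch that assumes the output format shown in the two lines above; the function names are hypothetical and the regular expression relies on tcpdump printing each tag as `vlan <id>,`.

```python
import re

def vlan_stack(tcpdump_line):
    """Return the VLAN IDs printed on a tcpdump line, outermost first."""
    return [int(vid) for vid in re.findall(r"\bvlan (\d+),", tcpdump_line)]

def has_vlan0_outer_tag(tcpdump_line):
    """True when the frame carries more than one tag and the outer one is VLAN 0."""
    stack = vlan_stack(tcpdump_line)
    return len(stack) > 1 and stack[0] == 0

# The two lines from the capture above: the outbound request and the reply.
request = (
    "17:29:30.370205 90:ec:77:1f:8a:5f > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), "
    "length 46: vlan 1001, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), "
    "Request who-has 10.101.0.12 tell 10.101.0.1, length 28"
)
reply = (
    "17:29:30.370787 00:51:82:11:22:02 > 90:ec:77:1f:8a:5f, ethertype 802.1Q (0x8100), "
    "length 64: vlan 0, p 0, ethertype 802.1Q, vlan 1001, p 0, ethertype ARP, "
    "Ethernet (len 6), IPv4 (len 4), Reply 10.101.0.12 is-at 00:51:82:11:22:02, length 42"
)

for line in (request, reply):
    print(vlan_stack(line), has_vlan0_outer_tag(line))
# prints:
# [1001] False
# [0, 1001] True
```

The outbound request carries the single expected tag, while the reply is flagged because VLAN 0 appears as an extra outer tag.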

Actions #2

Updated by Steve Wheeler over 2 years ago

Tested: FreeBSD-14.0-CURRENT-amd64-20220729-467d3e2e8aa-257025-memstick.img

Actions #3

Updated by Kristof Provost over 2 years ago

I've been able to reproduce this (on pfsense/main).

That required the following:

ifconfig vlan create vlandev ix3 vlan 42
ifconfig vlan0 192.168.42.1/24 up
ifconfig ix3 -vlanhwtag

The traffic sent out through the interface is fine, but received traffic is incorrectly double-tagged: the outer tag is VLAN 0 and the inner tag is VLAN 42.

Reverting the listed patch (9c762cc125c0c2dae9fbf49cc526bb97c14b54a4) fixes the problem.

Interestingly the problem does not occur if the vlanhwtag feature is disabled before the vlan is created. I believe that to be an important clue.

Actions #4

Updated by Kristof Provost over 2 years ago

I proposed a patch in https://reviews.freebsd.org/D36139
It works for me, but I'd like the Intel people (and driver maintainers) to take a look before I commit it.

Actions #5

Updated by Steve Wheeler about 2 years ago

  • Status changed from New to Waiting on Merge

Actions #6

Updated by Jim Pingle about 2 years ago

  • Plus Target Version changed from 22.11 to 23.01

Actions #7

Updated by Steve Wheeler about 2 years ago

  • Status changed from Waiting on Merge to Resolved

This fix is now merged into 23.01 and works in current snapshots:

[23.01-DEVELOPMENT][admin@4100.stevew.lan]/root: ping 10.101.0.10
PING 10.101.0.10 (10.101.0.10): 56 data bytes
64 bytes from 10.101.0.10: icmp_seq=0 ttl=64 time=0.412 ms
64 bytes from 10.101.0.10: icmp_seq=1 ttl=64 time=0.271 ms
^C
--- 10.101.0.10 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.271/0.341/0.412/0.070 ms
[23.01-DEVELOPMENT][admin@4100.stevew.lan]/root: ifconfig ix3.1001
ix3.1001: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: VLAN1001
    options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
    ether 90:ec:77:1f:8a:5f
    inet6 fe80::92ec:77ff:fe1f:8a5f%ix3.1001 prefixlen 64 scopeid 0xe
    inet 10.101.0.1 netmask 0xffffff00 broadcast 10.101.0.255
    groups: vlan
    vlan: 1001 vlanproto: 802.1q vlanpcp: 0 parent interface: ix3
    media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[23.01-DEVELOPMENT][admin@4100.stevew.lan]/root: ifconfig ix3
ix3: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
    ether 90:ec:77:1f:8a:5f
    inet6 fe80::92ec:77ff:fe1f:8a5f%ix3 prefixlen 64 scopeid 0x8
    inet 172.21.16.232 netmask 0xffffff00 broadcast 172.21.16.255
    media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
[23.01-DEVELOPMENT][admin@4100.stevew.lan]/root: ifconfig ix3 -vlanhwtag
[23.01-DEVELOPMENT][admin@4100.stevew.lan]/root: ping 10.101.0.10
PING 10.101.0.10 (10.101.0.10): 56 data bytes
64 bytes from 10.101.0.10: icmp_seq=0 ttl=64 time=0.421 ms
64 bytes from 10.101.0.10: icmp_seq=1 ttl=64 time=0.336 ms
64 bytes from 10.101.0.10: icmp_seq=2 ttl=64 time=0.344 ms
^C
--- 10.101.0.10 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.336/0.367/0.421/0.038 ms

Actions #8

Updated by Jim Pingle almost 2 years ago

  • Subject changed from Software vlan tagging is broken in ixgbe to Software VLAN tagging does not work on ``ixgbe(4)`` interfaces

Updating subject for release notes.

Actions #9

Updated by Nicolas Embriz over 1 year ago

Hi, I am having a similar problem with ixl interfaces. In my case, I can't reach any device in the VLAN, only the router (pfSense):

ixl1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: LAN
        options=a100b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6>
        ether 64:9d:99:b1:80:03
        inet6 fe80::669d:99ff:feb1:8003%ixl1 prefixlen 64 scopeid 0x2
        inet6 2a0e:97c0:620:affe::1 prefixlen 64
        inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)

ixl1.20: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: bhyve
        options=200001<RXCSUM,RXCSUM_IPV6>
        ether 64:9d:99:b1:80:03
        inet6 fe80::669d:99ff:feb1:8003%ixl1.20 prefixlen 64 scopeid 0x11
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
        groups: vlan
        vlan: 20 vlanpcp: 0 parent interface: ixl1
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
