Regression #12821
closed
Intel e1000 driver (``em``, ``igb``) cannot pass packets tagged with VLAN ``0``
Description
Hello!
There are a few of us who have noticed a possible issue with the igb driver in the latest pfSense releases. I am not technical enough to know exactly what the issue is, but I have found what I believe is the resolution. This issue became apparent when the latest release broke a start-up script many of us use to authenticate / connect directly to some FTTH ONTs.
For reference here is the GitHub issue for this particular script that contains a lot of useful discussion. https://github.com/MonkWho/pfatt/issues/67
It appears that in this release the if_igb.ko driver is simply a shortcut to if_em.ko. This leads me to believe this is the current driver in use? https://www.intel.com/content/www/us/en/download/15187/intel-network-adapter-gigabit-base-driver-for-freebsd.html. I believe there is an issue with this driver preventing something from working properly. I wish I was able to describe the "something" but I can't. Maybe it is promiscuous mode issues, issues interacting with Netgraph, VLAN 0, I don't know.
Regardless, compiling and using this (updated?) igb driver fixes this issue. https://www.intel.com/content/www/us/en/download/14610/intel-network-adapter-driver-for-82575-6-and-82580-based-gigabit-network-connections-under-freebsd.html?wapkw=i350%20freebsd
Is it possible we could get this driver included in a future release?
Updated by Hayden Hill almost 3 years ago
Also, some related discussion towards the end of this post https://forum.netgate.com/topic/99190/att-uverse-rg-bypass-0-2-btc/402?_=1644931323812
Updated by Hayden Hill almost 3 years ago
User @lnxsrt over on GitHub may have found the related FreeBSD Bug. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260068
GitHub comment: https://github.com/MonkWho/pfatt/issues/67#issuecomment-1043433763
Updated by Steve Wheeler almost 3 years ago
- Tracker changed from Bug to Regression
It looks likely that bug would cause this since it requires VLAN 0. That's fixed here but isn't yet in the dev branch:
https://github.com/pfsense/FreeBSD-src/commit/cf101bd5ceebe2b2d229faa949dbf3e146d04382
It works with the updated igb module because that is not the merged e1000 iflib driver. It would be interesting to test Intel's updated merged driver also.
Updated by Hayden Hill almost 3 years ago
Steve Wheeler wrote in #note-3:
It looks likely that bug would cause this since it requires VLAN 0. That's fixed here but isn't yet in the dev branch:
https://github.com/pfsense/FreeBSD-src/commit/cf101bd5ceebe2b2d229faa949dbf3e146d04382
It works with the updated igb module because that is not the merged e1000 iflib driver. It would be interesting to test Intel's updated merged driver also.
Thanks for the additional information! I am assuming this is the merged driver you are referring to? https://www.intel.com/content/www/us/en/download/15187/intel-network-adapter-gigabit-base-driver-for-freebsd.html
If so, I would be willing to compile it and give it a shot if it would be of use?
Updated by Kris Phillips almost 3 years ago
I can confirm the iflib driver issue as well. I may spin up a FreeBSD 12.3 install to compile the newer driver as well for an additional point of testing.
Updated by Kris Phillips almost 3 years ago
I have compiled the igb driver for 12.3 to test this weekend.
Additionally, patches for the VLAN issue should be in 2.7 and 22.05 DEVEL releases, by my understanding, so we should test the latest dev snapshots.
Updated by Kris Phillips over 2 years ago
Tested the igb driver. Issue is no longer present in 22.01 or 2.6 with the custom driver compiled from kernel source here:
Added the driver to /boot/modules and added the following to /boot/loader.conf.local before completing the upgrade from 21.05.2 with a working environment to 22.01:
if_igb_load="YES"
if_igb_name="/boot/modules/if_igb.ko"
Attached is the compiled driver.
Next I need to test 2.7 and 22.05 DEVEL, as the standard em driver there supposedly includes patches from upstream in FreeBSD 12.3 that resolve this.
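The loader.conf.local entries above could be added idempotently with something like the following. This is only a sketch: the `ensure_loader_line` helper and the `LOADER_CONF` variable are hypothetical names introduced here, and the default path is a scratch file in the current directory rather than /boot/loader.conf.local so that trying it out does not touch a live system.

```sh
#!/bin/sh
# Append LINE to FILE only if an identical line is not already present.
# Usage: ensure_loader_line FILE LINE   (hypothetical helper, not part of pfSense)
ensure_loader_line() {
    file=$1
    line=$2
    # Create the file if it is missing so grep has something to read.
    [ -e "$file" ] || : > "$file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# LOADER_CONF is an assumed override; point it at /boot/loader.conf.local on a real box.
LOADER_CONF=${LOADER_CONF:-./loader.conf.local}
ensure_loader_line "$LOADER_CONF" 'if_igb_load="YES"'
ensure_loader_line "$LOADER_CONF" 'if_igb_name="/boot/modules/if_igb.ko"'
```

Running it a second time should leave the file unchanged, which matters if the snippet ends up in a script that runs on every boot.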
Updated by Kris Phillips over 2 years ago
Tested pfatt on 22.05 April 29th build and getting the following crash report:
Fatal error: Uncaught Error: Call to undefined function pfSense_ngctl_attach() in Command line code:1
Stack trace:
#0 {main}
thrown in Command line code on line 1
Updated by Kris Phillips over 2 years ago
Command I was trying to run manually after I noticed it failing:
/usr/local/bin/php -r "pfSense_ngctl_attach('.', 'igb0');"
Updated by Hayden Hill over 2 years ago
Kris Phillips wrote in #note-9:
Command I was trying to run manually after I noticed it failing:
/usr/local/bin/php -r "pfSense_ngctl_attach('.', 'igb0');"
Okay. I gave it a shot tonight. 22.05. I did not see the same issue you ran into. However, I still had to use the custom driver. Unfortunately a new issue cropped up. IPv6 gateway never leaves "pending" state, but IPv6 traffic does flow.
Updated by Kris Phillips over 2 years ago
Hayden Hill wrote in #note-10:
Kris Phillips wrote in #note-9:
Command I was trying to run manually after I noticed it failing:
/usr/local/bin/php -r "pfSense_ngctl_attach('.', 'igb0');"
Okay. I gave it a shot tonight. 22.05. I did not see the same issue you ran into. However, I still had to use the custom driver. Unfortunately a new issue cropped up. IPv6 gateway never leaves "pending" state, but IPv6 traffic does flow.
Hello Hayden,
Likely the build you're running is older and doesn't have the changes that break this. I'm running an internal-only build from April 29th that is not public.
I have discussed with the dev team and apparently the pfSense_ngctl_attach() and pfSense_ngctl_detach() php modules were removed as they are no longer needed. All interfaces are already there in netgraph, so the script will likely need to remove these commands before 22.05 and 2.7 are released.
Updated by Hayden Hill over 2 years ago
Kris Phillips wrote in #note-11:
Hayden Hill wrote in #note-10:
Kris Phillips wrote in #note-9:
Command I was trying to run manually after I noticed it failing:
/usr/local/bin/php -r "pfSense_ngctl_attach('.', 'igb0');"
Okay. I gave it a shot tonight. 22.05. I did not see the same issue you ran into. However, I still had to use the custom driver. Unfortunately a new issue cropped up. IPv6 gateway never leaves "pending" state, but IPv6 traffic does flow.
Hello Hayden,
Likely the build you're running is older and doesn't have the changes that break this. I'm running an internal-only build from April 29th that is not public.
I have discussed with the dev team and apparently the pfSense_ngctl_attach() and pfSense_ngctl_detach() php modules were removed as they are no longer needed. All interfaces are already there in netgraph, so the script will likely need to remove these commands before 22.05 and 2.7 are released.
Ah! Couple of things. I use the supplicant mode so those lines are not in / applicable to my use case anyways. Did you have any issues with the IPv6 gateway monitor like I did? Also, I would love to help troubleshoot this, so if whoever you are working with is willing to share with the class, I would be happy to try out some builds.
Updated by Kris Phillips over 2 years ago
Fix to the script here resolves the ngeth interface issue since they are already part of netgraph:
https://github.com/MonkWho/pfatt/pull/73/commits/015a8cf91340699fc18f1f54eeef3160e8c2ee9e
About to test and ensure DHCP is working with in-built Intel driver to see if a custom driver is still needed in CE 2.7 and 22.05.
Updated by Kris Phillips over 2 years ago
Unfortunately, it seems that with the May 6th build of 22.05 netgraph is still broken for VLAN0 tagged DHCP traffic. The 802.1X traffic is passing, but for some reason DHCP still is not. Likely a custom driver is still needed here until a fix can be found.
Oddly, when using a netgraph ngeth# interface, I also get "Configuring WAN" and "Configuring LAN" twice and the interface mismatch shows up every time, even though I'm using an earlyshellcmd script. Here is the boot log:
Starting DHCPv6 service...done.
Setting up gateway monitors...done.
Writing configuration...........................done.
One moment while the settings are reloading... done!
..Configuring loopback interface...done.
Configuring LAN interface...done.
Configuring WAN interface...done.
Checking config backups consistency...done.
Setting up extended sysctls...done.
Setting timezone...done.
Configuring loopback interface...done.
Starting syslog...done.
Setting up interfaces microcode...done.
Removed leftover dhcp6c lock file: /tmp/dhcp6c_lock
Configuring loopback interface...done.
Configuring LAN interface...done.
Configuring WAN interface...
Updated by Kris Phillips over 2 years ago
Tested this on igc interfaces and it appears this only affects e1000-based NICs. Other Intel NICs would seem to be fine. Looks like the transition to iflib for the e1000 igb and em driver likely caused this.
Updated by Kris Phillips over 2 years ago
Tested ix interfaces as well. They are not subject to this bug. Based on the fact that Broadcom NICs and Intel ix/igc NICs are not affected, it seems this only applies to igb e1000 NICs, as suspected.
Updated by Steve Wheeler over 2 years ago
- Subject changed from Intel igb(4) NIC Driver - pfSense 22.01 and 2.6.0 to Intel e1000 driver (em & igb) cannot pass VLAN0 tagged packets
- Status changed from New to Confirmed
- Target version set to CE-Next
- Plus Target Version set to Plus-Next
Updated by Kris Phillips over 2 years ago
FYI using the manually compiled, out-of-band driver still works fine on 22.05-RELEASE (as expected since the FreeBSD version didn't change).
Updated by Hayden Hill over 2 years ago
Hey! Any chance there is an update on this? Would love to stop using the custom driver on the next release.
Updated by Kris Phillips over 2 years ago
Hayden Hill wrote in #note-19:
Hey! Any chance there is an update on this? Would love to stop using the custom driver on the next release.
Hello Hayden,
Nothing at this time, but our development team is aware of it and working on it still.
Updated by Steve Wheeler about 2 years ago
This appears to be specifically the VLAN Hardware Offloading in e1000 NICs which drops VLAN0 tagged packets.
Disabling that allows priority tagged packets to pass.
Tested in 2.7.0.a.20220922.1830
Updated by Hayden Hill about 2 years ago
Steve Wheeler wrote in #note-21:
This appears to be specifically the VLAN Hardware Offloading in e1000 NICs which drops VLAN0 tagged packets.
Disabling that allows priority tagged packets to pass.
Tested in 2.7.0.a.20220922.1830
Apologies for the dumb question but where is the setting / what is the command to accomplish that? I thought this was solved in FreeBSD-13 and beyond? Is your suggested solution a workaround for the current releases?
Updated by Kris Phillips about 2 years ago
Steve Wheeler wrote in #note-21:
This appears to be specifically the VLAN Hardware Offloading in e1000 NICs which drops VLAN0 tagged packets.
Disabling that allows priority tagged packets to pass.
Tested in 2.7.0.a.20220922.1830
Tested this theory. DHCP traffic still doesn't pass across VLAN0 without the custom driver with the -vlanhwfilter directive on all interfaces involved in the pfatt.sh script.
Here is my interfaces showing the capabilities versus active options:
igb0: flags=28963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
options=4e427bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
capabilities=4f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
ether 80:61:5f:0c:85:36
inet6 fe80::8261:5fff:fe0c:8536%igb0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
supported media:
media autoselect
media 1000baseT
media 1000baseT mediaopt full-duplex
media 100baseTX mediaopt full-duplex
media 100baseTX
media 10baseT/UTP mediaopt full-duplex
media 10baseT/UTP
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igb1: flags=28963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
options=4e427bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
capabilities=4f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
ether 80:61:5f:0c:85:37
inet6 fe80::8261:5fff:fe0c:8537%igb1 prefixlen 64 scopeid 0x2
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
supported media:
media autoselect
media 1000baseT
media 1000baseT mediaopt full-duplex
media 100baseTX mediaopt full-duplex
media 100baseTX
media 10baseT/UTP mediaopt full-duplex
media 10baseT/UTP
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
I added the following to my script to disable the vlan hw filter on boot:
echo "Disabling VLAN HW Filter for all interfaces involved"
/sbin/ifconfig $ONT_IF -vlanhwfilter
/sbin/ifconfig $RG_IF -vlanhwfilter
echo "OK!"
After several reboots, I still don't see DHCP responses coming in on the WAN interface or traffic flowing normally. Re-attaching with the same script on my other test box immediately brought the link up with the same settings, minus the -vlanhwfilter directive and interface name differences.
Tested on Sept 19th build of 22.11.
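To compare the options and capabilities lines in output like the above without reading the flag lists by eye, a small filter along these lines may help. It is a sketch only: `flags_of` and `inactive_caps` are hypothetical helper names, and the parsing assumes the `options=...<...>` and `capabilities=...<...>` layout shown in this ticket (the format `ifconfig -m` prints on FreeBSD).

```sh
#!/bin/sh
# Print the flag names inside <...> on the first line that starts with KEY=,
# one per line. KEY is "options" or "capabilities". Reads ifconfig text on stdin.
flags_of() {
    key=$1
    sed -n "s/^[[:space:]]*${key}=[0-9a-fA-F]*<\(.*\)>.*/\1/p" | head -n 1 | tr ',' '\n'
}

# List capabilities the NIC advertises that are not currently enabled.
# Reads the text of `ifconfig -m IFACE` on stdin.
inactive_caps() {
    input=$(cat)
    active=$(printf '%s\n' "$input" | flags_of options)
    printf '%s\n' "$input" | flags_of capabilities | while IFS= read -r cap; do
        # Print the capability only when no active option matches it exactly.
        printf '%s\n' "$active" | grep -qxF -- "$cap" || printf '%s\n' "$cap"
    done
}
```

Usage would be `ifconfig -m igb0 | inactive_caps`. Fed the igb0 output quoted above, it should list WOL_UCAST, WOL_MCAST, VLAN_HWFILTER and NETMAP, which is consistent with the hardware VLAN filter already being switched off on that interface.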
Updated by Steve Wheeler about 2 years ago
As Kris wrote, there's no GUI switch for that. Yet. So you have to disable it at the command line. For example:
ifconfig em0 -vlanhwfilter
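For anyone putting that command into a boot script, a guarded variant might look like the sketch below. It only issues `-vlanhwfilter` when VLAN_HWFILTER is present in the active options line. The function names and the `IFCONFIG` override are hypothetical and introduced here for illustration; the parsing assumes the `options=...<...>` line format shown elsewhere in this ticket.

```sh
#!/bin/sh
# Succeeds (exit 0) when VLAN_HWFILTER appears in the active options line
# of ifconfig output read from stdin. The capabilities line is ignored.
vlan_hwfilter_active() {
    grep -E '^[[:space:]]*options=' | head -n 1 | grep -q 'VLAN_HWFILTER'
}

# Disable the hardware VLAN filter on IFACE only when it is currently on.
# IFCONFIG is an assumed override for dry runs; it defaults to /sbin/ifconfig.
disable_vlan_hwfilter() {
    iface=$1
    ifconfig_bin=${IFCONFIG:-/sbin/ifconfig}
    if "$ifconfig_bin" "$iface" | vlan_hwfilter_active; then
        "$ifconfig_bin" "$iface" -vlanhwfilter
    fi
}
```

Calling `disable_vlan_hwfilter em0` (or `igb0`) from an earlyshellcmd-style script would then be a no-op on interfaces where the filter is already off.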
Updated by Hayden Hill about 2 years ago
Steve Wheeler wrote in #note-24:
As Kris wrote, there's no GUI switch for that. Yet. So you have to disable it at the command line. For example:
[...]
Ah okay! For what it is worth, I am on 22.05 with the custom driver and I have not had to use the vlanhwfilter flag in my script. If I use the stock driver it fails. Maybe there is different behavior between the em0 and igb0 NICs?
Updated by Kris Phillips about 2 years ago
Did some additional testing today. Ran a pcap in promisc mode. The Netgraph interface and physical interface attached to the fiber ONT see SYN messages being sent to my firewall with my IP address from stuff on the internet. However, since my firewall has an IP of 0.0.0.0 no response is made, so something with DHCP is not happy. I also see DHCPDISCOVER messages in the pcap, so there appear to be attempts to send DHCP messages out, but for some reason there is no DHCPOFFER when using the in-built Intel e1000 driver, as if the ATT side of things never gets it. It's almost as if the packet only "lives" in the firewall and never actually gets transmitted on the wire. The DHCPDISCOVERs show on both the physical interface packet capture and the netgraph interface's capture.
Updated by Kris Phillips about 2 years ago
Here is my test script in the event that disabling vlanhwfilter becomes necessary for next release instead.
Updated by Kris Phillips about 2 years ago
With the introduction of native PCP VLAN0 tagging in pfSense Plus 23.01 and the new bridge filtering to pass along EAP traffic from another interface natively integrated into pfSense Plus, this will likely no longer be of a high concern, but we should still figure out why netgraph is doing this on these 1000BASE-T Intel cards.
Updated by Kris Phillips almost 2 years ago
FYI it appears this issue has spread in 23.01 to the igc driver as well. After upgrading to 23.01 on a Netgate 4100, I am seeing the same DHCP traffic not being passed in netgraph issue as was present with igb on previous versions. I will have to test ix interfaces, but this problem might have spread there too.
Updated by Hayden Hill almost 2 years ago
Kris Phillips wrote in #note-29:
FYI it appears this issue has spread in 23.01 to the igc driver as well. After upgrading to 23.01 on a Netgate 4100, I am seeing the same DHCP traffic not being passed in netgraph issue as was present with igb on previous versions. I will have to test ix interfaces, but this problem might have spread there too.
Does the same -vlanhwfilter flag fix it for the igc driver? Or do you have to compile a custom driver?
Updated by Kris Phillips almost 2 years ago
Hayden Hill wrote in #note-30:
Kris Phillips wrote in #note-29:
FYI it appears this issue has spread in 23.01 to the igc driver as well. After upgrading to 23.01 on a Netgate 4100, I am seeing the same DHCP traffic not being passed in netgraph issue as was present with igb on previous versions. I will have to test ix interfaces, but this problem might have spread there too.
Does the same -vlanhwfilter flag fix it for the igc driver? Or do you have to compile a custom driver?
The -vlanhwfilter option never fixed anything on igb, so I'm assuming it won't on igc either. Will need to check if there is a driver from Intel that can be compiled for igc.
Updated by Hayden Hill almost 2 years ago
Kris Phillips wrote in #note-31:
Hayden Hill wrote in #note-30:
Kris Phillips wrote in #note-29:
FYI it appears this issue has spread in 23.01 to the igc driver as well. After upgrading to 23.01 on a Netgate 4100, I am seeing the same DHCP traffic not being passed in netgraph issue as was present with igb on previous versions. I will have to test ix interfaces, but this problem might have spread there too.
Does the same -vlanhwfilter flag fix it for the igc driver? Or do you have to compile a custom driver?
The -vlanhwfilter option never fixed anything on igb, so I'm assuming it won't on igc either. Will need to check if there is a driver from Intel that can be compiled for igc.
Ah, I wasn't paying attention. When I upgraded to 23.01 things stopped working until I added -vlanhwfilter. I assumed that was because it removed my custom driver. Apparently my custom driver was maintained through the upgrade. First, is that expected behavior? Second, does anyone have a compiled igb driver based on 14.0-CURRENT? Mine seems to be holding even though I compiled it on 12.3-STABLE.
Updated by Steve Wheeler almost 2 years ago
For clarity the e1000 iflib driver that is in-kernel in pfSense has a bug that prevents it passing vlan0 if vlan hardware filtering is enabled. That can be disabled using `-vlanhwfilter` and it will then accept vlan0 tagged packets. Nothing has changed there between 22.05 and 23.01. Nothing has changed in igc either.
However, the pfatt Netgraph script still fails for e1000, and now igc also fails in 23.01/2.7. The failure there is more than simply failing to pass vlan0 as was initially thought.
Additionally a Netgraph script is no longer required for other ISPs using priority tagged dhcp packets in pfSense 23.01/2.7.
Updated by Kris Phillips over 1 year ago
I can confirm this issue is still present in 23.05. When testing the Ethernet filtering rules in 23.05, it was necessary to run "ifconfig igb# -vlanhwfilter" to allow the Layer 2 pf filtering to function properly. Once this was done, everything worked as expected.
Updated by Kristof Provost over 1 year ago
- Status changed from Confirmed to Waiting on Merge
This will be fixed with https://cgit.freebsd.org/src/commit/?id=0229fab2fe0eed843ebec98fd31b7d49bb2e8438
Updated by Jim Pingle over 1 year ago
- Assignee set to Kristof Provost
- Target version changed from CE-Next to 2.7.0
- Plus Target Version changed from Plus-Next to 23.09
Updated by Steve Wheeler over 1 year ago
- Status changed from Waiting on Merge to Feedback
This is now in 23.05-RC
Updated by Jim Pingle over 1 year ago
- Plus Target Version changed from 23.09 to 23.05
Updated by Jim Pingle over 1 year ago
- Subject changed from Intel e1000 driver (em & igb) cannot pass VLAN0 tagged packets to Intel e1000 driver (``em``, ``igb``) cannot pass packets tagged with VLAN ``0``
Updating subject for release notes.
Updated by Hayden Hill over 1 year ago
Updated by Steve Wheeler over 1 year ago
- Status changed from Feedback to Resolved
Works as expected in current 23.05 snapshots:
[23.05-RC][admin@m370.stevew.lan]/root: ifconfig igb2
igb2: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: OPT1
options=4e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
ether 00:01:21:01:aa:7d
inet6 fe80::201:21ff:fe01:aa7d%igb2 prefixlen 64 scopeid 0x3
inet 10.13.0.10 netmask 0xffffff00 broadcast 10.13.0.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[23.05-RC][admin@m370.stevew.lan]/root: tcpdump -nei igb2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb2, link-type EN10MB (Ethernet), capture size 262144 bytes
22:42:40.860487 00:01:21:01:aa:7d > 90:ec:77:1f:8c:40, ethertype IPv4 (0x0800), length 43: 10.13.0.10 > 10.13.0.3: ICMP echo request, id 58695, seq 4369, length 9
22:42:40.860632 90:ec:77:1f:8c:40 > 00:01:21:01:aa:7d, ethertype 802.1Q (0x8100), length 60: vlan 0, p 2, ethertype IPv4, 10.13.0.3 > 10.13.0.10: ICMP echo reply, id 58695, seq 4369, length 9
22:42:41.391474 00:01:21:01:aa:7d > 90:ec:77:1f:8c:40, ethertype IPv4 (0x0800), length 43: 10.13.0.10 > 10.13.0.3: ICMP echo request, id 58695, seq 4370, length 9
22:42:41.391620 90:ec:77:1f:8c:40 > 00:01:21:01:aa:7d, ethertype 802.1Q (0x8100), length 60: vlan 0, p 2, ethertype IPv4, 10.13.0.3 > 10.13.0.10: ICMP echo reply, id 58695, seq 4370, length 9
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
Tested: pfSense-23.05.r.20230515.2213
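For repeating this check on other hardware, the priority-tagged frames in a capture like the one above can be counted rather than spotted by eye. The following is a rough sketch: `count_vlan0_frames` is a hypothetical helper, and it assumes the text format that `tcpdump -e` prints in this ticket (an `ethertype 802.1Q (0x8100)` header followed by `vlan 0,`).

```sh
#!/bin/sh
# Count frames that tcpdump -e reports as 802.1Q with VLAN ID 0
# (priority-tagged frames). Reads tcpdump text output on stdin
# and prints the number of matching lines.
count_vlan0_frames() {
    grep -c 'ethertype 802\.1Q (0x8100).*vlan 0,'
}
```

Something like `tcpdump -nei igb2 -c 100 | count_vlan0_frames` would then print how many of the first 100 captured frames arrived with a VLAN 0 tag; a result of 0 while the peer is known to be sending tagged replies would point back at the driver dropping them.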
Updated by Kris Phillips over 1 year ago
Tested on 23.05 with my ATT Fiber connection and VLAN0 PCP tagging. No issues.