Bug #8335

open

System hang with LACP downlink to UniFi switch

Added by Mike Pastore about 6 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: LAGG Interfaces
Target version: -
Start date: 02/16/2018
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.4.2_1
Affected Architecture: amd64

Description

I have an RCC-VE 2440 (2015) with igb1 and igb2 aggregated into lagg0 and connected to a UniFi switch. UniFi supports aggregated links, but only via LACP, and that is not configurable. Per Ubiquiti support, it uses only L2 data in the hash computation, strict mode should be enabled, and fast timeout should be disabled. I am using this aggregated link as a trunk, with a number of VLANs defined on lagg0 and assigned to different interfaces. lagg0 itself is assigned to LAN to catch untagged traffic.
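For reference, the setup described above could be approximated by hand with stock FreeBSD `ifconfig` commands. This is only a sketch: pfSense normally builds the lagg and VLAN interfaces from the GUI configuration, the VLAN ID (10) and the `run`/`setup_lagg` helper names are made up for illustration, and by default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints (or, with RUN=1, executes) ifconfig commands that
# approximate the lagg0 setup from the report. Assumptions: VLAN ID 10 is
# a hypothetical example; pfSense itself generates its own configuration.

VLAN_ID=${VLAN_ID:-10}

run() {
    # Always echo the command; execute it only when RUN=1 is set.
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

setup_lagg() {
    run ifconfig igb1 up
    run ifconfig igb2 up
    run ifconfig lagg0 create
    # Aggregate both ports with LACP.
    run ifconfig lagg0 laggproto lacp laggport igb1 laggport igb2 up
    # Settings recommended by Ubiquiti support per the report:
    run ifconfig lagg0 lagghash l2          # hash on L2 data only
    run ifconfig lagg0 lacp_strict          # strict mode enabled
    run ifconfig lagg0 -lacp_fast_timeout   # fast timeout disabled
    # One tagged VLAN on top of the trunk (hypothetical ID).
    run ifconfig "vlan${VLAN_ID}" create vlan "$VLAN_ID" vlandev lagg0
}
```

Calling `setup_lagg` prints the command list for review. Running it with `RUN=1` as root on a FreeBSD system would actually apply the commands, which is not something to do on a box that pfSense is managing.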

In this configuration, pfSense hangs at least once every 48 hours. There doesn't seem to be a pattern to the hangs. I see nothing in the system logs when it happens. I typically have a `screen` session attached to the serial console from another system, and I see nothing there, either. By "hang" I mean that no network traffic passes (the unit becomes unpingable from the LAN) and the serial console becomes unresponsive. Cycling the power brings it back up.
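Since nothing shows up in the logs, one way to at least timestamp each hang is to ping the router from another LAN host and record state changes. The following is a minimal sketch, not something from the original setup: the function names `log_transition` and `monitor` and the 10-second interval are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: record when the router stops/starts answering pings.
# Helper names and the polling interval are illustrative assumptions.

# log_transition PREV_STATE PING_EXIT_CODE
# Prints the new state ("up" or "down") on stdout and, if the state
# changed, writes a timestamped line to stderr.
log_transition() {
    prev=$1
    rc=$2
    if [ "$rc" -eq 0 ]; then cur=up; else cur=down; fi
    if [ "$cur" != "$prev" ]; then
        echo "$(date '+%Y-%m-%dT%H:%M:%S') router is now $cur" >&2
    fi
    echo "$cur"
}

# monitor HOST
# Loops forever, pinging HOST once per interval.
monitor() {
    host=$1
    state=unknown
    while :; do
        ping -c 1 "$host" >/dev/null 2>&1
        rc=$?
        state=$(log_transition "$state" "$rc")
        sleep 10
    done
}
```

Usage would look something like `monitor 192.168.1.1 2>>hangs.log`, where the address stands in for the router's LAN IP. A single lost ping is logged as "down", so short blips will appear alongside real hangs.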

The only "solution" that I've found so far is to disaggregate the links and go back to a single downlink. I'm running pfSense CE 2.4.2_1 with coreboot flashed to v17. Here's a (hopefully comprehensive) list of everything else I've tried to solve the problem (one at a time, rebooting as necessary to apply each change):

  • Reinstall pfSense
    • UFS on eMMC
    • ZFS on eMMC
    • ZFS on mSATA
  • Force `ifconfig lagg0 lagghash l2` per Ubiquiti support
  • Disable hardware checksum offloading
    • In GUI only
    • In GUI and `ifconfig <device> -vlanhwcsum -txcsum6`
  • Disable TCP segmentation offloading
    • In GUI and set `net.inet.tcp.tso=0` tunable
    • In GUI, set tunable, and `ifconfig <device> -vlanhwtso`
  • Set `kern.ipc.nmbclusters=1000000` tunable (this is set across all attempts)
  • Add `hw.igb.num_queues=1` to loader.conf.local
  • Add `hw.pci.enable_msix=0` to loader.conf.local
  • Disable crypto (set "Cryptographic Hardware" to "none")
  • Use RAM disk for /var and /tmp
  • Put the router and the switch on a UPS
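For anyone trying to reproduce this, the command-line parts of the list above can be sketched as follows. This is an approximation under stated assumptions: the helper names are made up, the block only prints the lines rather than applying them, and it assumes the two GUI "tunables" correspond to runtime sysctls while the `hw.*` entries go in loader.conf.local as described.

```shell
#!/bin/sh
# Sketch: print the settings from the list above so they can be reviewed
# or pasted. Nothing here changes the system. Helper names are hypothetical.

# Lines for /boot/loader.conf.local (take effect after a reboot).
loader_lines() {
    cat <<'EOF'
hw.igb.num_queues=1
hw.pci.enable_msix=0
EOF
}

# Runtime sysctl commands matching the two tunables in the report.
sysctl_cmds() {
    echo "sysctl net.inet.tcp.tso=0"
    echo "sysctl kern.ipc.nmbclusters=1000000"
}

# offload_cmds DEV...
# One ifconfig command per device, disabling the offload features
# named in the report.
offload_cmds() {
    for dev in "$@"; do
        echo "ifconfig $dev -vlanhwcsum -txcsum6 -vlanhwtso"
    done
}
```

For example, `offload_cmds igb1 igb2` prints one `ifconfig` line per lagg member. In the report each item was tried on its own, so the output is better treated as a menu than as a single script to run.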

The following packages are installed:

  • Avahi
  • Netgate_Coreboot_Upgrade
  • Notes
  • nut
  • pfBlockerNG
  • Service_Watchdog
  • sudo
  • System_Patches

The following services are running:

  • avahi
  • dhcpd
  • dnsbl
  • dpinger
  • igmpproxy
  • miniupnpd
  • ntpd
  • radvd
  • sshd
  • syslogd
  • unbound