Feature #10504

Make LACP timeout PDU transmission speed configurable

Added by S E 6 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Category: LAGG Interfaces
Target version:
Start date: 04/28/2020
Due date:
% Done: 100%
Estimated time:

Description

Could the following ifconfig option be exposed in the WebUI?

 lacp_fast_timeout
     Enable lacp fast-timeout on the interface.

 -lacp_fast_timeout
     Disable lacp fast-timeout on the interface.

Background / use case (see also https://forum.netgate.com/topic/152862/lacp-doesn-t-work-reliably-slow-pdu-transmission-rate-suspected/2):

I have problems with the reliability of an LACP LAG of two 1 Gb interfaces between pfSense and a Juniper switch stack.

FW1 (pfSense primary) is connected to switches 1 and 3 via LACP
FW2 (pfSense secondary) is connected to switches 0 and 2 via LACP

When switch 0 failed earlier, I expected the other leg, connected to switch 2 in the Virtual Chassis, to provide resiliency, but that didn't happen. The secondary firewall went active because it could no longer reach the primary firewall via CARP. At the same time, FW2 did have some sort of connectivity, since it could take on traffic to some external IP addresses, and it caused havoc in the process.

One possible error source I would like to explore is that Juniper reports that its LACP partners (the firewalls) use slow transmissions:

{master:1}
stephan@us1-swi> show lacp interfaces ae2 extensive   
Aggregated interface: ae2
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-2/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-2/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
      ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
    LACP protocol:        Receive State  Transmit State          Mux State 
      ge-2/0/1                  Current   Slow periodic Collecting distributing
      ge-0/0/1                  Current   Slow periodic Collecting distributing
    LACP info:        Role     System             System       Port     Port    Port 
                             priority         identifier   priority   number     key 
      ge-2/0/1       Actor        127  7c:25:86:ce:a4:2f        127       51       3
      ge-2/0/1     Partner      32768  ac:1f:6b:66:fb:a2      32768        5     427
      ge-0/0/1       Actor        127  7c:25:86:ce:a4:2f        127        1       3
      ge-0/0/1     Partner      32768  ac:1f:6b:66:fb:a2      32768        3     427
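
For anyone who wants to check many aggregated interfaces at once, here is a rough sketch that scans the "LACP state" rows of output like the above and lists the members whose Partner row shows a Slow timeout. The function name `slow_partners` and the assumption that each state row splits into exactly ten whitespace-separated fields are mine, based only on the output pasted here; other Junos versions may format this differently.

```python
def slow_partners(lacp_output: str) -> list[str]:
    """Return member interfaces whose Partner row reports a Slow timeout."""
    slow = []
    for line in lacp_output.splitlines():
        fields = line.split()
        # Assumed row layout (taken from the pasted output):
        # <ifname> <Role> Exp Def Dist Col Syn Aggr Timeout Activity
        if len(fields) == 10 and fields[1] == "Partner" and fields[8] == "Slow":
            slow.append(fields[0])
    return slow


SAMPLE = """\
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-2/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-2/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
      ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
"""

if __name__ == "__main__":
    print(slow_partners(SAMPLE))
```

Rows with a different number of columns (such as the header and the "LACP info" rows) are skipped rather than guessed at.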

The explanation for fast and slow timeouts is as follows (from Juniper docs):

Timeout—LACP timeout preference. Periodic transmissions of LACP PDUs occur at either a slow or fast transmission rate, depending upon the expressed LACP timeout preference (Slow Timeout or Fast Timeout). In a fast timeout, PDUs are sent every second and in a slow timeout, PDUs are sent every 30 seconds. LACP timeout occurs when 3 consecutive PDUs are missed. If LACP timeout is a fast timeout, the time taken when 3 consecutive PDUs are missed is 3 seconds (3x1 second). If LACP timeout is a slow timeout, the time taken is 90 seconds( 3x30 seconds).
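
The arithmetic in that description can be written out as a small sketch. The intervals (1 s and 30 s) and the three-missed-PDU rule come straight from the quoted text; the names `PDU_INTERVAL_SECONDS` and `lacp_timeout_seconds` are just illustrative.

```python
# LACP timeout arithmetic from the quoted Juniper description:
# the link times out after 3 consecutive missed PDUs.
PDU_INTERVAL_SECONDS = {"fast": 1, "slow": 30}  # values from the quoted text
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout_seconds(mode: str) -> int:
    """Return the seconds of silence before the link is declared timed out."""
    return MISSED_PDUS_BEFORE_TIMEOUT * PDU_INTERVAL_SECONDS[mode]


if __name__ == "__main__":
    for mode in ("fast", "slow"):
        print(f"{mode}: {lacp_timeout_seconds(mode)} s")
```

So with a slow timeout, a dead leg could in principle stay in the bundle for up to a minute and a half before it is declared timed out.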

This sounds to me like the most likely reason for the failure. Sure enough, other LACP LAGs, e.g. to some of my Linux servers, do use Fast:

{master:1}
stephan@us1-swi> show lacp interfaces ae4 extensive    
Aggregated interface: ae4
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-2/0/4       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-2/0/4     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/4 FUP    Actor   No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/4 FUP  Partner   No    No   Yes  Yes  Yes   Yes     Fast    Active
    LACP protocol:        Receive State  Transmit State          Mux State 
      ge-2/0/4                  Current   Fast periodic Collecting distributing
      ge-0/0/4                  Current   Fast periodic Collecting distributing
    LACP info:        Role     System             System       Port     Port    Port 
                             priority         identifier   priority   number     key 
      ge-2/0/4       Actor        127  7c:25:86:ce:a4:2f        127       53       5
      ge-2/0/4     Partner      65535  c6:e9:d0:d8:94:79        255        1       9
      ge-0/0/4       Actor        127  7c:25:86:ce:a4:2f        127        3       5
      ge-0/0/4     Partner      65535  c6:e9:d0:d8:94:79        255        2       9

Associated revisions

Revision 06472551 (diff)
Added by Viktor Gurov 6 months ago

Make LACP timeout PDU transmission speed configurable. Issue #10504

Revision a3a04401 (diff)
Added by Viktor Gurov 6 months ago

LAGG proto input validation fix. Issue #10504

History

#2 Updated by Jim Pingle 6 months ago

  • Status changed from New to Pull Request Review

#3 Updated by Renato Botelho 6 months ago

  • Status changed from Pull Request Review to Feedback
  • Assignee set to Renato Botelho
  • Target version set to 2.5.0
  • % Done changed from 0 to 100

PR has been merged. Thanks!

#4 Updated by Viktor Gurov 6 months ago

Works fine, but requires extra input validation:
https://github.com/pfsense/pfsense/pull/4300

Cisco 'show lacp neighbor detail' output:

Slow mode:

          Partner               Partner                     Partner
Port      System ID             Port Number     Age         Flags
Gi0/0     32768,0cdd.aeed.9607  0x8              12s        SA

          LACP Partner         Partner         Partner
          Port Priority        Oper Key        Port State
          32768                0x1F2           0x3D

          Port State Flags Decode:
          Activity:   Timeout:   Aggregation:   Synchronization:
          Active      Long       Yes            Yes

          Collecting:   Distributing:   Defaulted:   Expired:
          Yes           Yes             No           No 

Fast mode:

          Partner               Partner                     Partner
Port      System ID             Port Number     Age         Flags
Gi0/0     32768,0cdd.aeed.9607  0x8              17s        FA

          LACP Partner         Partner         Partner
          Port Priority        Oper Key        Port State
          32768                0x1F2           0x3F

          Port State Flags Decode:
          Activity:   Timeout:   Aggregation:   Synchronization:
          Active      Short      Yes            Yes

          Collecting:   Distributing:   Defaulted:   Expired:
          Yes           Yes             No           No 
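
The only difference between the two Partner Port State values (0x3D vs 0x3F) is the Timeout bit. As a sketch of how the state octet decodes, assuming the IEEE 802.1AX bit order (Activity in the lowest bit, Expired in the highest); the helper names are illustrative:

```python
# Bit positions of the LACP port state octet (IEEE 802.1AX), lowest bit first.
STATE_BITS = (
    "Activity",         # 1 = Active, 0 = Passive
    "Timeout",          # 1 = Short (fast), 0 = Long (slow)
    "Aggregation",
    "Synchronization",
    "Collecting",
    "Distributing",
    "Defaulted",
    "Expired",
)


def decode_port_state(state: int) -> dict[str, bool]:
    """Map each flag name to whether its bit is set in the state octet."""
    return {name: bool((state >> bit) & 1) for bit, name in enumerate(STATE_BITS)}


def timeout_mode(state: int) -> str:
    """Return 'Short' or 'Long', matching the wording of the Cisco decode."""
    return "Short" if decode_port_state(state)["Timeout"] else "Long"


if __name__ == "__main__":
    for state in (0x3D, 0x3F):
        print(hex(state), timeout_mode(state), decode_port_state(state))
```

Decoding 0x3D and 0x3F this way gives the same flag values as the Cisco "Port State Flags Decode" sections shown for slow and fast mode.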

#5 Updated by Jim Pingle 6 months ago

  • Status changed from Feedback to Pull Request Review

#6 Updated by Renato Botelho 6 months ago

  • Status changed from Pull Request Review to Feedback

PR has been merged. Thanks!

#7 Updated by Viktor Gurov 6 months ago

  • Status changed from Feedback to Resolved

Works fine on 2.5.0.a.20200506.1402.

However, I still don't know how to see the current LACP timeout mode locally:
there is no info in sysctl, ifconfig, or dmesg.
Is it visible only by checking the neighbor's 'show lacp neighbor detail' output (?)

#8 Updated by Jim Pingle 6 months ago

It seems to be indicated by the flags value:

: ifconfig lagg0 lacp_fast_timeout
: ifconfig -vvvv lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=1800000<TXRTLMT>
    ether 00:00:00:00:00:00
    inet6 fe80::290:bff:fe37:a324%lagg0 prefixlen 64 scopeid 0x9 
    laggproto lacp lagghash l2,l3,l4
    lagg options:
        flags=90<LACP_STRICT>
        flowid_shift: 16
    lagg statistics:
        active ports: 0
        flapping: 0
    lag id: [(0000,00-00-00-00-00-00,0000,0000,0000),
         (0000,00-00-00-00-00-00,0000,0000,0000)]
    groups: lagg 
    media: Ethernet autoselect
    status: no carrier
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

: ifconfig lagg0 -lacp_fast_timeout
: ifconfig -vvvv lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=1800000<TXRTLMT>
    ether 00:00:00:00:00:00
    inet6 fe80::290:bff:fe37:a324%lagg0 prefixlen 64 scopeid 0x9 
    laggproto lacp lagghash l2,l3,l4
    lagg options:
        flags=10<LACP_STRICT>
        flowid_shift: 16
    lagg statistics:
        active ports: 0
        flapping: 0
    lag id: [(0000,00-00-00-00-00-00,0000,0000,0000),
         (0000,00-00-00-00-00-00,0000,0000,0000)]
    groups: lagg 
    media: Ethernet autoselect
    status: no carrier
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Under "lagg options", flags=90 with fast timeout enabled vs. flags=10 with it disabled, though ifconfig isn't printing the extra flag in a friendly way.
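
Until ifconfig names the flag, the check could be scripted. This is only a sketch: the 0x80 bit is an assumption inferred from the 0x90 vs 0x10 difference in the two outputs above, and the function name and the trimmed sample strings are hypothetical.

```python
import re

# Assumption: fast timeout is bit 0x80 of the lagg options flags, inferred
# from the 0x90 vs 0x10 difference shown above (not from ifconfig docs).
ASSUMED_FAST_TIMEOUT_BIT = 0x80

# Matches the hex value on the "flags=" line that follows "lagg options:".
LAGG_FLAGS_RE = re.compile(r"lagg options:\s*flags=([0-9a-fA-F]+)<")


def lacp_fast_timeout_enabled(ifconfig_output: str) -> bool:
    """Guess whether fast timeout is on from verbose ifconfig output."""
    match = LAGG_FLAGS_RE.search(ifconfig_output)
    if match is None:
        raise ValueError("no 'lagg options' flags line found")
    return bool(int(match.group(1), 16) & ASSUMED_FAST_TIMEOUT_BIT)


# Trimmed excerpts of the two outputs above.
SAMPLE_FAST = """\
    laggproto lacp lagghash l2,l3,l4
    lagg options:
        flags=90<LACP_STRICT>
        flowid_shift: 16
"""
SAMPLE_SLOW = SAMPLE_FAST.replace("flags=90", "flags=10")

if __name__ == "__main__":
    print(lacp_fast_timeout_enabled(SAMPLE_FAST))
    print(lacp_fast_timeout_enabled(SAMPLE_SLOW))
```

The regular expression is anchored on "lagg options:" so that the interface-level flags=8843 and the options= lines are not picked up by mistake.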
