Feature #10504
Make LACP timeout PDU transmission speed configurable (closed)
% Done: 100%
Description
Could the following option from ifconfig be exposed to the WebUI?
lacp_fast_timeout
Enable lacp fast-timeout on the interface.
-lacp_fast_timeout
Disable lacp fast-timeout on the interface.
Background / use case (see also https://forum.netgate.com/topic/152862/lacp-doesn-t-work-reliably-slow-pdu-transmission-rate-suspected/2):
I have reliability problems with the LACP LAG of two 1 Gb interfaces between pfSense and a Juniper switch stack:
FW1 (pfSense primary) is connected to switches 1 and 3 via LACP.
FW2 (pfSense secondary) is connected to switches 0 and 2 via LACP.
When switch 0 failed earlier, I expected the other leg, connected to switch 2 in the Virtual Chassis, to provide resiliency, but that didn't happen. The secondary firewall went active because it could no longer reach the primary firewall via CARP. At the same time FW2 still had some connectivity, since it could take over traffic to some external IP addresses, causing havoc in the process.
One possible cause I would like to explore: the Juniper switch reports that its LACP partners (the firewalls) use slow transmission:
{master:1}
stephan@us1-swi> show lacp interfaces ae2 extensive
Aggregated interface: ae2
    LACP state:    Role     Exp  Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-2/0/1     Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
      ge-2/0/1     Partner  No   No   Yes   Yes  Yes  Yes   Slow     Active
      ge-0/0/1     Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
      ge-0/0/1     Partner  No   No   Yes   Yes  Yes  Yes   Slow     Active
    LACP protocol:    Receive State  Transmit State  Mux State
      ge-2/0/1        Current        Slow periodic   Collecting distributing
      ge-0/0/1        Current        Slow periodic   Collecting distributing
    LACP info:     Role     System    System             Port      Port    Port
                            priority  identifier         priority  number  key
      ge-2/0/1     Actor    127       7c:25:86:ce:a4:2f  127       51      3
      ge-2/0/1     Partner  32768     ac:1f:6b:66:fb:a2  32768     5       427
      ge-0/0/1     Actor    127       7c:25:86:ce:a4:2f  127       1       3
      ge-0/0/1     Partner  32768     ac:1f:6b:66:fb:a2  32768     3       427
The explanation for fast and slow timeouts is as follows (from Juniper docs):
Timeout: LACP timeout preference. Periodic transmissions of LACP PDUs occur at either a slow or fast transmission rate, depending upon the expressed LACP timeout preference (Slow Timeout or Fast Timeout). In a fast timeout, PDUs are sent every second and in a slow timeout, PDUs are sent every 30 seconds. LACP timeout occurs when 3 consecutive PDUs are missed. If LACP timeout is a fast timeout, the time taken when 3 consecutive PDUs are missed is 3 seconds (3x1 second). If LACP timeout is a slow timeout, the time taken is 90 seconds (3x30 seconds).
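The timeout arithmetic in the quoted Juniper text is a single multiplication. A minimal Python sketch, using only the two PDU intervals and the three-missed-PDU rule from that quote (the names are illustrative, not from any LACP implementation):

```python
# Worked arithmetic for the Juniper description above: an LACP partner is
# declared timed out after 3 consecutive PDUs are missed.
PDU_INTERVAL_S = {"fast": 1, "slow": 30}  # seconds between periodic PDUs
MISSED_PDUS_BEFORE_TIMEOUT = 3


def detection_time(mode: str) -> int:
    """Return the seconds until LACP timeout for the given timeout mode."""
    return MISSED_PDUS_BEFORE_TIMEOUT * PDU_INTERVAL_S[mode]


for mode in ("fast", "slow"):
    print(f"{mode}: {detection_time(mode)} s")
# fast: 3 s
# slow: 90 s
```

In other words, a member link that silently stops delivering PDUs is noticed after about 3 seconds in fast mode but only after about 90 seconds in slow mode.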
So this sounds to me like the most likely reason for the failure. And sure enough, other LACP links, e.g. to some of my Linux servers, do use Fast:
{master:1}
stephan@us1-swi> show lacp interfaces ae4 extensive
Aggregated interface: ae4
    LACP state:     Role     Exp  Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-2/0/4      Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
      ge-2/0/4      Partner  No   No   Yes   Yes  Yes  Yes   Fast     Active
      ge-0/0/4 FUP  Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
      ge-0/0/4 FUP  Partner  No   No   Yes   Yes  Yes  Yes   Fast     Active
    LACP protocol:    Receive State  Transmit State  Mux State
      ge-2/0/4        Current        Fast periodic   Collecting distributing
      ge-0/0/4        Current        Fast periodic   Collecting distributing
    LACP info:     Role     System    System             Port      Port    Port
                            priority  identifier         priority  number  key
      ge-2/0/4     Actor    127       7c:25:86:ce:a4:2f  127       53      5
      ge-2/0/4     Partner  65535     c6:e9:d0:d8:94:79  255       1       9
      ge-0/0/4     Actor    127       7c:25:86:ce:a4:2f  127       3       5
      ge-0/0/4     Partner  65535     c6:e9:d0:d8:94:79  255       2       9
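To spot aggregates like ae2 among many, the "LACP state" rows can be scanned for a Partner whose Timeout column reads Slow. The helper below is hypothetical (not a Juniper tool) and assumes the column order shown in the Junos output above, where Timeout is the second-to-last field of each state row:

```python
# Hypothetical helper: list member links whose LACP partner advertises the
# slow timeout in Junos "show lacp interfaces <ae> extensive" output.
# Assumes each state row ends with:
#   Role Exp Def Dist Col Syn Aggr Timeout Activity


def slow_partners(output: str) -> list[str]:
    """Return member interface names whose Partner row reads 'Slow'."""
    slow = []
    for line in output.splitlines():
        tokens = line.split()
        # A state row has the interface name plus the nine state columns;
        # the header row fails the check because tokens[-9] is "Role".
        if len(tokens) < 10 or tokens[-9] not in ("Actor", "Partner"):
            continue
        role, timeout = tokens[-9], tokens[-2]
        if role == "Partner" and timeout == "Slow":
            slow.append(tokens[0])
    return slow


sample = """\
LACP state: Role Exp Def Dist Col Syn Aggr Timeout Activity
ge-2/0/1 Actor No No Yes Yes Yes Yes Fast Active
ge-2/0/1 Partner No No Yes Yes Yes Yes Slow Active
ge-0/0/1 Actor No No Yes Yes Yes Yes Fast Active
ge-0/0/1 Partner No No Yes Yes Yes Yes Slow Active
"""

print(slow_partners(sample))  # ['ge-2/0/1', 'ge-0/0/1']
```

Fed the ae4 state rows, where every partner reports Fast, it returns an empty list. Indexing from the end of the row keeps it working when an extra token follows the interface name.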
Updated by Viktor Gurov over 4 years ago
Updated by Jim Pingle over 4 years ago
- Status changed from New to Pull Request Review
Updated by Renato Botelho over 4 years ago
- Status changed from Pull Request Review to Feedback
- Assignee set to Renato Botelho
- Target version set to 2.5.0
- % Done changed from 0 to 100
PR has been merged. Thanks!
Updated by Viktor Gurov over 4 years ago
Works fine, but requires extra input validation:
https://github.com/pfsense/pfsense/pull/4300
Cisco 'show lacp neighbor detail' output:
Slow mode:
                  Partner               Partner                     Partner
Port              System ID             Port Number     Age         Flags
Gi0/0             32768,0cdd.aeed.9607  0x8             12s         SA
                  LACP Partner   Partner    Partner
                  Port Priority  Oper Key   Port State
                  32768          0x1F2      0x3D
                  Port State Flags Decode:
                  Activity:  Timeout:  Aggregation:  Synchronization:
                  Active     Long      Yes           Yes
                  Collecting:  Distributing:  Defaulted:  Expired:
                  Yes          Yes            No          No
Fast mode:
                  Partner               Partner                     Partner
Port              System ID             Port Number     Age         Flags
Gi0/0             32768,0cdd.aeed.9607  0x8             17s         FA
                  LACP Partner   Partner    Partner
                  Port Priority  Oper Key   Port State
                  32768          0x1F2      0x3F
                  Port State Flags Decode:
                  Activity:  Timeout:  Aggregation:  Synchronization:
                  Active     Short     Yes           Yes
                  Collecting:  Distributing:  Defaulted:  Expired:
                  Yes          Yes            No          No
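The two Cisco outputs differ only in the Partner Port State byte (0x3D vs 0x3F). Decoding it with the standard LACP port state bit layout (IEEE 802.1AX, bit 0 = LACP_Activity through bit 7 = Expired) shows that the one differing bit, bit 1 (LACP_Timeout), is what Cisco prints as Long vs Short. A small Python sketch; the labels mimic Cisco's "Port State Flags Decode" and are not taken from any Cisco API:

```python
# Bit positions of the LACP port state octet, least significant bit first.
# Bit 1 is LACP_Timeout: 1 = short (fast PDUs), 0 = long (slow PDUs).
STATE_BITS = [
    "Activity", "Timeout", "Aggregation", "Synchronization",
    "Collecting", "Distributing", "Defaulted", "Expired",
]


def decode_port_state(state: int) -> dict[str, str]:
    """Decode an LACP port state byte into Cisco-style labels."""
    decoded = {}
    for bit, name in enumerate(STATE_BITS):
        is_set = bool(state & (1 << bit))
        if name == "Activity":
            decoded[name] = "Active" if is_set else "Passive"
        elif name == "Timeout":
            decoded[name] = "Short" if is_set else "Long"
        else:
            decoded[name] = "Yes" if is_set else "No"
    return decoded


print(decode_port_state(0x3D)["Timeout"])  # Long  (slow mode above)
print(decode_port_state(0x3F)["Timeout"])  # Short (fast mode above)
```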
Updated by Jim Pingle over 4 years ago
- Status changed from Feedback to Pull Request Review
Updated by Renato Botelho over 4 years ago
- Status changed from Pull Request Review to Feedback
PR has been merged. Thanks!
Updated by Viktor Gurov over 4 years ago
- Status changed from Feedback to Resolved
Works fine on 2.5.0.a.20200506.1402,
but I still don't know how to see the current LACP timeout mode:
there is no information about it in sysctl, ifconfig, or dmesg,
only in the neighbor's 'show lacp neighbor detail' output (?)
Updated by Jim Pingle over 4 years ago
It seems to be indicated by the flags value:
: ifconfig lagg0 lacp_fast_timeout
: ifconfig -vvvv lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1800000<TXRTLMT>
        ether 00:00:00:00:00:00
        inet6 fe80::290:bff:fe37:a324%lagg0 prefixlen 64 scopeid 0x9
        laggproto lacp lagghash l2,l3,l4
        lagg options:
                flags=90<LACP_STRICT>
                flowid_shift: 16
        lagg statistics:
                active ports: 0
                flapping: 0
        lag id: [(0000,00-00-00-00-00-00,0000,0000,0000),
                (0000,00-00-00-00-00-00,0000,0000,0000)]
        groups: lagg
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
: ifconfig lagg0 -lacp_fast_timeout
: ifconfig -vvvv lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1800000<TXRTLMT>
        ether 00:00:00:00:00:00
        inet6 fe80::290:bff:fe37:a324%lagg0 prefixlen 64 scopeid 0x9
        laggproto lacp lagghash l2,l3,l4
        lagg options:
                flags=10<LACP_STRICT>
                flowid_shift: 16
        lagg statistics:
                active ports: 0
                flapping: 0
        lag id: [(0000,00-00-00-00-00-00,0000,0000,0000),
                (0000,00-00-00-00-00-00,0000,0000,0000)]
        groups: lagg
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
90 vs 10: fast timeout adds one bit to the lagg options flags, though ifconfig doesn't print it in a friendly way (only LACP_STRICT is decoded).
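Until ifconfig names that bit, the mode can be read from the raw value. ifconfig prints these flags in hex, so 0x90 vs 0x10 means the fast-timeout setting shows up as bit 0x80. The Python sketch below is built only on that observation; the constant name is hypothetical and should be checked against FreeBSD's lagg headers before relying on it:

```python
import re

# Assumption: bit 0x80 of the "lagg options" flags is the LACP fast-timeout
# option, inferred only from flags=90 (fast) vs flags=10 (slow) above.
# 0x10 is LACP_STRICT, which ifconfig already labels. Name is hypothetical.
LACP_FAST_TIMEOUT = 0x80

# The hex flags value follows "lagg options:", on the same or the next line.
LAGG_OPTIONS_RE = re.compile(r"lagg options:\s*flags=([0-9a-fA-F]+)")


def lacp_timeout_mode(ifconfig_output: str) -> str:
    """Return 'fast' or 'slow' from `ifconfig -vvvv lagg0` style output."""
    match = LAGG_OPTIONS_RE.search(ifconfig_output)
    if match is None:
        raise ValueError("no 'lagg options: flags=' field in the output")
    flags = int(match.group(1), 16)
    return "fast" if flags & LACP_FAST_TIMEOUT else "slow"


print(lacp_timeout_mode("lagg options: flags=90<LACP_STRICT> flowid_shift: 16"))  # fast
print(lacp_timeout_mode("lagg options: flags=10<LACP_STRICT> flowid_shift: 16"))  # slow
```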