Feature #10504
Status: closed
Make LACP timeout PDU transmission speed configurable
100%
Description
Could the following ifconfig option be exposed in the WebUI?
lacp_fast_timeout
Enable lacp fast-timeout on the interface.
-lacp_fast_timeout
Disable lacp fast-timeout on the interface.
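For illustration, here is a minimal sketch of how a frontend might turn a checkbox into the matching ifconfig arguments. The function name `lacp_fast_timeout_cmd` and the interface name `lagg0` are made up for this example, and it assumes the option is applied to the lagg interface itself. The sketch only builds the argument list and does not run anything.

```python
def lacp_fast_timeout_cmd(lagg_if: str, enable: bool) -> list[str]:
    """Build the ifconfig argument list for toggling LACP fast timeout.

    lagg_if is the name of the lagg interface (hypothetical, e.g. "lagg0").
    """
    # The option is enabled by its bare name and disabled with a leading "-".
    flag = "lacp_fast_timeout" if enable else "-lacp_fast_timeout"
    return ["ifconfig", lagg_if, flag]


# Print the commands instead of executing them.
print(" ".join(lacp_fast_timeout_cmd("lagg0", True)))
print(" ".join(lacp_fast_timeout_cmd("lagg0", False)))
```

On a FreeBSD-based host the returned list could be handed to `subprocess.run`, but that step is left out here on purpose.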
Background / use case (see also https://forum.netgate.com/topic/152862/lacp-doesn-t-work-reliably-slow-pdu-transmission-rate-suspected/2):
I have reliability problems with an LACP LAG of two 1 Gb interfaces between pfSense and a Juniper switch stack.
FW1 (pfSense primary) is connected to switches 1 and 3 via LACP.
FW2 (pfSense secondary) is connected to switches 0 and 2 via LACP.
When switch 0 failed earlier, I expected the other leg, connected to switch 2 in the Virtual Chassis, to provide resiliency, but that didn't happen. The secondary firewall went active because it could no longer reach the primary firewall via CARP. At the same time, FW2 did have some sort of connectivity, since it could take on traffic to some external IP addresses, and it caused havoc in the process.
One possible cause I would like to explore is that the Juniper switch reports its LACP partners (the firewalls) as using slow transmissions:
{master:1}
stephan@us1-swi> show lacp interfaces ae2 extensive
Aggregated interface: ae2
LACP state:     Role     Exp  Def  Dist  Col  Syn  Aggr  Timeout  Activity
ge-2/0/1        Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
ge-2/0/1        Partner  No   No   Yes   Yes  Yes  Yes   Slow     Active
ge-0/0/1        Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
ge-0/0/1        Partner  No   No   Yes   Yes  Yes  Yes   Slow     Active
LACP protocol:  Receive State  Transmit State  Mux State
ge-2/0/1        Current        Slow periodic   Collecting distributing
ge-0/0/1        Current        Slow periodic   Collecting distributing
LACP info:      Role     System    System             Port      Port    Port
                         priority  identifier         priority  number  key
ge-2/0/1        Actor    127       7c:25:86:ce:a4:2f  127       51      3
ge-2/0/1        Partner  32768     ac:1f:6b:66:fb:a2  32768     5       427
ge-0/0/1        Actor    127       7c:25:86:ce:a4:2f  127       1       3
ge-0/0/1        Partner  32768     ac:1f:6b:66:fb:a2  32768     3       427
The explanation for fast and slow timeouts is as follows (from Juniper docs):
Timeout—LACP timeout preference. Periodic transmissions of LACP PDUs occur at either a slow or fast transmission rate, depending upon the expressed LACP timeout preference (Slow Timeout or Fast Timeout). In a fast timeout, PDUs are sent every second and in a slow timeout, PDUs are sent every 30 seconds. LACP timeout occurs when 3 consecutive PDUs are missed. If LACP timeout is a fast timeout, the time taken when 3 consecutive PDUs are missed is 3 seconds (3x1 second). If LACP timeout is a slow timeout, the time taken is 90 seconds (3x30 seconds).
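As a rough illustration of the arithmetic in the quoted explanation, the sketch below multiplies the PDU interval by the number of missed PDUs. The intervals (1 s and 30 s) and the count of 3 are taken from the Juniper text above; the names `PDU_INTERVAL_SECONDS`, `MISSED_PDUS` and `lacp_timeout_seconds` are made up for this example.

```python
# PDU transmission interval per timeout preference, per the quoted Juniper text.
PDU_INTERVAL_SECONDS = {"fast": 1, "slow": 30}

# Number of consecutive missed PDUs that triggers an LACP timeout.
MISSED_PDUS = 3


def lacp_timeout_seconds(preference: str) -> int:
    """Return how long it takes for an LACP timeout to occur."""
    return PDU_INTERVAL_SECONDS[preference.lower()] * MISSED_PDUS


for preference in ("fast", "slow"):
    print(f"{preference}: {lacp_timeout_seconds(preference)} s")
# prints "fast: 3 s" and "slow: 90 s"
```

By this arithmetic, a dead link on a slow-timeout partner can go unnoticed for up to 90 seconds, compared with 3 seconds for a fast-timeout one.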
So this sounds to me like the most likely reason for the failure. And sure enough, other LACP links, e.g. to some of my Linux servers, do use Fast:
{master:1}
stephan@us1-swi> show lacp interfaces ae4 extensive
Aggregated interface: ae4
LACP state:     Role     Exp  Def  Dist  Col  Syn  Aggr  Timeout  Activity
ge-2/0/4        Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
ge-2/0/4        Partner  No   No   Yes   Yes  Yes  Yes   Fast     Active
ge-0/0/4 FUP    Actor    No   No   Yes   Yes  Yes  Yes   Fast     Active
ge-0/0/4 FUP    Partner  No   No   Yes   Yes  Yes  Yes   Fast     Active
LACP protocol:  Receive State  Transmit State  Mux State
ge-2/0/4        Current        Fast periodic   Collecting distributing
ge-0/0/4        Current        Fast periodic   Collecting distributing
LACP info:      Role     System    System             Port      Port    Port
                         priority  identifier         priority  number  key
ge-2/0/4        Actor    127       7c:25:86:ce:a4:2f  127       53      5
ge-2/0/4        Partner  65535     c6:e9:d0:d8:94:79  255       1       9
ge-0/0/4        Actor    127       7c:25:86:ce:a4:2f  127       3       5
ge-0/0/4        Partner  65535     c6:e9:d0:d8:94:79  255       2       9
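To compare the Timeout column across members without reading the tables by eye, a small parser could pull it out of the pasted output. This is only a sketch: the function name `lacp_timeouts` is made up, and it assumes the whitespace-separated layout shown in the outputs above, with eight flag columns after the role. Extra tokens between the interface name and the role, such as FUP, are skipped.

```python
def lacp_timeouts(output: str) -> dict[str, dict[str, str]]:
    """Map each member interface to its Actor/Partner timeout preference.

    Only rows in the "LACP state:" section are read; the protocol and
    info sections are ignored.
    """
    result: dict[str, dict[str, str]] = {}
    in_state_section = False
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "LACP":
            # A section header: "LACP state:", "LACP protocol:" or "LACP info:".
            in_state_section = tokens[1:2] == ["state:"]
            continue
        if not in_state_section:
            continue
        for role in ("Actor", "Partner"):
            if role in tokens:
                # Columns after the role: Exp Def Dist Col Syn Aggr Timeout Activity
                fields = tokens[tokens.index(role) + 1:]
                if len(fields) == 8:
                    result.setdefault(tokens[0], {})[role] = fields[6]
    return result


SAMPLE = """\
Aggregated interface: ae2
LACP state: Role Exp Def Dist Col Syn Aggr Timeout Activity
ge-2/0/1 Actor No No Yes Yes Yes Yes Fast Active
ge-2/0/1 Partner No No Yes Yes Yes Yes Slow Active
LACP protocol: Receive State Transmit State Mux State
ge-2/0/1 Current Slow periodic Collecting distributing
"""

for interface, roles in lacp_timeouts(SAMPLE).items():
    if roles.get("Partner") == "Slow":
        print(f"{interface}: partner uses slow timeout")
```

Run against the ae2 output, this would flag both members; against the ae4 output, none.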