Bug #15196
openAWS ena interfaces can become unstable/stop responding
0%
Description
On AMD Epyc hardware in AWS, pfSense Plus ena interfaces can lose their IP addressing and then stop responding entirely.
The following log messages are present when this occurs:
Jan 16 18:34:35 np-aws-001 kernel: ena0: <ENA adapter> mem 0x80404000-0x80407fff at device 5.0 on pci0
Jan 16 18:34:35 np-aws-001 kernel: ena0: Elastic Network Adapter (ENA)ena v2.6.2
Jan 16 18:34:35 np-aws-001 kernel: ena0: Unable to allocate LLQ bar resource. LLQ mode won't be used.
Jan 16 18:34:35 np-aws-001 kernel: ena0: ena_com_validate_version() [TID:100000]: ENA device version: 0.10
Jan 16 18:34:35 np-aws-001 kernel: ena0: ena_com_validate_version() [TID:100000]: ENA controller version: 0.0.1 implementation version 1
Jan 16 18:34:35 np-aws-001 kernel: ena0: LLQ is not supported. Fallback to host mode policy.
Jan 16 18:34:35 np-aws-001 kernel: ena0: Ethernet address: 06:ba:32:98:fd:07
Jan 16 18:34:35 np-aws-001 kernel: ena0: [nm] netmap attach
Jan 16 18:34:35 np-aws-001 kernel: ena0: netmap queues/slots: TX 2/1024, RX 2/1024
and
Jan 19 03:49:07 kernel ena0: Found a Tx that wasn't completed on time, qid 1, index 925. 180522704 usecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
Jan 19 03:49:07 kernel ena0: Found a Tx that wasn't completed on time, qid 1, index 924. 179482704 usecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
Jan 19 03:49:07 kernel ena0: Found a Tx that wasn't completed on time, qid 1, index 923. 178472704 usecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
Jan 19 03:48:54 kernel ena0: Found a Tx that wasn't completed on time, qid 1, index 922. 167002704 usecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
Files