Bug #6296

closed

Interface dies with IPsec and SMP

Added by Chris Buechler over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Very High
Assignee: -
Category: Operating System
Target version:
Start date: 05/01/2016
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.3
Affected Architecture: All

Description

When pushing a stream of UDP traffic over IPsec, an interface can die, leaving one CPU core pinned at 100%. In top, the busy thread looks something like the following, here for igb:

   12 root       -92    -     0K   688K CPU1    1  14:37 100.00% intr{irq258: igb0:que}

or for re:

   12 root       -92    -     0K   272K CPU1    1   0:10  99.46% intr{irq261: re2}

or for em:

    0 root       -92    -     0K   320K CPU2    2  12:25 100.00% kernel{em1 taskq}
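To spot this state without watching top by hand, the lines above can be scanned for threads sitting at or near 100% WCPU. This is a minimal sketch, not part of the original report: it assumes the usual FreeBSD thread-view column order (PID, USERNAME, PRI, NICE, SIZE, RES, STATE, C, TIME, WCPU, COMMAND), and the function name, the tuple type, and the 99% threshold are made up for illustration.

```python
from typing import List, NamedTuple


class BusyThread(NamedTuple):
    pid: int
    cpu: int        # value of the "C" column (CPU the thread last ran on)
    wcpu: float     # WCPU percentage
    command: str    # e.g. "intr{irq258: igb0:que}"


def find_pegged_threads(top_output: str, threshold: float = 99.0) -> List[BusyThread]:
    """Return threads whose WCPU is at or above `threshold` percent.

    Assumes 11 whitespace-separated columns with COMMAND last; the
    COMMAND column may itself contain spaces, so split at most 10 times.
    """
    pegged = []
    for line in top_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) != 11 or not fields[9].endswith("%"):
            continue  # header, blank line, or unexpected layout
        try:
            pid = int(fields[0])
            cpu = int(fields[7])
            wcpu = float(fields[9].rstrip("%"))
        except ValueError:
            continue  # non-numeric columns, e.g. the header row
        if wcpu >= threshold:
            pegged.append(BusyThread(pid, cpu, wcpu, fields[10].strip()))
    return pegged


# The three sample lines from this report.
SAMPLE = """\
 12 root       -92    -     0K   688K CPU1    1  14:37 100.00% intr{irq258: igb0:que}
12 root       -92    -     0K   272K CPU1    1   0:10  99.46% intr{irq261: re2}
    0 root     -92    -     0K   320K CPU2    2  12:25 100.00% kernel{em1 taskq}
"""

for thread in find_pegged_threads(SAMPLE):
    print(thread.command, thread.wcpu)
```

A pegged intr or taskq thread is only a hint that a box may be in this state; a legitimately busy NIC can look the same for short periods, so the check is best combined with the tcpdump symptom described further down.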

It's not NIC-specific: it happens on at least em, igb, and re, on both 32-bit and 64-bit. It has been confirmed to stop happening once SMP is no longer in play, either by reducing a VM to one vCPU or by setting hint.lapic.X.disabled for all but one core. It never happens on single-core physical hardware.
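For reference, the hint.lapic.X.disabled workaround needs one hint per core to be turned off. The sketch below only generates those lines; the function name is hypothetical, the APIC IDs must be supplied by the caller (they are not necessarily contiguous on real hardware), and where the lines get placed, for example in a loader configuration file, is left as an assumption.

```python
from typing import Iterable, List


def lapic_disable_hints(apic_ids: Iterable[int], keep: int) -> List[str]:
    """Build one hint.lapic.<id>.disabled line for every APIC ID except `keep`.

    `apic_ids` is whatever set of local APIC IDs the machine reports;
    this function does not discover them.
    """
    return [
        f"hint.lapic.{apic_id}.disabled=1"
        for apic_id in sorted(set(apic_ids))
        if apic_id != keep
    ]


# Hypothetical 4-core machine with APIC IDs 0..3, keeping only core 0.
for hint in lapic_disable_hints([0, 1, 2, 3], keep=0):
    print(hint)
```

This is a diagnostic workaround that throws away all but one core, so it confirms the SMP dependency rather than fixing the underlying problem.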

With e1000 NICs, it often (but not always) logs "watchdog timeout" on that NIC over and over once it gets into that state. I haven't seen any logs from other NIC types.

The NIC that ends up dying is the LAN NIC, where the UDP traffic is being initiated. Everything on that NIC stops working. Output traffic stops about a minute after input traffic. Whether the NIC is completely dead varies. On em and re, it always seems to be completely dead. On igb, there are serious connectivity issues, but it's not completely dead, at least initially; in that case it probably affects only one of the multiple queues. Most of the time, no other NICs on the system are affected.

When it's in that state, tcpdump on the affected NIC shows no inbound traffic, though a SPAN (mirror) of the switch port shows that traffic is being sent to the system. In the igb case, where it's not completely dead but has issues, tcpdump shows some but not all of the inbound traffic the switch is sending to its port.

It doesn't happen on stock FreeBSD RELENG_10_3 source, nor on that source plus tryforward. So the issue is somewhere in our changes, but outside of tryforward.

Discussed in these forum threads, possibly among others:
https://forum.pfsense.org/index.php?topic=110320.0
https://forum.pfsense.org/index.php?topic=110710.0
https://forum.pfsense.org/index.php?topic=110953.0
https://forum.pfsense.org/index.php?topic=110716.0
https://forum.pfsense.org/index.php?topic=110994.0
https://forum.pfsense.org/index.php?topic=110525.0


Files

P1020397.JPG (280 KB): "screenshot" of bt in ddb. Ludovic Pouzenc, 05/06/2016 05:41 AM