Bug #12071

closed

Responder Only IPsec tunnel tries to connect on secondary node when a failover happens in HA

Added by Marcos Mendoza about 1 month ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee: -
Category: IPsec
Target version: -
Start date: 06/22/2021
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version:
Affected Architecture:

Description

Normally, with an IPsec tunnel on a pfSense HA setup, failing over to the secondary makes IPsec start on the new master, and only a single packet is lost when running a continuous ping through the failover window.

If the IPsec P1 is set to responder only due to the remote end being behind NAT, the new master node will get stuck on "connecting" for a while, even though it shouldn't be initiating a connection in the first place.

Additionally, it seems that something has to time out before the remote end is able to re-establish the tunnel with the responder only P1.

Tested on HA setup between 21.05 nodes, and a remote pfSense instance.

Actions #1

Updated by Jim Pingle about 1 month ago

  • Status changed from New to Feedback

I can't reproduce this as stated, at least on 2.5.2. I set the HA pair as responder only and set the far side to always initiate. Made sure there were no "automatic ping" hosts.

If I put the primary into maintenance mode, the tunnel eventually times out and rebuilds on the secondary (though that takes longer than I'd like, that's a different issue). If I take the primary out of maintenance mode, eventually the tunnel moves back to the primary. Disabling CARP is the same. Once the far side realizes the HA pair is not responding, it rebuilds the tunnel.

It's possible this is either not a problem as stated, or due to settings in their configuration. It's also possible that this is fixed by the changes to how initiation/responder work in 2.5.2 with the child SA start action setting.

Another point I noticed is that when saving IPsec on the primary, the secondary didn't appear to automatically apply the IPsec settings. So maybe that also happened here, and the secondary's running configuration wasn't actually set to responder only at the time. If that is the case, it's a slightly different problem and would warrant its own dedicated Redmine issue.

Repeat the test again but pay close attention to the contents of /var/etc/ipsec/swanctl.conf on each HA node at each stage, and monitor the logs.
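One way to make that comparison less error-prone is to pull the `start_action` values out of each node's swanctl.conf and compare them. The following is only a minimal sketch: the helper names and the sample configuration text are hypothetical, and it assumes that a responder-only tunnel shows up as `start_action = none` on its child SA (an unset `start_action` also means `none` in strongSwan, which this sketch does not try to detect).

```python
import re
from typing import List

# Matches lines such as "start_action = trap" in swanctl.conf text.
START_ACTION_RE = re.compile(r"^\s*start_action\s*=\s*(\S+)\s*$", re.MULTILINE)


def start_actions(conf_text: str) -> List[str]:
    """Return every explicit start_action value, in file order."""
    return START_ACTION_RE.findall(conf_text)


def may_initiate(conf_text: str) -> bool:
    """True if any child SA has an explicit start_action other than 'none'."""
    return any(value != "none" for value in start_actions(conf_text))


# Hypothetical excerpts, standing in for the contents of
# /var/etc/ipsec/swanctl.conf copied from each HA node.
PRIMARY_CONF = """
connections {
    con1 {
        children {
            con1 {
                start_action = none
            }
        }
    }
}
"""

SECONDARY_CONF = PRIMARY_CONF.replace("start_action = none", "start_action = trap")

# A mismatch suggests the secondary has not applied the synced settings.
nodes_match = start_actions(PRIMARY_CONF) == start_actions(SECONDARY_CONF)
```

Running the same check before the failover, during it, and after failing back would show at which stage the two nodes diverge.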

Actions #2

Updated by Jim Pingle about 1 month ago

Since the apply-after-sync thing seems to be its own legitimate issue, I created #12075 for it. If this turns out to be the same root cause, this issue can be closed as a duplicate of #12075 since it has only the information relevant to that situation.

Actions #3

Updated by Marcos Mendoza about 1 month ago

I re-tested this and indeed the issue is the "apply-after-sync" behavior.

Further testing also explained this behavior: "Additionally, it seems that something has to time out before the remote end is able to re-establish the tunnel with the responder only P1."
The cause was DPD, which had to time out on the remote end before it would reconnect.

Actions #4

Updated by Marcos Mendoza about 1 month ago

  • Status changed from Feedback to Closed

Actions #5

Updated by Jim Pingle about 1 month ago

Yes, DPD does have to time out (which can take several minutes). Unfortunately, by the time the primary goes into BACKUP mode, it's too late for it to terminate the IKE sessions nicely, since it can no longer send packets from the CARP VIP.
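To give a rough sense of where "several minutes" comes from, the sketch below adds up the retransmission waits for an unanswered IKEv2 DPD request. It assumes strongSwan's documented default retransmission settings (`retransmit_timeout` 4.0 s, `retransmit_base` 1.8, `retransmit_tries` 5); the settings actually in effect on the remote peer in this report are not stated, and the function name is hypothetical. The DPD delay, meaning the idle time before the first probe is sent, comes on top of this figure.

```python
from typing import List


def retransmit_delays(timeout: float = 4.0, base: float = 1.8, tries: int = 5) -> List[float]:
    """Seconds waited after the initial send and after each retransmit.

    The n-th wait is timeout * base ** n, for n = 0 .. tries.
    """
    return [timeout * base ** n for n in range(tries + 1)]


# Total time before the peer gives up on the exchange and declares the
# IKE SA dead, under the assumed defaults: roughly 165 seconds.
total_seconds = sum(retransmit_delays())
```

Under these assumptions the remote end needs close to three minutes after its first unanswered probe before it tears down the old SA and can initiate a new one toward the node that now holds the CARP VIP.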

We could potentially address that for the manual disable or maintenance mode path, but not for "real world" failover events.
