Project

General

Profile

Actions

Regression #14217

closed

IPsec Phase 2 rekey failures with some PFS key groups

Added by Georgiy Tyutyunnik about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Category:
IPsec
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
23.05
Release Notes:
Default
Affected Version:
2.7.0
Affected Architecture:

Description

IPSec phase 2 with some specific PFS key groups fails to rekey with the following logs message:

Mar 31 12:47:14 charon 84020 10[IKE] <con1|1> unable to install inbound and outbound IPsec SA (SAD) in kernel
Tunnel initiation establishes phase2 successfully, but the phase2 can't rekey and times out. Only bouncing phase1 brings it back.
Affected PFS key groups: 18,19,23,24,27,28,30,31,32.
Logs for the rekey and end-of-lifetime of the affected phase2 attached


Files

logs2301rekey.txt (3.09 KB) logs2301rekey.txt Georgiy Tyutyunnik, 03/31/2023 10:20 AM
logs2301eolp2.txt (12.5 KB) logs2301eolp2.txt Georgiy Tyutyunnik, 03/31/2023 10:20 AM
before-rekey.png (106 KB) before-rekey.png Jim Pingle, 04/18/2023 12:45 PM
after-rekey.png (110 KB) after-rekey.png Jim Pingle, 04/18/2023 12:45 PM
Actions #1

Updated by Jim Pingle about 1 year ago

Is this tunnel mode or VTI?

Are those logs in forward or reverse order? They seem to be in reverse which makes reading them in logical time order confusing.

That seems highly unlikely to be a bug with PFS.

Also note that when establishing a tunnel it uses the DH group from P1 but when rekeying it uses the settings for P2, so a mismatch between both sides might only become apparent on rekey: https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html#phase-1-proposal-encryption-algorithm

Actions #2

Updated by Georgiy Tyutyunnik about 1 year ago

Jim Pingle wrote in #note-1:

Is this tunnel mode or VTI?

This behavior is on both tunnel and VTI. Logs are for VTI.

Are those logs in forward or reverse order? They seem to be in reverse which makes reading them in logical time order confusing.

Reverse.

That seems highly unlikely to be a bug with PFS.

Also note that when establishing a tunnel it uses the DH group from P1 but when rekeying it uses the settings for P2, so a mismatch between both sides might only become apparent on rekey: https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html#phase-1-proposal-encryption-algorithm

If I use DH 18 on p1 and pfs group 18 on p2 the issue reoccurs
I will review my full list of affected DH groups accordingly

Actions #3

Updated by Steve Wheeler about 1 year ago

  • Status changed from New to Confirmed

Seeing this between a 2.7 VM and 23.01 on a 5100.

Tunnel rekeys as expected using PFS key group 14 but fails after changing only the PFS group to 18.

The only logged change is on the responding tunnel where it shows the proposal and TSs match but then shows:

Mar 31 18:37:55     charon     8413     11[IKE] <con2|1> unable to install inbound and outbound IPsec SA (SAD) in kernel
Mar 31 18:37:55     charon     8413     11[IKE] <con2|1> failed to establish CHILD_SA, keeping IKE_SA 

And then sends 'N(NO_PROP)'

Actions #4

Updated by Steve Wheeler about 1 year ago

  • Project changed from pfSense Plus to pfSense
  • Category changed from IPsec to IPsec
  • Target version set to 2.7.0
  • Affected Plus Version deleted (23.01)
  • Plus Target Version set to 23.05
  • Affected Version set to 2.7.0

When rekeying from 2.7 against 2.6 the remote side accepts the rekeys and installs the new child_SA. But the local side (2.7) then rejects the response and sends a delete with much the same error:

Mar 31 19:45:01     charon     15142     07[IKE] <con3|15> unable to install inbound and outbound IPsec SA (SAD) in kernel
Mar 31 19:45:01     charon     15142     07[IKE] <con3|15> failed to establish CHILD_SA, keeping IKE_SA
Mar 31 19:45:01     charon     15142     07[IKE] <con3|15> reinitiating already active tasks
Mar 31 19:45:01     charon     15142     07[IKE] <con3|15> CHILD_REKEY task
Mar 31 19:45:01     charon     15142     07[IKE] <con3|15> sending DELETE for ESP CHILD_SA with SPI c18bd3fd 

Actions #5

Updated by Steve Wheeler about 1 year ago

Rekeying appears to work OK using PFS group 18 in IKEv1. Though it doesn't remove old rekeyed SAs.

  con3_4: #17, reqid 1, REKEYED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192
    installed 32s ago, rekeying in -31s, expires in 3568s
    in  c2153054,      0 bytes,     0 packets,   561s ago
    out c26bf74e,      0 bytes,     0 packets
    local  192.168.22.0/24|/0
    remote 192.168.180.0/24|/0
  con3_4: #18, reqid 1, REKEYED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192
    installed 29s ago, rekeying in -26s, expires in 3571s
    in  cd589094,      0 bytes,     0 packets,   561s ago
    out ca239133,      0 bytes,     0 packets
    local  192.168.22.0/24|/0
    remote 192.168.180.0/24|/0
  con3_4: #19, reqid 1, INSTALLED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192
    installed 24s ago, rekeying in 24s, expires in 3576s
    in  c670c27d,      0 bytes,     0 packets,   561s ago
    out cc439599,      0 bytes,     0 packets
    local  192.168.22.0/24|/0
    remote 192.168.180.0/24|/0

Actions #6

Updated by Marcos M about 1 year ago

  • Private changed from Yes to No
Actions #7

Updated by Jim Pingle about 1 year ago

Something else to consider, the package in FreeBSD gained two options between 22.05.x and 23.01:

     CTR=off: Enable CTR cipher mode wrapper plugin
     KDF=on: Enable KDF (prf+) implementation plugin

Though neither of those seem particularly likely to cause a problem like this.

Might help to dig through the release notes for all of the intermediate versions to see if there is anything relevant. Current snaps have 5.9.10, 22.05 had 5.9.5.

Also be sure to compare the contents and structure of both /var/etc/ipsec/strongswan.conf and /var/etc/ipsec/swanctl.conf looking for differences, and look at the startup logs from both to see if there are any errors at startup or differences in loaded plugins for example.

Actions #8

Updated by Jim Pingle about 1 year ago

  • Subject changed from IPSec phase2 rekey issues on specific PFS key groups to IPsec Phase 2 rekey failures with some PFS key groups

Updating subject for release notes.

Actions #9

Updated by Kristof Provost about 1 year ago

  • Assignee set to Kristof Provost

I've started poking at this, and have been able to reproduce it. It still occurs on strongswan-5.9.10.

I've also determined that the first reported error is in select_and_install(), and is triggered by this->keymat->derive_child_keys() failing, so it's not so much an inability to install (the code doesn't even try to do that), but a failure to derive the child key.
The investigation continues.

Actions #10

Updated by Kristof Provost about 1 year ago

After a bit more digging I found that openssl_kdf.c's set_params() fails to set the salt with EVP_PKEY_CTX_add1_hkdf_info(). That turns out to be because we've wound up with a long shared secret (1088 bytes), which is more than OpenSSL is willing to handle in HKDF. The relevant RFC (5869) does not specify a maximum length for this, and upstream OpenSSL has bumped that limit to 2048 (https://github.com/openssl/openssl/commit/20c2876f24d0ccf9581ace08c7882d544d2588ea).

I'm making a build with that change, which I expect will fix this issue.

Actions #11

Updated by Kristof Provost about 1 year ago

  • Status changed from Confirmed to Ready To Test

Confirmed fixed with that change.

It'll be in the next CE and plus snapshots.

Actions #12

Updated by Jim Pingle about 1 year ago

  • Status changed from Ready To Test to Feedback
Actions #13

Updated by Steve Wheeler about 1 year ago

  • Status changed from Feedback to Resolved

Confirmed this is rekeying as expected using PFS key group 18 in todays CE snnapshot: 2.7.0.a.20230418.0600

Apr 18 18:13:36     charon     86268     12[NET] <con3|1> received packet: from 172.21.16.180[500] to 172.21.16.22[500] (1248 bytes)
Apr 18 18:13:36     charon     86268     12[ENC] <con3|1> parsed CREATE_CHILD_SA response 1934 [ N(ESP_TFC_PAD_N) SA No KE TSi TSr ]
Apr 18 18:13:36     charon     86268     12[IKE] <con3|1> received ESP_TFC_PADDING_NOT_SUPPORTED notify
Apr 18 18:13:36     charon     86268     12[IKE] <con3|1> received ESP_TFC_PADDING_NOT_SUPPORTED, not using ESPv3 TFC padding
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> selecting proposal:
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> proposal matches
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> received proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> configured proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> selected proposal: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> selecting traffic selectors for us:
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> config: 192.168.22.0/24|/0, received: 192.168.22.0/24|/0 => match: 192.168.22.0/24|/0
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> selecting traffic selectors for other:
Apr 18 18:13:36     charon     86268     12[CFG] <con3|1> config: 192.168.180.0/24|/0, received: 192.168.180.0/24|/0 => match: 192.168.180.0/24|/0
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> CHILD_SA con3{483} state change: CREATED => INSTALLING
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> using AES_CBC for encryption
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> using HMAC_SHA2_256_128 for integrity
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> adding inbound ESP SA
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> SPI 0xc3cafb1a, src 172.21.16.180 dst 172.21.16.22
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> deleting SAD entry with SPI c3cafb1a
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> deleted SAD entry with SPI c3cafb1a
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> adding SAD entry with SPI c3cafb1a and reqid {1}
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> using encryption algorithm AES_CBC with key size 128
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> using integrity algorithm HMAC_SHA2_256_128 with key size 256
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> registering outbound ESP SA
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> SPI 0xc1b8f722, src 172.21.16.22 dst 172.21.16.180
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> policy 192.168.180.0/24|/0 === 192.168.22.0/24|/0 in already exists, increasing refcount
Apr 18 18:13:36     charon     86268     12[IKE] <con3|1> inbound CHILD_SA con3{483} established with SPIs c3cafb1a_i c1b8f722_o and TS 192.168.22.0/24|/0 === 192.168.180.0/24|/0
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> CHILD_SA con3{483} state change: INSTALLING => INSTALLED
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> adding outbound ESP SA
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> SPI 0xc1b8f722, src 172.21.16.22 dst 172.21.16.180
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> adding SAD entry with SPI c1b8f722 and reqid {1}
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> using encryption algorithm AES_CBC with key size 128
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> using integrity algorithm HMAC_SHA2_256_128 with key size 256
Apr 18 18:13:36     charon     86268     12[KNL] <con3|1> policy 192.168.22.0/24|/0 === 192.168.180.0/24|/0 out already exists, increasing refcount
Apr 18 18:13:36     charon     86268     12[IKE] <con3|1> outbound CHILD_SA con3{483} established with SPIs c3cafb1a_i c1b8f722_o and TS 192.168.22.0/24|/0 === 192.168.180.0/24|/0
Apr 18 18:13:36     charon     86268     12[CHD] <con3|1> CHILD_SA con3{482} state change: REKEYING => REKEYED 
Actions #14

Updated by Jim Pingle about 1 year ago

Also working well here as well. Rekey happened on schedule and traffic continued to pass, where before it had failed:

Actions

Also available in: Atom PDF