Regression #14217
closedIPsec Phase 2 rekey failures with some PFS key groups
0%
Description
IPSec phase 2 with some specific PFS key groups fails to rekey with the following logs message:
Mar 31 12:47:14 charon 84020 10[IKE] <con1|1> unable to install inbound and outbound IPsec SA (SAD) in kernel
Tunnel initiation establishes phase2 successfully, but the phase2 can't rekey and times out. Only bouncing phase1 brings it back.
Affected PFS key groups: 18,19,23,24,27,28,30,31,32.
Logs for the rekey and end-of-lifetime of the affected phase2 attached
Files
Updated by Jim Pingle over 2 years ago
Is this tunnel mode or VTI?
Are those logs in forward or reverse order? They seem to be in reverse which makes reading them in logical time order confusing.
That seems highly unlikely to be a bug with PFS.
Also note that when establishing a tunnel it uses the DH group from P1 but when rekeying it uses the settings for P2, so a mismatch between both sides might only become apparent on rekey: https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html#phase-1-proposal-encryption-algorithm
Updated by Georgiy Tyutyunnik over 2 years ago
Jim Pingle wrote in #note-1:
Is this tunnel mode or VTI?
This behavior is on both tunnel and VTI. Logs are for VTI.
Are those logs in forward or reverse order? They seem to be in reverse which makes reading them in logical time order confusing.
Reverse.
That seems highly unlikely to be a bug with PFS.
Also note that when establishing a tunnel it uses the DH group from P1 but when rekeying it uses the settings for P2, so a mismatch between both sides might only become apparent on rekey: https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html#phase-1-proposal-encryption-algorithm
If I use DH 18 on p1 and pfs group 18 on p2 the issue reoccurs
I will review my full list of affected DH groups accordingly
Updated by Steve Wheeler over 2 years ago
- Status changed from New to Confirmed
Seeing this between a 2.7 VM and 23.01 on a 5100.
Tunnel rekeys as expected using PFS key group 14 but fails after changing only the PFS group to 18.
The only logged change is on the responding tunnel where it shows the proposal and TSs match but then shows:
Mar 31 18:37:55 charon 8413 11[IKE] <con2|1> unable to install inbound and outbound IPsec SA (SAD) in kernel Mar 31 18:37:55 charon 8413 11[IKE] <con2|1> failed to establish CHILD_SA, keeping IKE_SA
And then sends 'N(NO_PROP)'
Updated by Steve Wheeler over 2 years ago
- Project changed from pfSense Plus to pfSense
- Category changed from IPsec to IPsec
- Target version set to 2.7.0
- Affected Plus Version deleted (
23.01) - Plus Target Version set to 23.05
- Affected Version set to 2.7.0
When rekeying from 2.7 against 2.6 the remote side accepts the rekeys and installs the new child_SA. But the local side (2.7) then rejects the response and sends a delete with much the same error:
Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> unable to install inbound and outbound IPsec SA (SAD) in kernel Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> failed to establish CHILD_SA, keeping IKE_SA Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> reinitiating already active tasks Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> CHILD_REKEY task Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> sending DELETE for ESP CHILD_SA with SPI c18bd3fd
Updated by Steve Wheeler over 2 years ago
Rekeying appears to work OK using PFS group 18 in IKEv1. Though it doesn't remove old rekeyed SAs.
con3_4: #17, reqid 1, REKEYED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192
installed 32s ago, rekeying in -31s, expires in 3568s
in c2153054, 0 bytes, 0 packets, 561s ago
out c26bf74e, 0 bytes, 0 packets
local 192.168.22.0/24|/0
remote 192.168.180.0/24|/0
con3_4: #18, reqid 1, REKEYED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192
installed 29s ago, rekeying in -26s, expires in 3571s
in cd589094, 0 bytes, 0 packets, 561s ago
out ca239133, 0 bytes, 0 packets
local 192.168.22.0/24|/0
remote 192.168.180.0/24|/0
con3_4: #19, reqid 1, INSTALLED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192
installed 24s ago, rekeying in 24s, expires in 3576s
in c670c27d, 0 bytes, 0 packets, 561s ago
out cc439599, 0 bytes, 0 packets
local 192.168.22.0/24|/0
remote 192.168.180.0/24|/0
Updated by Jim Pingle over 2 years ago
Something else to consider, the package in FreeBSD gained two options between 22.05.x and 23.01:
CTR=off: Enable CTR cipher mode wrapper plugin
KDF=on: Enable KDF (prf+) implementation plugin
Though neither of those seem particularly likely to cause a problem like this.
Might help to dig through the release notes for all of the intermediate versions to see if there is anything relevant. Current snaps have 5.9.10, 22.05 had 5.9.5.
- https://github.com/strongswan/strongswan/releases (5.9.10 back to 5.9.6)
Also be sure to compare the contents and structure of both /var/etc/ipsec/strongswan.conf and /var/etc/ipsec/swanctl.conf looking for differences, and look at the startup logs from both to see if there are any errors at startup or differences in loaded plugins for example.
Updated by Jim Pingle over 2 years ago
- Subject changed from IPSec phase2 rekey issues on specific PFS key groups to IPsec Phase 2 rekey failures with some PFS key groups
Updating subject for release notes.
Updated by Kristof Provost over 2 years ago
- Assignee set to Kristof Provost
I've started poking at this, and have been able to reproduce it. It still occurs on strongswan-5.9.10.
I've also determined that the first reported error is in select_and_install(), and is triggered by this->keymat->derive_child_keys() failing, so it's not so much an inability to install (the code doesn't even try to do that), but a failure to derive the child key.
The investigation continues.
Updated by Kristof Provost over 2 years ago
After a bit more digging I found that openssl_kdf.c's set_params() fails to set the salt with EVP_PKEY_CTX_add1_hkdf_info(). That turns out to be because we've wound up with a long shared secret (1088 bytes), which is more than OpenSSL is willing to handle in HKDF. The relevant RFC (5869) does not specify a maximum length for this, and upstream OpenSSL has bumped that limit to 2048 (https://github.com/openssl/openssl/commit/20c2876f24d0ccf9581ace08c7882d544d2588ea).
I'm making a build with that change, which I expect will fix this issue.
Updated by Kristof Provost over 2 years ago
- Status changed from Confirmed to Ready To Test
Confirmed fixed with that change.
It'll be in the next CE and plus snapshots.
Updated by Jim Pingle over 2 years ago
- Status changed from Ready To Test to Feedback
Updated by Steve Wheeler over 2 years ago
- Status changed from Feedback to Resolved
Confirmed this is rekeying as expected using PFS key group 18 in todays CE snnapshot: 2.7.0.a.20230418.0600
Apr 18 18:13:36 charon 86268 12[NET] <con3|1> received packet: from 172.21.16.180[500] to 172.21.16.22[500] (1248 bytes)
Apr 18 18:13:36 charon 86268 12[ENC] <con3|1> parsed CREATE_CHILD_SA response 1934 [ N(ESP_TFC_PAD_N) SA No KE TSi TSr ]
Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> received ESP_TFC_PADDING_NOT_SUPPORTED notify
Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> received ESP_TFC_PADDING_NOT_SUPPORTED, not using ESPv3 TFC padding
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selecting proposal:
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> proposal matches
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> received proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> configured proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selected proposal: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selecting traffic selectors for us:
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> config: 192.168.22.0/24|/0, received: 192.168.22.0/24|/0 => match: 192.168.22.0/24|/0
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selecting traffic selectors for other:
Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> config: 192.168.180.0/24|/0, received: 192.168.180.0/24|/0 => match: 192.168.180.0/24|/0
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> CHILD_SA con3{483} state change: CREATED => INSTALLING
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> using AES_CBC for encryption
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> using HMAC_SHA2_256_128 for integrity
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> adding inbound ESP SA
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> SPI 0xc3cafb1a, src 172.21.16.180 dst 172.21.16.22
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> deleting SAD entry with SPI c3cafb1a
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> deleted SAD entry with SPI c3cafb1a
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> adding SAD entry with SPI c3cafb1a and reqid {1}
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using encryption algorithm AES_CBC with key size 128
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using integrity algorithm HMAC_SHA2_256_128 with key size 256
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> registering outbound ESP SA
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> SPI 0xc1b8f722, src 172.21.16.22 dst 172.21.16.180
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> policy 192.168.180.0/24|/0 === 192.168.22.0/24|/0 in already exists, increasing refcount
Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> inbound CHILD_SA con3{483} established with SPIs c3cafb1a_i c1b8f722_o and TS 192.168.22.0/24|/0 === 192.168.180.0/24|/0
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> CHILD_SA con3{483} state change: INSTALLING => INSTALLED
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> adding outbound ESP SA
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> SPI 0xc1b8f722, src 172.21.16.22 dst 172.21.16.180
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> adding SAD entry with SPI c1b8f722 and reqid {1}
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using encryption algorithm AES_CBC with key size 128
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using integrity algorithm HMAC_SHA2_256_128 with key size 256
Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> policy 192.168.22.0/24|/0 === 192.168.180.0/24|/0 out already exists, increasing refcount
Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> outbound CHILD_SA con3{483} established with SPIs c3cafb1a_i c1b8f722_o and TS 192.168.22.0/24|/0 === 192.168.180.0/24|/0
Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> CHILD_SA con3{482} state change: REKEYING => REKEYED
Updated by Jim Pingle over 2 years ago
- File before-rekey.png before-rekey.png added
- File after-rekey.png after-rekey.png added
Also working well here as well. Rekey happened on schedule and traffic continued to pass, where before it had failed: