Regression #14217
closedIPsec Phase 2 rekey failures with some PFS key groups
0%
Description
IPSec phase 2 with some specific PFS key groups fails to rekey with the following logs message:
Mar 31 12:47:14 charon 84020 10[IKE] <con1|1> unable to install inbound and outbound IPsec SA (SAD) in kernel
Tunnel initiation establishes phase2 successfully, but the phase2 can't rekey and times out. Only bouncing phase1 brings it back.
Affected PFS key groups: 18,19,23,24,27,28,30,31,32.
Logs for the rekey and end-of-lifetime of the affected phase2 attached
Files
Updated by Jim Pingle almost 2 years ago
Is this tunnel mode or VTI?
Are those logs in forward or reverse order? They seem to be in reverse which makes reading them in logical time order confusing.
That seems highly unlikely to be a bug with PFS.
Also note that when establishing a tunnel it uses the DH group from P1 but when rekeying it uses the settings for P2, so a mismatch between both sides might only become apparent on rekey: https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html#phase-1-proposal-encryption-algorithm
Updated by Georgiy Tyutyunnik almost 2 years ago
Jim Pingle wrote in #note-1:
Is this tunnel mode or VTI?
This behavior is on both tunnel and VTI. Logs are for VTI.
Are those logs in forward or reverse order? They seem to be in reverse which makes reading them in logical time order confusing.
Reverse.
That seems highly unlikely to be a bug with PFS.
Also note that when establishing a tunnel it uses the DH group from P1 but when rekeying it uses the settings for P2, so a mismatch between both sides might only become apparent on rekey: https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html#phase-1-proposal-encryption-algorithm
If I use DH 18 on p1 and pfs group 18 on p2 the issue reoccurs
I will review my full list of affected DH groups accordingly
Updated by Steve Wheeler almost 2 years ago
- Status changed from New to Confirmed
Seeing this between a 2.7 VM and 23.01 on a 5100.
Tunnel rekeys as expected using PFS key group 14 but fails after changing only the PFS group to 18.
The only logged change is on the responding tunnel where it shows the proposal and TSs match but then shows:
Mar 31 18:37:55 charon 8413 11[IKE] <con2|1> unable to install inbound and outbound IPsec SA (SAD) in kernel Mar 31 18:37:55 charon 8413 11[IKE] <con2|1> failed to establish CHILD_SA, keeping IKE_SA
And then sends 'N(NO_PROP)'
Updated by Steve Wheeler almost 2 years ago
- Project changed from pfSense Plus to pfSense
- Category changed from IPsec to IPsec
- Target version set to 2.7.0
- Affected Plus Version deleted (
23.01) - Plus Target Version set to 23.05
- Affected Version set to 2.7.0
When rekeying from 2.7 against 2.6 the remote side accepts the rekeys and installs the new child_SA. But the local side (2.7) then rejects the response and sends a delete with much the same error:
Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> unable to install inbound and outbound IPsec SA (SAD) in kernel Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> failed to establish CHILD_SA, keeping IKE_SA Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> reinitiating already active tasks Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> CHILD_REKEY task Mar 31 19:45:01 charon 15142 07[IKE] <con3|15> sending DELETE for ESP CHILD_SA with SPI c18bd3fd
Updated by Steve Wheeler almost 2 years ago
Rekeying appears to work OK using PFS group 18 in IKEv1. Though it doesn't remove old rekeyed SAs.
con3_4: #17, reqid 1, REKEYED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192 installed 32s ago, rekeying in -31s, expires in 3568s in c2153054, 0 bytes, 0 packets, 561s ago out c26bf74e, 0 bytes, 0 packets local 192.168.22.0/24|/0 remote 192.168.180.0/24|/0 con3_4: #18, reqid 1, REKEYED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192 installed 29s ago, rekeying in -26s, expires in 3571s in cd589094, 0 bytes, 0 packets, 561s ago out ca239133, 0 bytes, 0 packets local 192.168.22.0/24|/0 remote 192.168.180.0/24|/0 con3_4: #19, reqid 1, INSTALLED, TUNNEL, ESP:AES_GCM_16-128/MODP_8192 installed 24s ago, rekeying in 24s, expires in 3576s in c670c27d, 0 bytes, 0 packets, 561s ago out cc439599, 0 bytes, 0 packets local 192.168.22.0/24|/0 remote 192.168.180.0/24|/0
Updated by Jim Pingle over 1 year ago
Something else to consider, the package in FreeBSD gained two options between 22.05.x and 23.01:
CTR=off: Enable CTR cipher mode wrapper plugin KDF=on: Enable KDF (prf+) implementation plugin
Though neither of those seem particularly likely to cause a problem like this.
Might help to dig through the release notes for all of the intermediate versions to see if there is anything relevant. Current snaps have 5.9.10, 22.05 had 5.9.5.
- https://github.com/strongswan/strongswan/releases (5.9.10 back to 5.9.6)
Also be sure to compare the contents and structure of both /var/etc/ipsec/strongswan.conf
and /var/etc/ipsec/swanctl.conf
looking for differences, and look at the startup logs from both to see if there are any errors at startup or differences in loaded plugins for example.
Updated by Jim Pingle over 1 year ago
- Subject changed from IPSec phase2 rekey issues on specific PFS key groups to IPsec Phase 2 rekey failures with some PFS key groups
Updating subject for release notes.
Updated by Kristof Provost over 1 year ago
- Assignee set to Kristof Provost
I've started poking at this, and have been able to reproduce it. It still occurs on strongswan-5.9.10.
I've also determined that the first reported error is in select_and_install(), and is triggered by this->keymat->derive_child_keys() failing, so it's not so much an inability to install (the code doesn't even try to do that), but a failure to derive the child key.
The investigation continues.
Updated by Kristof Provost over 1 year ago
After a bit more digging I found that openssl_kdf.c's set_params() fails to set the salt with EVP_PKEY_CTX_add1_hkdf_info(). That turns out to be because we've wound up with a long shared secret (1088 bytes), which is more than OpenSSL is willing to handle in HKDF. The relevant RFC (5869) does not specify a maximum length for this, and upstream OpenSSL has bumped that limit to 2048 (https://github.com/openssl/openssl/commit/20c2876f24d0ccf9581ace08c7882d544d2588ea).
I'm making a build with that change, which I expect will fix this issue.
Updated by Kristof Provost over 1 year ago
- Status changed from Confirmed to Ready To Test
Confirmed fixed with that change.
It'll be in the next CE and plus snapshots.
Updated by Jim Pingle over 1 year ago
- Status changed from Ready To Test to Feedback
Updated by Steve Wheeler over 1 year ago
- Status changed from Feedback to Resolved
Confirmed this is rekeying as expected using PFS key group 18 in todays CE snnapshot: 2.7.0.a.20230418.0600
Apr 18 18:13:36 charon 86268 12[NET] <con3|1> received packet: from 172.21.16.180[500] to 172.21.16.22[500] (1248 bytes) Apr 18 18:13:36 charon 86268 12[ENC] <con3|1> parsed CREATE_CHILD_SA response 1934 [ N(ESP_TFC_PAD_N) SA No KE TSi TSr ] Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> received ESP_TFC_PADDING_NOT_SUPPORTED notify Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> received ESP_TFC_PADDING_NOT_SUPPORTED, not using ESPv3 TFC padding Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selecting proposal: Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> proposal matches Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> received proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> configured proposals: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selected proposal: ESP:AES_CBC_128/HMAC_SHA2_256_128/MODP_8192/NO_EXT_SEQ Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selecting traffic selectors for us: Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> config: 192.168.22.0/24|/0, received: 192.168.22.0/24|/0 => match: 192.168.22.0/24|/0 Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> selecting traffic selectors for other: Apr 18 18:13:36 charon 86268 12[CFG] <con3|1> config: 192.168.180.0/24|/0, received: 192.168.180.0/24|/0 => match: 192.168.180.0/24|/0 Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> CHILD_SA con3{483} state change: CREATED => INSTALLING Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> using AES_CBC for encryption Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> using HMAC_SHA2_256_128 for integrity Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> adding inbound ESP SA Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> SPI 0xc3cafb1a, src 172.21.16.180 dst 172.21.16.22 Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> deleting SAD entry with SPI c3cafb1a Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> deleted SAD entry with SPI c3cafb1a Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> adding SAD entry with SPI c3cafb1a and reqid {1} Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using encryption algorithm AES_CBC with key size 128 Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using integrity algorithm HMAC_SHA2_256_128 with key size 256 Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> registering outbound ESP SA Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> SPI 0xc1b8f722, src 172.21.16.22 dst 172.21.16.180 Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> policy 192.168.180.0/24|/0 === 192.168.22.0/24|/0 in already exists, increasing refcount Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> inbound CHILD_SA con3{483} established with SPIs c3cafb1a_i c1b8f722_o and TS 192.168.22.0/24|/0 === 192.168.180.0/24|/0 Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> CHILD_SA con3{483} state change: INSTALLING => INSTALLED Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> adding outbound ESP SA Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> SPI 0xc1b8f722, src 172.21.16.22 dst 172.21.16.180 Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> adding SAD entry with SPI c1b8f722 and reqid {1} Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using encryption algorithm AES_CBC with key size 128 Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> using integrity algorithm HMAC_SHA2_256_128 with key size 256 Apr 18 18:13:36 charon 86268 12[KNL] <con3|1> policy 192.168.22.0/24|/0 === 192.168.180.0/24|/0 out already exists, increasing refcount Apr 18 18:13:36 charon 86268 12[IKE] <con3|1> outbound CHILD_SA con3{483} established with SPIs c3cafb1a_i c1b8f722_o and TS 192.168.22.0/24|/0 === 192.168.180.0/24|/0 Apr 18 18:13:36 charon 86268 12[CHD] <con3|1> CHILD_SA con3{482} state change: REKEYING => REKEYED
Updated by Jim Pingle over 1 year ago
- File before-rekey.png before-rekey.png added
- File after-rekey.png after-rekey.png added
Also working well here as well. Rekey happened on schedule and traffic continued to pass, where before it had failed: