Bug #10176

Multiple overlapping phase 2 SAs on VTI tunnels

Added by Brian Candler 3 months ago. Updated about 1 month ago.

Status: In Progress
Priority: Normal
Assignee:
Category: IPsec
Target version:
Start date: 01/10/2020
Due date:
% Done: 0%
Estimated time:
Affected Version:
Affected Architecture:

Description

This might be a configuration error, but if so, I can't see it. The problem occurs with VTI tunnels between:

- "A end": an HA pair of XG-1537 (2.4.4p3)

and two different "B ends" which are single (non-HA) pfSense boxes:

- (B1) a Dell R220 running 2.4.4p3 (this is "con1000" from the point of view of the A end, and "con4000" from the B1 end)
- (B2) a SG-1000 running 2.4.4p3 (this is "con2000" from the point of view of the A end, and also "con2000" from the B2 end)

What I see is that many overlapping phase2 connections are created. This doesn't actually stop the tunnels from working, but obviously something is wrong somewhere.

# on the A end
/usr/local/sbin/swanctl --list-sas | grep con1000 | wc -l
      12
/usr/local/sbin/swanctl --list-sas | grep con2000 | wc -l
      76

# on the B1 end
/usr/local/sbin/swanctl --list-sas | grep con4000 | wc -l
      12

# on the B2 end
/usr/local/sbin/swanctl --list-sas | grep con2000 | wc -l
      76

Actually the B1 and B2 ends also have a direct tunnel between them, and appear to have the same issue, so I don't think it's anything to do with the HA configuration.

# B1
/usr/local/sbin/swanctl --list-sas | grep con6000 | wc -l
       8

# B2
/usr/local/sbin/swanctl --list-sas | grep con5000 | wc -l
       8
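
Note that `grep conNNNN | wc -l` counts every line that mentions the connection name, so the IKE SA header line is included along with each child SA line. The figure of 12 for con1000 is consistent with one IKE SA plus the 11 child SAs in the fuller listing in this ticket. A minimal Python sketch that separates the two counts follows, assuming the `swanctl --list-sas` layout pasted in this ticket (IKE SA lines at column 0, child SA lines indented and carrying a `reqid`). The name `count_sas` and the `SAMPLE` text are illustrative only.

```python
import re
from collections import Counter

# Assumed layout, taken from the output pasted in this ticket:
#   IKE SA line:    "con1000: #4360, ESTABLISHED, IKEv2, ..."
#   child SA line:  "  con1000: #306966, reqid 1000, INSTALLED, ..."
IKE_RE = re.compile(r"^(\S+): #\d+, ")
CHILD_RE = re.compile(r"^\s+(\S+): #\d+, reqid \d+, ")


def count_sas(text):
    """Return (ike_counts, child_counts), each a Counter keyed by connection name."""
    ike, child = Counter(), Counter()
    for line in text.splitlines():
        m = CHILD_RE.match(line)
        if m:
            child[m.group(1)] += 1
            continue
        m = IKE_RE.match(line)
        if m:
            ike[m.group(1)] += 1
    return ike, child


# Shortened sample in the same format (two of the eleven child SAs).
SAMPLE = """\
con1000: #4360, ESTABLISHED, IKEv2, XXXXXXXX_i* XXXXXXXX_r
  local  'X.X.X.X' @ X.X.X.X[500]
  con1000: #306966, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 1187s ago, rekeying in 1530s, expires in 2413s
  con1000: #306967, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 1146s ago, rekeying in 1408s, expires in 2454s
"""

ike, child = count_sas(SAMPLE)
for name in sorted(ike):
    print(f"{name}: {ike[name]} IKE SA(s), {child[name]} child SA(s)")
```

To use it on a real box, the output of `swanctl --list-sas` could be saved to a file and passed to `count_sas` as a string.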

Fuller --list-sas output from the A end, showing only the con1000 SAs to the B1 end:

con1000: #4360, ESTABLISHED, IKEv2, XXXXXXXX_i* XXXXXXXX_r
  local  'X.X.X.X' @ X.X.X.X[500]
  remote 'Y.Y.Y.Y' @ Y.Y.Y.Y[500]
  AES_CBC-128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048
  established 24492s ago, reauth in 2826s
  con1000: #306966, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 1187s ago, rekeying in 1530s, expires in 2413s
    in  c0d37246, 2897353 bytes, 24494 packets
    out c6da3dcd, 112669792 bytes, 87588 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306967, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 1146s ago, rekeying in 1408s, expires in 2454s
    in  c8c07783, 6915344 bytes, 69796 packets
    out c826b292, 310892936 bytes, 235526 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306969, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 1044s ago, rekeying in 1673s, expires in 2556s
    in  cdea40bc, 3814242 bytes, 41079 packets
    out c8c8d986, 181956192 bytes, 137160 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306970, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 993s ago, rekeying in 1949s, expires in 2607s
    in  ca2a3c90, 2457237 bytes, 32152 packets
    out ce583d9b, 140862888 bytes, 103906 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306971, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 963s ago, rekeying in 1684s, expires in 2637s
    in  cee1d7dc, 454128 bytes,  3866 packets
    out c7e834b9, 15782828 bytes, 12245 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306972, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 957s ago, rekeying in 1791s, expires in 2643s
    in  c312d39f, 2736480 bytes, 24750 packets
    out c25c5023, 110195864 bytes, 85358 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306973, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 912s ago, rekeying in 1620s, expires in 2688s
    in  c2265d92, 12641898 bytes, 119647 packets
    out c1e0e608, 518579896 bytes, 396490 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306974, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 732s ago, rekeying in 1812s, expires in 2868s
    in  c479aead, 5165104 bytes, 49388 packets
    out c12da9a9, 210883956 bytes, 161739 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306975, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 659s ago, rekeying in 1877s, expires in 2941s
    in  c634de90, 19159395 bytes, 249823 packets
    out cec9794d, 1069186884 bytes, 772087 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306976, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 465s ago, rekeying in 2231s, expires in 3135s
    in  c5bac3fc, 490906864 bytes, 441747 packets
    out cff7482e, 457592048 bytes, 402168 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
  con1000: #306979, reqid 1000, INSTALLED, TUNNEL, ESP:AES_CBC-128/HMAC_SHA2_256_128/MODP_2048
    installed 295s ago, rekeying in 2506s, expires in 3305s
    in  c0c1d451, 20607802 bytes, 195867 packets
    out cb1cded7, 847400800 bytes, 647600 packets
    local  0.0.0.0/0|/0
    remote 0.0.0.0/0|/0
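
The timer lines in this listing can be checked mechanically. The sketch below is an illustration rather than a diagnosis: it assumes the `installed Ns ago, rekeying in Ns, expires in Ns` wording shown above, and the helper names `child_sa_timers` and `install_gaps` are made up for this example. For the three sample lines copied from the listing, age plus time to expiry comes to 3600s, which matches the phase 2 `<lifetime>` in the configuration in this ticket, and every SA still has its own rekey pending.

```python
import re

# Matches only child SA timer lines; the IKE SA line uses
# "established ... reauth in ..." and is therefore skipped.
TIMER_RE = re.compile(
    r"installed (\d+)s ago, rekeying in (\d+)s, expires in (\d+)s"
)


def child_sa_timers(text):
    """Return one dict per child SA with its age and derived timers (seconds)."""
    rows = []
    for m in TIMER_RE.finditer(text):
        installed, rekey_in, expires_in = map(int, m.groups())
        rows.append({
            "age": installed,                    # seconds since install
            "rekey_at": installed + rekey_in,    # scheduled rekey, from install
            "lifetime": installed + expires_in,  # hard expiry, from install
        })
    return rows


def install_gaps(rows):
    """Seconds between consecutive installs, oldest SA first."""
    ages = sorted((r["age"] for r in rows), reverse=True)
    return [older - newer for older, newer in zip(ages, ages[1:])]


# First three child SA timer lines from the con1000 listing in this ticket.
SAMPLE_TIMERS = """\
  established 24492s ago, reauth in 2826s
    installed 1187s ago, rekeying in 1530s, expires in 2413s
    installed 1146s ago, rekeying in 1408s, expires in 2454s
    installed 1044s ago, rekeying in 1673s, expires in 2556s
"""

rows = child_sa_timers(SAMPLE_TIMERS)
for r in rows:
    print(r)
print("gaps between installs:", install_gaps(rows))
```

Run over the full listing, the gaps show how often an extra child SA was installed, which may help correlate the duplicates with rekey events in the logs.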

Here is the con1000 tunnel configuration at the A end: (note I had to change the "pre-shared-key" XML tag to stop redmine mangling it)

        <phase1>
            <ikeid>1</ikeid>
            <iketype>ikev2</iketype>
            <interface>_vip5ce58f3a60ba7</interface>
            <remote-gateway>Y.Y.Y.Y</remote-gateway>
            <protocol>inet</protocol>
            <myid_type>myaddress</myid_type>
            <myid_data></myid_data>
            <peerid_type>peeraddress</peerid_type>
            <peerid_data></peerid_data>
            <encryption>
                <item>
                    <encryption-algorithm>
                        <name>aes</name>
                        <keylen>128</keylen>
                    </encryption-algorithm>
                    <hash-algorithm>sha256</hash-algorithm>
                    <dhgroup>14</dhgroup>
                </item>
            </encryption>
            <lifetime>28800</lifetime>
            <Xre-shared-key>XXXXXXXX</Xre-shared-key>
            <private-key></private-key>
            <certref></certref>
            <caref></caref>
            <authentication_method>pre_shared_key</authentication_method>
            <descr><![CDATA[lch-fw]]></descr>
            <nat_traversal>on</nat_traversal>
            <mobike>off</mobike>
            <margintime></margintime>
            <dpd_delay>10</dpd_delay>
            <dpd_maxfail>5</dpd_maxfail>
        </phase1>
...
        <phase2>
            <ikeid>1</ikeid>
            <uniqid>5ce644f67e37d</uniqid>
            <mode>vti</mode>
            <reqid>1</reqid>
            <localid>
                <type>network</type>
                <address>10.9.1.17</address>
                <netbits>29</netbits>
            </localid>
            <remoteid>
                <type>address</type>
                <address>10.9.1.18</address>
            </remoteid>
            <protocol>esp</protocol>
            <encryption-algorithm-option>
                <name>aes</name>
                <keylen>128</keylen>
            </encryption-algorithm-option>
            <encryption-algorithm-option>
                <name>aes128gcm</name>
                <keylen>128</keylen>
            </encryption-algorithm-option>
            <hash-algorithm-option>hmac_sha256</hash-algorithm-option>
            <pfsgroup>14</pfsgroup>
            <lifetime>3600</lifetime>
            <pinghost></pinghost>
            <descr></descr>
        </phase2>

And the corresponding con4000 tunnel configuration at the B1 end:

        <phase1>
            <ikeid>4</ikeid>
            <iketype>ikev2</iketype>
            <interface>wan</interface>
            <remote-gateway>X.X.X.X</remote-gateway>
            <protocol>inet</protocol>
            <myid_type>myaddress</myid_type>
            <myid_data></myid_data>
            <peerid_type>peeraddress</peerid_type>
            <peerid_data></peerid_data>
            <encryption>
                <item>
                    <encryption-algorithm>
                        <name>aes</name>
                        <keylen>128</keylen>
                    </encryption-algorithm>
                    <hash-algorithm>sha256</hash-algorithm>
                    <dhgroup>14</dhgroup>
                </item>
            </encryption>
            <lifetime>28800</lifetime>
            <Xre-shared-key>XXXXXXXX</Xre-shared-key>
            <private-key></private-key>
            <certref></certref>
            <caref></caref>
            <authentication_method>pre_shared_key</authentication_method>
            <descr><![CDATA[ldex-fw]]></descr>
            <nat_traversal>on</nat_traversal>
            <mobike>off</mobike>
            <margintime></margintime>
            <dpd_delay>10</dpd_delay>
            <dpd_maxfail>5</dpd_maxfail>
        </phase1>
...
        <phase2>
            <ikeid>4</ikeid>
            <uniqid>5ce644a266cf6</uniqid>
            <mode>vti</mode>
            <reqid>4</reqid>
            <localid>
                <type>network</type>
                <address>10.9.1.18</address>
                <netbits>29</netbits>
            </localid>
            <remoteid>
                <type>address</type>
                <address>10.9.1.17</address>
            </remoteid>
            <protocol>esp</protocol>
            <encryption-algorithm-option>
                <name>aes</name>
                <keylen>128</keylen>
            </encryption-algorithm-option>
            <encryption-algorithm-option>
                <name>aes128gcm</name>
                <keylen>128</keylen>
            </encryption-algorithm-option>
            <hash-algorithm-option>hmac_sha256</hash-algorithm-option>
            <pfsgroup>14</pfsgroup>
            <lifetime>3600</lifetime>
            <pinghost></pinghost>
            <descr></descr>
        </phase2>
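
To rule out an accidental mismatch between the two ends, the phase 2 entries can be compared field by field. The following is a rough sketch using only the Python standard library; the helper names `flatten` and `diff_phase2` are hypothetical, and the XML strings are abridged copies of the two `<phase2>` blocks above (`uniqid`, the encryption options, and the empty fields are left out for brevity). It only reports differing values and says nothing about how strongSwan interprets them.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Map each leaf element's path (with sibling index) to its text."""
    out = {}
    seen = {}
    for child in elem:
        idx = seen.get(child.tag, 0)
        seen[child.tag] = idx + 1
        path = f"{prefix}/{child.tag}[{idx}]"
        if len(child):  # has sub-elements: recurse
            out.update(flatten(child, path))
        else:
            out[path] = (child.text or "").strip()
    return out


def diff_phase2(xml_a, xml_b):
    """Return {path: (value_a, value_b)} for every field that differs."""
    a = flatten(ET.fromstring(xml_a))
    b = flatten(ET.fromstring(xml_b))
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


# Abridged from the A end (con1000) and B1 end (con4000) entries above.
A_PHASE2 = """<phase2>
    <ikeid>1</ikeid>
    <mode>vti</mode>
    <reqid>1</reqid>
    <localid><type>network</type><address>10.9.1.17</address><netbits>29</netbits></localid>
    <remoteid><type>address</type><address>10.9.1.18</address></remoteid>
    <pfsgroup>14</pfsgroup>
    <lifetime>3600</lifetime>
</phase2>"""

B1_PHASE2 = """<phase2>
    <ikeid>4</ikeid>
    <mode>vti</mode>
    <reqid>4</reqid>
    <localid><type>network</type><address>10.9.1.18</address><netbits>29</netbits></localid>
    <remoteid><type>address</type><address>10.9.1.17</address></remoteid>
    <pfsgroup>14</pfsgroup>
    <lifetime>3600</lifetime>
</phase2>"""

for path, (a_val, b_val) in diff_phase2(A_PHASE2, B1_PHASE2).items():
    print(f"{path}: A={a_val!r} B1={b_val!r}")
```

For these abridged entries, the only differences are the local identifiers (`ikeid`, `reqid`) and the mirrored tunnel addresses, which is what one would expect for the two ends of the same tunnel.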

Side note: there is OpenBGP routing on top of this, and there is some relaying of traffic via VTI interfaces. Specifically: A also has tunnels to AWS, and there is traffic which flows B1 -> A -> AWS, and B2 -> A -> AWS (i.e. in one VTI interface and out another VTI interface). I can't see how this has any relevance, given that VTI SAs match 0.0.0.0/0 and therefore should allow all traffic, but I thought it was worth mentioning.

History

#1 Updated by Brian Candler 3 months ago

I should add: these overlapping SAs don't occur for VTI tunnels to AWS. I consistently get only a single phase2 SA for each AWS tunnel:

# At "A end", which also has 4 VTI tunnels to AWS
/usr/local/sbin/swanctl --list-sas | grep 'con[4567]000'
con7000: #4364, ESTABLISHED, IKEv1, XXXX_i* XXXX_r
  con7000: #307070, reqid 7000, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_2048
con5000: #4362, ESTABLISHED, IKEv1, XXXX_i* XXXX_r
  con5000: #307072, reqid 5000, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_1024
con4000: #4368, ESTABLISHED, IKEv1, XXXX_i* XXXX_r
  con4000: #307068, reqid 4000, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_1024
con6000: #4367, ESTABLISHED, IKEv1, XXXX_i* XXXX_r
  con6000: #307059, reqid 6000, INSTALLED, TUNNEL-in-UDP, ESP:AES_CBC-128/HMAC_SHA1_96/MODP_2048

I do have a different problem with AWS tunnels (#10175) - it's the same HA pair for both these tickets.

#2 Updated by Jim Pingle 3 months ago

  • Category set to IPsec
  • Status changed from New to Duplicate
  • Affected Version deleted (2.4.4-p3)

If there is anything actionable here, it's almost certainly solved by #9603 and needs to be tested on 2.5.0 snapshots.

If it's not, then it's in strongSwan and not something we can control.

#3 Updated by Izaac Falken about 1 month ago

I just watched this happen in 2.5.0-DEVELOPMENT (amd64) with a configuration straight out of:
https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/ipsec-routed.html

So, no, #9603 does not solve it.

What now? A shrug "it's StrongSwan" is not an acceptable answer.

#4 Updated by Jim Pingle about 1 month ago

Was it 2.5.0 on both ends? If either end is 2.4.x, it still could be that side triggering the problem.

#5 Updated by Jim Pingle about 1 month ago

  • Status changed from Duplicate to Feedback
  • Assignee set to Jim Pingle
  • Target version set to 2.5.0

I don't yet see a reason why it happened, but I caught one tunnel in my lab doing this, 2.5.0 to 2.5.0. An identical tunnel on another pair of lab boxes didn't do it.

I stopped both sides and restarted them, and a new duplicate appeared around the time the tunnel rekeyed, but it doesn't happen consistently.

I set "Child SA Close Action" on one side to "Restart/Reconnect" and I set the other side to "Close connection and clear SA", and so far it has not come back. That setting is available on 2.4.5 and 2.5.0. Leaving this on Feedback for a bit to see if it comes back or if I can get a better lead on what is happening otherwise.

#6 Updated by Jim Pingle about 1 month ago

  • Status changed from Feedback to In Progress

It took longer to happen, but it still happened with those settings. Still investigating.

#7 Updated by Izaac Falken about 1 month ago

Jim Pingle wrote:

Was it 2.5.0 on both ends? If either end is 2.4.x, it still could be that side triggering the problem.

Yes. (You've reproduced it yourself, so this comment is moot. But I didn't want to just leave the question hanging.)
