Bug #16836

IPsec daemon can crash if a peer initiates two rekeys for the same child SA

Added by David Hiebert 1 day ago. Updated about 14 hours ago.

Status:
Feedback
Priority:
Normal
Category:
IPsec
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
26.03.1
Release Notes:
Default
Affected Version:
Affected Architecture:

Description

  1. Product / version
    - pfSense Plus 25.11.1-RELEASE
    - strongSwan version on 25.11.1: `strongswan-6.0.3` (confirmed via `pkg info strongswan`)
    - strongSwan version on 26.03: `strongswan-6.0.3_1` (confirmed by launching the Netgate pfSense Plus 26.03 AWS Marketplace AMI and querying `pkg info strongswan`)
    - The `_1` is a FreeBSD port revision bump, not an upstream update; the package's CPE string still identifies it as `strongswan:6.0.3` (with `port_checkout_unclean: no`), and the upstream fix is not present.
  1. Summary
    Reproducible pattern of charon crashes on a pfSense Plus 25.11.1 IPsec concentrator. The crash signature matches upstream strongSwan issue strongswan/strongswan#2945 ("Crash caused if confused peer initiates two rekeyings for the same Child SA"), which was fixed in strongSwan 6.0.4 (released 2025-12-12). The crash has now been observed at least twice on the same host.

We have independently confirmed that pfSense Plus 26.03 still bundles strongSwan 6.0.3 (port revision `_1`, with no relevant patches). The request is that strongSwan >= 6.0.4 ship in a future pfSense Plus release or be backported to the 25.11.x train.
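
For clarity on why the `_1` suffix does not help: FreeBSD's PORTREVISION suffix orders package builds for upgrades but does not advance the upstream version. A minimal sketch of that ordering (illustrative only; pkg's real version-comparison rules are richer):

```python
def parse_pkg_version(v: str):
    """Split a FreeBSD package version like '6.0.3_1' into the
    upstream version tuple and the port-revision number.
    (Illustrative only; pkg's real comparison handles more forms.)"""
    base, _, rev = v.partition("_")
    return tuple(int(x) for x in base.split(".")), int(rev or 0)

fixed = parse_pkg_version("6.0.4")      # first upstream release with the fix
shipped = parse_pkg_version("6.0.3_1")  # what 26.03 bundles

# The revision bump does not advance the upstream version:
assert shipped < fixed
print(shipped, "<", fixed)  # ((6, 0, 3), 1) < ((6, 0, 4), 0)
```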

  1. Evidence
    1. Kernel-level exit
      ```
      kernel: pid <pid> (charon), jid 0, uid 0: exited on signal 6 (core dumped)
      ```
      Signal 6 (SIGABRT) comes from charon's own abort() call, issued by its internal signal handler after it catches a critical signal (here SIGBUS, signal 10 on FreeBSD).
    1. charon in-process stack (from ipsec.log immediately before the abort)
      Fatal frame chain on the crashing worker thread:
      ```
      child_delete_create+0x31a
      <- task_manager_v2_create+0x2b22
      <- delete_child_sa_job_create_id+0x103
      <- processor_create
      <- thread_create
      ```

A coredump was preserved on the host but will not be shared (the process memory of an IPsec daemon contains session key material). A sanitized symbolic backtrace can be provided on request.
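
The signal chain behind that kernel log line is generic POSIX behavior and can be reproduced outside charon. A minimal sketch (nothing charon-specific; it only mimics the handler-calls-abort pattern): a handler for a critical signal calls abort(), so the kernel records the exit as signal 6 even though the underlying fault was SIGBUS.

```python
import os
import signal

# Toy reproduction of the exit signature (not charon itself): install a
# handler for a critical signal that calls abort(), the way strongSwan's
# internal handler does, and observe the child die on SIGABRT (6).

def critical_handler(signum, frame):
    # Analogue of "killing ourself, received critical signal"
    os.abort()

pid = os.fork()
if pid == 0:
    signal.signal(signal.SIGBUS, critical_handler)
    os.kill(os.getpid(), signal.SIGBUS)   # simulate the bus error
    os._exit(0)                           # never reached

_, status = os.waitpid(pid, 0)
print("child exited on signal", os.WTERMSIG(status))  # -> 6 (SIGABRT)
```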

  1. Sequence at time of crash
    1. ~7 minutes before the crash: CHILD_SA on a site-to-site tunnel completed a rekey cycle cleanly (SPI A → SPI B; old SA transitioned REKEYED → DELETING → DELETED).
    2. A second rekey cycle on the same tunnel entered REKEYED → DELETED state.
    3. A CHILD_DELETE job was dispatched on the already-rekeyed CHILD_SA.
    4. Worker thread faulted inside `child_delete_create`.
    5. strongSwan's signal handler caught SIGBUS, logged "killing ourself, received critical signal", dumped the stack, and called abort().

This matches the mechanism described in strongswan/strongswan#2944: a peer drives two sequential rekeys on the same CHILD_SA, leaving the original SA destroyed while a delete job still references it.
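
That sequence can be sketched as a toy model (our reading of the upstream reports, not strongSwan's actual data structures; the `guarded` flag only stands in conceptually for the 6.0.4 fix):

```python
# Toy model of the race described above: each rekey retires the old
# CHILD_SA, but a CHILD_DELETE job queued for it may still hold the
# retired SA's SPI when it finally runs.

class ChildSA:
    def __init__(self, spi):
        self.spi = spi
        self.state = "INSTALLED"

class IkeSA:
    def __init__(self):
        self.children = {}   # spi -> ChildSA
        self.jobs = []       # queued (job type, spi) pairs

    def rekey(self, old_spi, new_spi):
        old = self.children.pop(old_spi)   # old SA destroyed
        old.state = "DELETED"
        self.children[new_spi] = ChildSA(new_spi)
        self.jobs.append(("CHILD_DELETE", old_spi))  # delete job still queued

    def run_jobs(self, guarded):
        for _, spi in self.jobs:
            sa = self.children.get(spi)
            if sa is None:
                if guarded:
                    continue   # fixed shape: stale job is ignored
                # pre-6.0.4 shape: analogue of touching a destroyed SA
                raise RuntimeError(f"dereferenced destroyed CHILD_SA {spi:#x}")

ike = IkeSA()
ike.children[0xA] = ChildSA(0xA)
ike.rekey(0xA, 0xB)   # first rekey: SPI A -> B
ike.rekey(0xB, 0xC)   # second rekey before the delete jobs drain
try:
    ike.run_jobs(guarded=False)
except RuntimeError as e:
    print("crash:", e)
ike.run_jobs(guarded=True)   # stale jobs skipped, no crash
```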

  1. Upstream references
    - https://github.com/strongswan/strongswan/issues/2945 — fixed in 6.0.4 ("Prevent a crash if a confused peer rekeys a Child SA twice before sending a delete")
    - https://github.com/strongswan/strongswan/discussions/2944 — mechanism description
    - 6.0.5 adds a defensive follow-on fix: "Avoid an incorrect down event if deleting a rekeyed Child SA fails"
    - 6.0.6 (2026-04-22) includes several unrelated CVE fixes
  1. Requests

1. Ship strongSwan >= 6.0.4 in a future pfSense Plus release. 6.0.5 preferred for the follow-on fix; 6.0.6 adds CVE fixes worth having.
2. Backport consideration: a targeted backport of the 6.0.4 child-rekey fix to a 25.11.x package update would let deployments on the current train avoid a major version upgrade. Is this feasible?
3. Interim mitigation: are there `charon.strongswan.conf` tuning options (rekey margins, `delete_rekeyed` behavior, related options) that would reduce exposure while awaiting a fixed version?
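
On the mitigation question, the knob we had in mind (option name from stock strongSwan documentation; the value and its effectiveness against this crash are unverified assumptions, and where pfSense's generated config allows such overrides is part of the question):

```
# strongswan.conf fragment (pfSense generates this file, so a
# GUI-safe placement would be needed)
charon {
    # Seconds to retain the inbound state of a rekeyed CHILD_SA before
    # deleting it (stock default 5). A larger value is a guess at
    # keeping the old SA around until stale delete jobs drain.
    delete_rekeyed_delay = 30
}
```

Widening per-connection rekey jitter (`rand_time` in swanctl terms) on the affected tunnel might also reduce the chance of colliding rekeys; that too is untested.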

  1. Impact
    Production IPsec concentrator serving site-to-site VPN tunnels. A charon crash drops all tunnels on the host until the daemon is restarted, causing service interruption for every tunnel on the concentrator.
  1. What can be provided on request
    - Sanitized backtrace (`thread apply all bt`, `info locals` on the failing frame) — can be shared via a non-public channel if needed
    - Timing of prior occurrence
    - Peer IKE implementation / vendor (we have identified the specific peer driving the double-rekey pattern)