Bug #16824

open

dpinger gateway monitoring fails after IPsec VTI reload

Added by Chris Baker about 2 hours ago.

Status: New
Priority: Normal
Assignee: -
Category: Gateway Monitoring
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version: 2.8.1
Affected Architecture: All

Description

Environment

  • pfSense 2.8.1-RELEASE on FreeBSD 15.0
  • dpinger 3.3 (FreeBSD port, built 2025-05-22)
  • Multiple IPsec VTI tunnels with gateways monitored by dpinger
  • Standard pfSense gateway monitoring (no custom configuration)

Symptoms

Gateways monitored over IPsec VTI interfaces eventually stop reporting status.
The pfSense UI shows them as "Pending" or "Unknown" indefinitely. The only
remediation through normal channels is to restart dpinger
(Status > Services, or setup_gateways_monitor() via PHP shell).

The failure presents in three distinct modes:

  1. Process missing: dpinger process is gone entirely. No process, no PID
    file, no Unix status socket on disk. pfSense never respawns it.
  2. Process hung: dpinger process is running and the Unix status socket
    exists, but querying the socket times out (the usocket_thread is dead
    or blocked).
  3. Process zombie: dpinger process is running and the status socket
    responds, but it reports latency=0 stddev=0 loss=100. A manual ICMP
    ping with the exact same -S bind_addr monitor_addr parameters that
    dpinger uses succeeds with normal latency. The send/recv threads are
    orphaned -- they hold file descriptors that are no longer wired to a
    live interface.

The third mode is the most insidious because every external indicator
(process, socket, ICMP reachability) appears healthy.

Root cause

The dpinger daemon itself appears correct. Its send_thread handles
sendto errors by logging them and continuing -- it does not exit on
EHOSTUNREACH or any other transient error. The threads run in
while(1) loops and main() blocks on pthread_join of the last
thread created (the usocket_thread in pfSense's invocation).

The bug is in pfSense's gateway / IPsec interaction:

  1. An IPsec gateway briefly fails monitoring (real packet loss, alarm,
    or DPD event). dpinger fires /etc/rc.gateway_alarm.
  2. rc.gateway_alarm calls pfSctl -c "service reload ipsec ${GW}".
  3. That ultimately invokes ipsec_configure() in /etc/inc/ipsec.inc.
  4. ipsec_configure() calls ipsec_setup_gwifs(), which calls
    interface_ipsec_vti_configure() in /etc/inc/interfaces.inc.
  5. That function unconditionally destroys and recreates each VTI
    interface:

    if (does_interface_exist($ipsecif)) {
        mwexec("/sbin/ifconfig " . escapeshellarg($ipsecif) . " destroy");
    }
    mwexec("/sbin/ifconfig " . escapeshellarg($ipsecif) . " create reqid ...");

  6. The dpinger process for that gateway is bound to an IP on the now-
    destroyed interface. Its raw ICMP socket is left in a broken state.
    When the new interface is created with the same IP, the old fd does
    not transparently re-bind. dpinger keeps logging
    sendto error: 65 (EHOSTUNREACH) and reports 100% loss forever.
  7. After ipsec_configure() completes, no call to
    setup_gateways_monitor() is made. The dpinger processes are
    never restarted, even though the interfaces they were monitoring
    were torn down and recreated underneath them.

The "process missing" and "process hung" modes appear to be downstream
consequences of the same destroy/recreate cycle (e.g. the process being
SIGTERMed during cleanup but the respawn step being skipped, or the
process exiting due to socket state dpinger does not handle).

Reproduction

Trigger packet loss on an IPsec VTI gateway sufficient to cause an
alarm. After the IPsec reload completes, observe that dpinger for that
gateway reports 100% loss permanently, while a manual ping using the
same -S bind_addr monitor_addr parameters succeeds.
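
To make the manual comparison less error-prone, the ping command can be derived from the dpinger socket name. This is a minimal sketch, assuming the socket file is named dpinger_<gateway>~<bind_addr>~<monitor_addr>.sock and that both addresses are IPv4; manual_ping_cmd is a hypothetical helper name, not part of pfSense.

```sh
#!/bin/sh
# Print the manual ping command matching one dpinger socket.
# Assumed file name layout: dpinger_<gateway>~<bind_addr>~<monitor_addr>.sock
manual_ping_cmd() {
    base=${1##*/}             # drop the directory part
    base=${base%.sock}        # drop the .sock suffix
    monitor=${base##*'~'}     # last ~-separated field
    base=${base%'~'*}         # drop that field
    bind=${base##*'~'}        # the field before it
    printf 'ping -c 4 -S %s %s\n' "$bind" "$monitor"
}
```

For example, manual_ping_cmd /var/run/dpinger_VTI1~10.0.0.1~10.0.0.2.sock prints the command to run by hand. If that ping succeeds while dpinger for the same gateway still reports 100% loss, the gateway is in the third failure mode.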

Suggested fix (upstream)

Either of:

  • Add setup_gateways_monitor() to the end of ipsec_configure() in
    /etc/inc/ipsec.inc (or to /etc/rc.ipsec after the
    ipsec_configure() call).
  • Avoid the unconditional ifconfig destroy in
    interface_ipsec_vti_configure() when the interface configuration
    is unchanged.

The first is the smaller, lower-risk change.

Workaround

A shell script (attached as dpinger_watchdog.sh), run from cron once a
minute, detects all three failure modes and restarts gateway monitoring:

  • For each gateway pfSense expects to monitor, check that a dpinger
    socket and process exist. If not, flag as missing.
  • For each socket that does exist, query it with a 5-second timeout.
    If the query fails or returns empty, flag as hung.
  • If a socket returns 100% loss, manually probe the same monitor_addr
    from the same bind_addr with ping -c 4. If pings succeed, flag as
    zombie. If pings also fail, the outage is real and dpinger is
    correct -- leave alone.

If anything is flagged, run setup_gateways_monitor() via PHP, which
cleanly stops and respawns all dpinger processes.
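
The checks described above can be sketched roughly as follows. This is an illustration only, not the attached dpinger_watchdog.sh. It assumes the status sockets live in /var/run with names of the form dpinger_<gateway>~<bind_addr>~<monitor_addr>.sock, that a socket reply looks like "<gateway> <latency> <stddev> <loss>", that timeout(1) and nc -U are available, and that config.inc and gwlb.inc are the right includes for setup_gateways_monitor(). All function names are hypothetical. The "missing" check is left out because it needs the list of gateways pfSense expects to monitor.

```sh
#!/bin/sh
# Illustrative watchdog sketch; NOT the attached dpinger_watchdog.sh.
SOCK_DIR=${SOCK_DIR:-/var/run}   # assumed location of dpinger status sockets

# Read one status socket with a 5-second limit (assumes timeout(1) and nc -U).
query_sock() {
    timeout 5 nc -U "$1" 2>/dev/null
}

# Manual probe from the same source address; succeeds if ping exits 0.
probe() {
    ping -c 4 -S "$1" "$2" >/dev/null 2>&1
}

# Print hung, zombie, down, or ok for one socket path.
check_sock() {
    reply=$(query_sock "$1")
    [ -n "$reply" ] || { echo hung; return; }
    # Assumed reply layout: loss percentage is the last field.
    [ "${reply##* }" = 100 ] || { echo ok; return; }
    name=${1##*/}
    name=${name%.sock}
    monitor=${name##*'~'}
    name=${name%'~'*}
    bind=${name##*'~'}
    # 100% loss but a manual ping works: dpinger is stuck, not the link.
    if probe "$bind" "$monitor"; then echo zombie; else echo down; fi
}

# Stop and respawn all dpinger processes (include names are assumptions).
restart_monitoring() {
    /usr/local/bin/php -r 'require_once("config.inc"); require_once("gwlb.inc"); setup_gateways_monitor();'
}

# Return 0 if any existing socket is hung or zombie, 1 otherwise.
any_flagged() {
    for sock in "$SOCK_DIR"/dpinger_*.sock; do
        [ -e "$sock" ] || continue
        case $(check_sock "$sock") in
            hung|zombie) return 0 ;;
        esac
    done
    return 1
}

# From cron, for example:  any_flagged && restart_monitoring
```

A gateway classified as down is deliberately not flagged, so a real outage does not cause a restart loop.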

This has been running on pfSense 2.8.1 against an environment with
four IPsec VTI gateways and one DHCP WAN gateway. It correctly catches
the zombie case (most common in this environment), avoids restarting
during real outages, and runs to completion in well under a second
when nothing is wrong.

Related observations

  • The sendto error: 65 log spam is a useful early indicator but
    does not by itself cause the hang.
  • "exiting on signal 15" entries in syslog correlate with the
    process-missing case and confirm that something external sends
    SIGTERM and that no respawn follows.
  • The existing pfSense Service Watchdog package does not catch this
    because the dpinger service entry represents the collection of
    dpinger processes, not individual ones -- when even one is running
    the service is considered up.

Files

dpinger_watchdog.sh (4.53 KB) - workaround cron script - Chris Baker, 05/04/2026 08:31 PM
