Bug #16824: dpinger gateway monitoring fails after IPsec VTI reload - pfSense - pfSense bugtracker

closed

Added by Chris Baker 2 months ago. Updated 2 months ago.

Status: Not a Bug
Priority: Normal
Category: Gateway Monitoring
Release Notes: Default
Affected Version: 2.8.1
Affected Architecture: All

Description

Environment

pfSense 2.8.1-RELEASE on FreeBSD 15.0
dpinger 3.3 (FreeBSD port, built 2025-05-22)
Multiple IPsec VTI tunnels with gateways monitored by dpinger
Standard pfSense gateway monitoring (no custom configuration)

Symptoms

Gateways monitored over IPsec VTI interfaces eventually stop reporting status.
The pfSense UI shows them as "Pending" or "Unknown" indefinitely. The only
remediation through normal channels is to restart dpinger
(Status > Services, or setup_gateways_monitor() via PHP shell).

The failure presents in three distinct modes:

1. Process missing: the dpinger process is gone entirely. No process, no
   PID file, no Unix status socket on disk. pfSense never respawns it.
2. Process hung: the dpinger process is running and the Unix status socket
   exists, but querying the socket times out (the usocket_thread is dead
   or blocked).
3. Process zombie: the dpinger process is running and the status socket
   responds, but it reports latency=0 stddev=0 loss=100. A manual ICMP
   ping (ping -S bind_addr monitor_addr) using the same bind and monitor
   addresses that dpinger uses succeeds with normal latency. The
   send/recv threads are orphaned -- they hold file descriptors that are
   no longer wired to a live interface.

The third mode is the most insidious because every external indicator
(process, socket, ICMP reachability) appears healthy.

Root cause

The dpinger daemon itself appears correct. Its send_thread handles
sendto errors by logging them and continuing -- it does not exit on
EHOSTUNREACH or any other transient error. The threads run in
while(1) loops and main() blocks on pthread_join of the last
thread created (the usocket_thread in pfSense's invocation).

The bug is in pfSense's gateway / IPsec interaction:

1. An IPsec gateway briefly fails monitoring (real packet loss, alarm,
   or DPD event). dpinger fires /etc/rc.gateway_alarm.
2. rc.gateway_alarm calls pfSctl -c "service reload ipsec ${GW}".
   That ultimately invokes ipsec_configure() in /etc/inc/ipsec.inc.
3. ipsec_configure() calls ipsec_setup_gwifs(), which calls
   interface_ipsec_vti_configure() in /etc/inc/interfaces.inc.
4. That function unconditionally destroys and recreates each VTI
   interface:

   if (does_interface_exist($ipsecif)) {
       mwexec("/sbin/ifconfig " . escapeshellarg($ipsecif) . " destroy");
   }
   mwexec("/sbin/ifconfig " . escapeshellarg($ipsecif) . " create reqid ...");

5. The dpinger process for that gateway is bound to an IP on the
   now-destroyed interface. Its raw ICMP socket is left in a broken
   state. When the new interface is created with the same IP, the old fd
   does not transparently re-bind. dpinger keeps logging
   sendto error: 65 (EHOSTUNREACH) and reports 100% loss forever.
6. After ipsec_configure() completes, no call to
   setup_gateways_monitor() is made. The dpinger processes are
   never restarted, even though the interfaces they were monitoring
   were torn down and recreated underneath them.

The "process missing" and "process hung" modes appear to be downstream
consequences of the same destroy/recreate cycle (e.g. the process being
SIGTERMed during cleanup but the respawn step being skipped, or the
process exiting due to socket state dpinger does not handle).

Reproduction

Trigger packet loss on an IPsec VTI gateway sufficient to cause an
alarm. After the IPsec reload completes, observe that dpinger for that
gateway reports 100% loss permanently, while a manual ping using the
same -S bind_addr monitor_addr parameters succeeds.

Suggested fix (upstream)

Either of:

1. Add setup_gateways_monitor() to the end of ipsec_configure() in
   /etc/inc/ipsec.inc (or to /etc/rc.ipsec after the
   ipsec_configure() call).
2. Avoid the unconditional ifconfig destroy in
   interface_ipsec_vti_configure() when the interface configuration
   is unchanged.

The first is the smaller, lower-risk change.

Workaround

A shell script (see attached) run from cron once a minute detects all three failure
modes and restarts gateway monitoring:

1. For each gateway pfSense expects to monitor, check that a dpinger
   socket and process exist. If not, flag as missing.
2. For each socket that does exist, query it with a 5-second timeout.
   If the query fails or returns empty, flag as hung.
3. If a socket returns 100% loss, manually probe the same monitor_addr
   from the same bind_addr with ping -c 4. If pings succeed, flag as
   zombie. If pings also fail, the outage is real and dpinger is
   correct -- leave alone.

If anything is flagged, run setup_gateways_monitor() via PHP, which
cleanly stops and respawns all dpinger processes.
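
The detection steps above can be sketched as a small decision function. This is a minimal illustration, not the attached dpinger_watchdog.sh: the function name, the "yes"/"no" arguments, and the result labels are hypothetical, and it assumes the status line read from the socket ends in the loss percentage. Gathering the observations (process and socket checks, the socket query, the manual ping) is left out.

```sh
#!/bin/sh
# Sketch only: classify one gateway from observations gathered elsewhere.
# Assumption: the status line ends in the loss percentage, e.g. "GW 0 0 100".
#   $1: "yes" if both the dpinger process and its socket exist
#   $2: status line read from the socket ("" if the query failed or timed out)
#   $3: "yes" if a manual ping from the same bind address succeeded
classify_gateway() {
    present=$1
    status=$2
    ping_ok=$3
    if [ "$present" != yes ]; then echo missing; return 0; fi
    if [ -z "$status" ]; then echo hung; return 0; fi
    loss=${status##* }                  # last space-separated field
    if [ "$loss" != 100 ]; then echo ok; return 0; fi
    # dpinger reports total loss: trust it only if the manual probe fails too
    if [ "$ping_ok" = yes ]; then echo zombie; else echo outage; fi
}
```

In this sketch, a result of missing, hung, or zombie would lead to the setup_gateways_monitor() restart described above, while outage and ok leave dpinger alone.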

This has been running on pfSense 2.8.1 against an environment with
four IPsec VTI gateways and one DHCP WAN gateway. It correctly catches
the zombie case (most common in this environment), avoids restarting
during real outages, and runs to completion in well under a second
when nothing is wrong.

Related observations

- The sendto error: 65 log spam is a useful early indicator but
  does not by itself cause the hang.
- "exiting on signal 15" entries in syslog correlate with the
  process-missing case and confirm something external sends SIGTERM
  but no respawn follows.
- The existing pfSense Service Watchdog package does not catch this
  because the dpinger service entry represents the collection of
  dpinger processes, not individual ones -- when even one is running
  the service is considered up.
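
The gap in the last observation can be illustrated by putting a collective check next to a per-gateway one. This is a minimal sketch with hypothetical function and gateway names; it does not reproduce how the Service Watchdog package is implemented, and how the two lists are collected on a real system is not shown.

```sh
#!/bin/sh
# Sketch only: contrast a collective "service up" check with a per-gateway one.

# Collective check: considered up as soon as any monitor is running.
#   $1: space-separated gateway names that currently have a dpinger process
service_considered_up() {
    [ -n "$1" ]
}

# Per-gateway check: print every expected gateway that has no monitor.
#   $1: space-separated gateway names expected to be monitored
#   $2: space-separated gateway names that currently have a dpinger process
missing_monitors() {
    for gw in $1; do
        case " $2 " in
            *" $gw "*) ;;                    # monitor present, nothing to report
            *) printf '%s\n' "$gw" ;;        # expected but not running
        esac
    done
}
```

With expected gateways "VTI1 VTI2 WAN" and running monitors "VTI1 WAN", the collective check passes while missing_monitors prints VTI2.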

Files

dpinger_watchdog.sh (4.53 KB): workaround cron script

Chris Baker, 05/04/2026 08:31 PM

Updated by Marcos M 2 months ago

Status changed from New to Not a Bug

This works as expected in tests with 26.03-RELEASE. Even with the VTI being recreated the existing dpinger socket was still valid and monitoring continued after the tunnel re-initiated.

Updated by Chris Baker 2 months ago

Marcos M wrote in #note-1:

This works as expected in tests with 26.03-RELEASE. Even with the VTI being recreated the existing dpinger socket was still valid and monitoring continued after the tunnel re-initiated.

Thanks for testing this. Before the ticket is closed I'd like to lay out a couple of points — the technical findings here are pretty firm, but I don't want to overreach on the version-comparison side.

Version difference

The ticket was filed against CE 2.8.1 (FreeBSD 15.0-CURRENT, released 2025-09-04). 26.03-RELEASE is pfSense Plus on FreeBSD 16.0-CURRENT, released 2026-04-01. I don't know how aligned the relevant IPsec/gateway code is between the two branches, so I can't say with certainty whether a clean result on 26.03 should carry over to 2.8.1. Could you confirm whether the 26.03 result was expected to apply to 2.8.1, and whether "Fixed" (with target version) might be a more accurate disposition than "Not a Bug" if the two branches differ here? I'd just like to avoid the CE branch being closed off from a potential backport.

The "dpinger socket was still valid" observation

I want to flag this carefully, because I think the original report may not have made the symptom clear enough. dpinger has two kinds of socket that behave very differently:

1. The Unix-domain status socket (/var/run/dpinger_*.sock) served by usocket_thread. It lives on local disk and is not affected by the VTI being destroyed.
2. The raw ICMP send/recv sockets, bound via -B to an IP on the VTI interface. These are what actually probe the gateway.

The dominant failure mode on 2.8.1 here is that the Unix status socket keeps responding with fresh-looking data, returning latency=0 stddev=0 loss=100, while a manual ping -c 4 -S bind_addr monitor_addr from the same firewall succeeds with normal latency. The status socket being valid does not contradict the bug — the bug is precisely that the status socket remains valid and continues reporting while the underlying ICMP path is dead. Running setup_gateways_monitor() immediately restores monitoring.

There are also two less common modes: the process gone entirely (exiting on signal 15 in syslog around an IPsec reload, no respawn), and the Unix socket itself becoming unresponsive. All three have been observed on this firewall.

Reproducibility on 2.8.1

A watchdog has been running here that, when dpinger reports 100% loss, probes ping -c 4 -S bind_addr monitor_addr and only flags the gateway if those pings succeed (i.e. dpinger is wrong). It has triggered repeatedly on CE 2.8.1, each trigger correlating with sendto error: 65 bursts in syslog around IPsec reload events. Happy to attach watchdog logs and a syslog excerpt.
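
One way to back up the correlation is to count the two log messages quoted in this ticket in a saved log excerpt. The sketch below is an assumption-laden illustration: the function names are hypothetical, the log file path is passed in by the caller, and it only counts matching lines rather than lining them up with reload timestamps.

```sh
#!/bin/sh
# Sketch only: count occurrences of the messages quoted in this ticket.

# Print the number of lines in a file that contain a fixed string.
#   $1: fixed string to search for
#   $2: path to the log file or excerpt
count_matches() {
    grep -cF -- "$1" "$2"
}

# Print both counters for one log file.
#   $1: path to the log file or excerpt
summarize_log() {
    printf 'sendto_error_65=%s\n' "$(count_matches 'sendto error: 65' "$1")"
    printf 'sigterm_exits=%s\n' "$(count_matches 'exiting on signal 15' "$1")"
}
```

Running summarize_log against an excerpt taken around each IPsec reload would show whether the sendto error bursts and SIGTERM exits cluster there; which log file holds the dpinger messages on a given install needs to be checked locally.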

Would you be willing to reopen pending another look on 2.8.1, or let me know what specifically to test there to either confirm or rule this out?
