Bug #16540
Reserved dummynet pipes for Captive Portal can overlap
Status: Open
% Done: 100%
Description
Periodically, and outside of work hours (I don't know if that's relevant; it may just be luck), the allowed hostnames (configured via services_captiveportal_hostname.php) in our Captive Portal become unreachable. Investigation on the box in question shows that while the pf ether pass rules remain loaded throughout and are still being matched (their counters increment), the dnpipe associated with each rule is missing, and that is the cause of the outages.
Here is an example rule:
ether pass in quick l3 from any to <cpzoneid_2_hostname_26> tag cpzoneid_2_auth dnpipe 2186
Changing the dnpipe number to one used by another working rule reliably makes traffic flow again:
# echo 'ether pass in quick l3 from any to <cpzoneid_2_hostname_26> tag cpzoneid_2_auth dnpipe 2282' | pfctl -a 'cpzoneid_2_allowedhosts/hostname_26' -f -
Whenever the box is dropping traffic, there is no pipe:
# dnctl pipe show 2186
[NOTHING]
And when traffic does start flowing again (through no intervention on our part) there is a pipe:
# dnctl pipe show 2186
02186: unlimited         0 ms burst 0
q133258  100 sl. 0 flows (1 buckets) sched 67722 weight 0 lmax 0 pri 0 droptail
 sched 67722 type FIFO flags 0x0 16 buckets 0 active
The dnpipe numbers in the pf ether rules remain constant throughout: it is just the absence and presence of the dnpipe which is causing our issue. Hopefully this is enough information for you, but this situation is 100% repeatable: the hosts become unreachable for 10-12 hours each night, in a staggered fashion, presumably as the dnpipes are reaped and replaced at different times.
We do not use any traffic shaping.
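Not part of the original report, but a minimal diagnostic sketch of the check described above: it lists the dnpipe numbers referenced by the zone's allowed-hostname ether rules (zone id 2, as in the examples) and flags any that no longer have a dummynet pipe behind them.

# Sketch: flag allowed-hostname rules whose dnpipe is missing (zone id 2 assumed)
for pipe in $(pfctl -a '*' -s ether 2>/dev/null \
    | grep 'cpzoneid_2_hostname' \
    | grep -o 'dnpipe [0-9]*' | awk '{print $2}' | sort -u); do
  # dnctl prints nothing for a pipe that does not exist
  if [ -z "$(dnctl pipe show "$pipe" 2>/dev/null)" ]; then
    echo "dnpipe $pipe is referenced by an ether rule but has no dummynet pipe"
  fi
done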
Details that may be relevant:
Netgate pfSense Plus
25.07.1-RELEASE (amd64)
built on Fri Oct 24 15:27:00 BST 2025
FreeBSD 15.0-CURRENT
Thank you for your time.
Updated by Marcos M 3 days ago
- Priority changed from High to Normal
Is this an HA setup? Do you have "Preserve users database" checked in any of the Captive Portal zone configs? When the issue occurs, share the output of the following - run it at Diagnostics > Command Prompt > Execute PHP Commands:
var_dump(array_filter(unserialize_data(file_get_contents('/var/db/captiveportaldn.rules'))));
Updated by Christopher Causer 3 days ago
Marcos M wrote in #note-1:
Is this an HA setup? Do you have "Preserve users database" checked in any of the Captive Portal zone configs? When the issue occurs, share the output of the following - run it at Diagnostics > Command Prompt > Execute PHP Commands:
Preserve users database is enabled and cannot be turned off because High Availability is in use on the box.
It is working at the moment, but just in case it's relevant, here is the output to contrast with the output I supply later today.
Updated by Christopher Causer 3 days ago
At the risk of jumping the gun here, I've taken a look at this output.
Pipe numbers associated with "Allowed IP Addresses" have the value "tawny_owl_allowed". These are fine and working correctly.
Pipes associated with authenticated captive portal clients have the value "tawny_owl_auth". These are also fine and working correctly.
Pipes associated with "Allowed Hostnames" also have the value "tawny_owl_auth", and these are the ones that go down daily.
Updated by Christopher Causer 3 days ago
Here is the list during the outage.
Just as an example, here is a rule taken from /var/etc/filterdns-tawny_owl-captiveportal.conf
pf [HOST] cpzoneid_2_hostname_26 cpzoneid_2_allowedhosts/hostname_26
And here is the rule:
# pfctl -a cpzoneid_2_allowedhosts/hostname_26 -s ether -vvv
@0 ether pass in quick l3 from any to <cpzoneid_2_hostname_26> tag cpzoneid_2_auth dnpipe 2186
  [ Evaluations: 271533424  Packets: 88416  Bytes: 8716588 ]
  [ Last Active Time: Tue Nov 18 20:36:05 2025 ]
And here is the complete output as you requested. dnpipe 2186 is missing.
Updated by Christopher Causer 3 days ago
Marcos M wrote in #note-5:
When the issue happened, was there a CARP event or any configuration change to Captive Portal?
Hard to pinpoint because the hosts don't go down simultaneously, more in a staggered fashion (and they come back in a similar fashion). Hosts can go down hours apart. The host we're talking about above went down at 16:43 Europe/London this afternoon (the outages seem to be creeping into work hours). I can see no corresponding CARP events. We get emailed on CARP failover.
No config changes made by either myself or my colleagues.
Updated by Marcos M 3 days ago
- File cppatch.txt added
There's potential for this kind of issue to occur in the mentioned cases. It's unclear how else it can happen but we can try the attached patch. Use the System Patches package and copy/paste the contents. Apply the patch to both the primary and secondary node, then bounce the CARP status by entering and leaving maintenance mode (wait a moment between enter/leave) under Status > CARP.
When testing, keep in mind:
- Changes to allowed MAC, IP, or hostname entries should be followed up with re-saving the "Configuration" tab for the zone. The pipe reservations may otherwise become unsynchronized.
- Make sure that only a single CARP VIP is configured per interface used in Captive Portal. Multiple CARP VIPs for an interface used in Captive Portal can result in duplicate pipe reservations.
Updated by Marcos M 2 days ago
- File cppatch_25.07.1.txt added
Updated by Christopher Causer 1 day ago
I think I've found the issue. The dnpipe numbers are being duplicated between the auth'd clients and the allowed hosts. This is why it works during the working day: people have authenticated and it just so happens that a client is assigned the same dnpipe number as one used by the allowed hostnames.
Here is 2186 this morning, right after it started working:
pfctl -a '*' -s ether | grep 2186
ether pass in quick proto 0x0800 l3 from 10.36.22.206 to any tag cpzoneid_2_auth dnpipe 2186
ether pass in quick l3 from any to <cpzoneid_2_hostname_26> tag cpzoneid_2_auth dnpipe 2186
It also explains why the hosts go down at different times: it happens when a particular client is dropped at the end of the day and its anchors and pipes are cleaned up.
We haven't applied the patch, but based on this I don't think it would fix the issue we are experiencing.
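For reference (not from the thread): a quick way to spot this kind of overlap is to count how many ether rules reference each dnpipe number across all anchors. Any pipe number referenced by more than one rule is likely shared between two different reservations (e.g. a client and an allowed-hostname entry), which is exactly what the grep above shows for 2186.

# Sketch: list dnpipe numbers referenced by more than one ether rule
pfctl -a '*' -s ether 2>/dev/null \
  | grep -o 'dnpipe [0-9]*' | awk '{print $2}' \
  | sort | uniq -c \
  | awk '$1 > 1 {print "dnpipe " $2 " is referenced by " $1 " ether rules"}'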
Updated by Marcos M about 22 hours ago
- Status changed from New to Feedback
- % Done changed from 0 to 100
Applied in changeset c42eba1d78cc0b97dcb5abc604c9ab7e6e50d8a9.
Updated by Marcos M about 22 hours ago
- Subject changed from dn pipe erroneously reaped for captive portal allowed hostame exception: leading to outage to Reserved dummynet pipes for Captive Portal can overlap
- Assignee set to Marcos M
- Target version set to 2.9.0
- Plus Target Version set to 25.11
- Affected Architecture All added
- Affected Architecture deleted (amd64)
Updated by Marcos M about 22 hours ago
The dummynet pipes are created and removed based on the pipe reservations. Inaccurate pipe reservation data can result in the wrong dummynet pipe being removed. The patch fixes several ways in which the reservations can become inaccurate. It's possible there are additional cases that need to be fixed; if so, the patch (along with the notes above) will help narrow them down.
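As an illustration only (pipe number and rules copied from the examples earlier in this report, and assuming dnctl supports a "pipe delete" subcommand analogous to its "pipe show"), the overlap plays out roughly like this:

# Both an authenticated client and an allowed-hostname rule reserve the same pipe:
pfctl -a '*' -s ether | grep 'dnpipe 2186'
#   ether pass in quick proto 0x0800 l3 from 10.36.22.206 to any tag cpzoneid_2_auth dnpipe 2186
#   ether pass in quick l3 from any to <cpzoneid_2_hostname_26> tag cpzoneid_2_auth dnpipe 2186

# When the client session is pruned, its pipe is torn down:
dnctl pipe delete 2186

# The allowed-hostname rule still points at 2186, but the pipe is gone,
# so matching traffic is dropped until something recreates the pipe:
dnctl pipe show 2186     # prints nothing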
I'm including the current fixes in 25.11 to address the known cases.