Bug #15108

closed

``pfctl`` is unable to retrieve state creator list in certain circumstances

Added by Jim Pingle 4 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Category: Operating System
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 24.03
Release Notes: Default
Affected Version:
Affected Architecture:

Description

In certain cases pfctl -sc is unable to obtain the list of state creators and instead fails with error messages such as:

System Log / Kernel Message Buffer:

Dec 4 18:49:08 kernel Warning: too many creators!

CLI:

: pfctl -sc
pfctl: Failed to retrieve creators

The output of truss -o creatordebug.txt -f pfctl -sc from an affected system is attached.

The specific circumstances under which this fails have not been determined yet, and thus far we have been unable to trigger the error in lab conditions.

See also: https://forum.netgate.com/topic/184561/no-state-creator-host-ids-visible


Files

creatordebug.txt (7.23 KB) Jim Pingle, 12/20/2023 09:39 PM

Actions #2

Updated by Kristof Provost 4 months ago

I think I see how the 'No space left on device' error can happen if we have many creator ids.
It's already fixed, because current versions of that code use netlink to communicate, so that specific bug is already gone.

That's not the root cause of this issue though, because we ought to have 1 or 2 (distinct) creator ids and no more, and the 'Warning: too many creators!' kernel message means we have 16.

A full state table output (i.e. pfctl -ss -vvv) might give us clues about why there are so many different creator ids.
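
As a rough aid for reading such a dump, the sketch below tallies the distinct creator ids in saved state table output. It assumes that the verbose state lines carry a field of the form `creatorid: <hex>`; the function name `count_creator_ids` and the regular expression are illustrative and not part of pfctl.

```python
import re
from collections import Counter

# Assumption: verbose state output contains fields like "creatorid: 5f3a2b1c".
CREATOR_RE = re.compile(r"creatorid:\s*([0-9a-fA-F]+)")


def count_creator_ids(state_dump: str) -> Counter:
    """Count how many states reference each creator id in a text dump.

    Ids are lower-cased so that the same hex value is not counted twice.
    """
    return Counter(m.group(1).lower() for m in CREATOR_RE.finditer(state_dump))


def summarize(state_dump: str) -> str:
    """Build a short report: one line per creator id, then the distinct total."""
    counts = count_creator_ids(state_dump)
    lines = [f"{creator}\t{n}" for creator, n in counts.most_common()]
    lines.append(f"distinct creator ids: {len(counts)}")
    return "\n".join(lines)
```

One possible use is to save the output with `pfctl -ss -vvv > states.txt` and pass the file contents to `summarize`. A distinct count well above the expected 1 or 2 would point at the states worth looking at more closely.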

Actions #3

Updated by Kristof Provost 3 months ago

  • Status changed from New to Feedback

Quick summary from the forum discussion: the reporter has upgraded both (pfsync) hosts to the same version, and the problem appears to have gone away.

Previously they ran different versions, with different pfsync protocol versions. That ought to work, and I failed to reproduce this behaviour in such a setup, but as it did go away for the reporter that's the most probable cause.

Given that the actionable issue (the 'No space left on device' error) is already fixed, I think we're done here.

Actions #4

Updated by Jim Pingle 3 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

Given that we can't reproduce it, there isn't a good way to verify the fix, so we can close this out for now. If we get any additional reports, we can update/reopen it as needed.
