Feature #12092
Closed
Utilize new ``pfctl`` abilities to kill states
100%
Description
In the latest pf changes present in 2.6.0, pfctl now supports killing states by label. We are using this to kill schedule states, but we could also use it to kill states for specific rules. The caveat is that the rules must have a unique label.
: pfctl -vvsr | grep -A2 Test123
@115(1624974401) pass in quick on vmx0 reply-to (vmx0 198.51.100.1) inet proto tcp from any to 198.51.100.6 port = http flags S/SA keep state label "USER_RULE: Test123"
  [ Evaluations: 73 Packets: 6 Bytes: 344 States: 1 ]
  [ Inserted: pid 64790 State Creations: 2 ]
: pfctl -vvss | grep -A4 198.51.100.6:80
all tcp 198.51.100.6:80 <- 198.51.100.142:53432 ESTABLISHED:ESTABLISHED
   [2154389371 + 64256] wscale 9 [3322164514 + 65537] wscale 7
   age 00:18:39, expires in 23:59:31, 2:1 pkts, 112:60 bytes, rule 115
   id: 0b24db6000000001 creatorid: 69dbf34f gateway: 198.51.100.1
   origif: vmx0
: pfctl -k label -k 'USER_RULE: Test123'
killed 1 states
: pfctl -vvss | grep -A4 198.51.100.6:80
:
The rule label must match exactly; it does not support partial, wildcard, or regex matching.
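Since matching is exact, it may help to copy the label string straight from the loaded ruleset instead of typing it by hand. A minimal sketch, assuming rule text in the same format as the pfctl rule output in this description (label in double quotes); `list_labels` is a hypothetical helper name, not part of pfctl:

```shell
#!/bin/sh
# list_labels: read `pfctl -sr`-style rule text on stdin and print each
# unique label string, one per line. Hypothetical helper, not part of pfctl.
list_labels() {
    # Pull out every label "..." token, strip the wrapper, then de-duplicate.
    grep -o 'label "[^"]*"' | sed -e 's/^label "//' -e 's/"$//' | sort -u
}

# Usage on a firewall (not run here): pfctl -sr | list_labels
```

The printed strings can then be passed unchanged to pfctl -k label -k.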
It may or may not be viable to have an icon on each rule row to do this. Users may not realize that if a rule has a generic (or no) description, the action will kill states for every other rule with that same label. Such an icon could at least be hidden for rules with an empty label, but that may or may not be sufficient.
As an alternative tactic, pf now also supports multiple labels per rule, and we could add the rule ID as another label (e.g. label ruleid:<num>), which would be more accurate than relying on the user-entered description. That also assumes the rule has an ID in its configuration, which we may need to check is always true.
The states display could have an input/button, perhaps in a collapsed advanced panel, to pick a label to kill from a drop-down list of all unique rule labels to avoid potential user input errors. That may not be viable since it wouldn't scale well, though. Systems with many rules may have problems rendering the box or finding/picking rules from the list.
Note that this does not solve problems like #1947 since this only affects states created by the rule matching the label, not traffic which would match the rule.
If it's not viable to add in the GUI then we should at least note it somewhere in the docs.
Related issues
Updated by Jim Pingle over 3 years ago
- Target version set to 2.6.0
- Plus Target Version set to 21.09
Updated by Jim Pingle over 3 years ago
Another random thought, it might be possible to leverage this to help with multi-wan (like #8555) since we could kill states for rule(s) using a gateway or group including a down gateway along with the ID of outbound rule(s) on the failed WAN (automatic and also floating rules). Worth investigating, but may not pan out.
Updated by → luckman212 over 3 years ago
@Jim yes that would be a godsend for multiwan if it works out. I always dreamed of being able to kill specific states that were tagged with a certain label (e.g. SIP connections) during failback events, but the best I was able to do was cobble together hacky shell scripts involving cron, pfctl, grep & awk... this would be so much nicer. I hope it's in the cards.
Updated by Jim Pingle over 3 years ago
→ luckman212 wrote:
@Jim yes that would be a godsend for multiwan if it works out. I always dreamed of being able to kill specific states that were tagged with a certain label (e.g. SIP connections) during failback events, but the best I was able to do was cobble together hacky shell scripts involving cron, pfctl, grep & awk... this would be so much nicer. I hope it's in the cards.
Even if this doesn't work out like I'm hoping, you could script it more easily with a rule label like "SIP connections" to match what you want and then pfctl -k label -k "USER_RULE: SIP connections" to kill the connections matching that rule. Make sure to match in and out using the same label and it should catch them all.
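A rough sketch of how such a script could wrap the command. The `kill_by_label` name and the `DRY_RUN` switch are assumptions made for illustration, not pfctl features; only the pfctl -k label -k invocation comes from this thread:

```shell
#!/bin/sh
# kill_by_label: kill states created by rules carrying exactly this label.
# Hypothetical wrapper; DRY_RUN is an assumption of this sketch.
kill_by_label() {
    label=$1
    if [ -z "$label" ]; then
        echo "usage: kill_by_label LABEL" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (quotes are for display only).
        printf "pfctl -k label -k '%s'\n" "$label"
    else
        pfctl -k label -k "$label"
    fi
}

# Example: preview the command for the SIP rule label mentioned above.
DRY_RUN=1 kill_by_label "USER_RULE: SIP connections"
```

Running with DRY_RUN=1 first makes it easy to confirm the label string before anything is killed.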
Updated by Marcos M over 3 years ago
Note on "That also assumes the rule has an ID in its configuration, which we may need to check is always true."
This indeed should be taken into account. I've come across more than a handful of configurations where rules existed without an ID, likely because the upgrade path for that was never hit.
Updated by Jim Pingle over 3 years ago
- Plus Target Version changed from 21.09 to 22.01
Moving ahead, still needs more thought/planning about how best to approach this
Updated by Jim Pingle about 3 years ago
- Target version changed from 2.6.0 to CE-Next
- Plus Target Version changed from 22.01 to 22.05
Updated by Jim Pingle over 2 years ago
- Assignee set to Jim Pingle
- Target version changed from CE-Next to 2.7.0
Updated by Jim Pingle over 2 years ago
- Status changed from New to In Progress
Adding basic functions here is pretty straightforward. It's easy enough to add a means to kill states created by a rule, though it's a little counterintuitive.
Killing by tracker ID will kill the states created by the rule with that ID, which is expected, but there will nearly always be another state as the connection exits the firewall and there isn't a way to associate that and kill it, too. But killing the one may be good enough for now.
Same story for killing states created by policy routing rules using a given gateway and group. We can find and kill the states created by the rules with the gateway/group set on them but not the egress states. This may be good enough, though, since we can kill the egress states without much trouble as is (e.g. kill states on WANX when WANX goes down, then kill any states created by GW_WANX).
I'll have some test code for this soon, at least for the manual state killing parts.
Updated by Jim Pingle over 2 years ago
The more I consider how this might work the less sure I am that the gateway part would be useful in a way most users would expect. Users would expect that it would kill any state using the gateway, even gateway groups, but it wouldn't be that precise. If a rule uses a gateway group it would have to kill any state using that group, not just states hitting that rule using a specific gateway inside the group. Granted that's still better than killing all states everywhere on gateway failure, but it may require extra clarification in the GUI and/or docs.
Updated by Jim Pingle over 2 years ago
Kristof let me know that we do also have pfctl -k gateway -k x.x.x.x which would fill the missing pieces in here. It's not in the man page or command help so I missed that it was available again.
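A sketch of how the two steps discussed in this thread might be combined when a WAN goes down: flush states on the failed interface, then kill states by gateway address. The `gateway_down_cmds` helper, the interface name, and the address are placeholders. The -k gateway form is taken from the comment above and was undocumented at the time, so treat it as an assumption to verify on the target version:

```shell
#!/bin/sh
# gateway_down_cmds: print the pfctl commands one might run when a WAN fails.
# Hypothetical helper; it only prints, so the output can be reviewed first.
gateway_down_cmds() {
    wan_if=$1   # interface of the failed WAN, e.g. vmx0
    gw_ip=$2    # IP address of the failed gateway
    # 1. Flush states on the failed WAN interface.
    printf 'pfctl -i %s -F states\n' "$wan_if"
    # 2. Kill states that use the failed gateway (per the comment above).
    printf 'pfctl -k gateway -k %s\n' "$gw_ip"
}

gateway_down_cmds vmx0 198.51.100.1
```

Printing rather than executing keeps the sketch safe to try on a system without pf loaded.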
Updated by Jim Pingle over 2 years ago
- Subject changed from Utilize new ``pfctl`` ability to kill states by label to Utilize new ``pfctl`` abilities to kill states
Updating subject as this has evolved a bit to encompass both killing by label for rule IDs and killing by gateway.
Updated by Jim Pingle over 2 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 100
Applied in changeset c5d0d75dbdb11753fb95b3ffb933e546d49924ca.
Updated by Jim Pingle over 2 years ago
These changes will be available in snapshots soon. It grew a little bit since the initial description but it ended up better overall as there were problems with the original approach that are mostly solved by the different approach I ended up using.
- Added action on firewall rule list to kill states on an interface created by a specific rule
- Added action on gateway status page to kill states created by policy routing rules using a specific gateway name
- Added action on gateway status page to kill states using the default gateway (0.0.0.0 or ::) -- these options match states from rules that DO NOT use policy routing or reply-to.
- Added action on gateway status and gateway group page to kill states by gateway IP address (catches route-to/policy routing and reply-to, both inbound and outbound)
- Added action on gateway group status page to kill states created by policy routing rules using a specific gateway group name (catching anything that hits rules without route-to)
- Change global state killing option to be granular (none, all down, flush all)
- Add per-gateway option to override global behavior (use default, do not kill, kill when down)
- Improve logic when determining which gateways are considered in state killing behavior.
- Log action when killing states
- Upgrade code to convert old setting to new format
I started a forum thread with additional information and for feedback: https://forum.netgate.com/topic/170690/new-state-killing-mechanisms-12092
Updated by Jim Pingle over 2 years ago
- Related to Feature #12931: Retain knowledge of previous dynamic gateway IP address when interface is down added
Updated by Jim Pingle over 2 years ago
- Related to Bug #8555: Selectively killing states on WAN failure added
Updated by Jim Pingle over 2 years ago
- Related to Feature #855: Ability to selectively kill states on gateway recovery added
Updated by Jim Pingle over 2 years ago
- Related to Bug #12942: Code to kill states for old gateway when reconnecting an interface is incorrect added
Updated by Jim Pingle over 2 years ago
- Status changed from Feedback to Closed
This has been working well for a while now. Any issues we hit from here can be addressed separately.
Updated by Marcos M almost 2 years ago
- Related to Bug #13934: Killing states by gateway can miss some IPv6 outbound states added