Feature #12092

closed

Utilize new ``pfctl`` abilities to kill states

Added by Jim Pingle over 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Very Low
Assignee:
Category: Rules / NAT
Target version:
Start date: 06/29/2021
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 22.05
Release Notes: Default

Description

In the latest pf changes present on 2.6.0, pfctl now supports killing states by label. We are using this to kill schedule states, but we could also use it to kill states for specific rules. The caveat is that each such rule must have a unique label.

: pfctl -vvsr | grep -A2 Test123
@115(1624974401) pass in quick on vmx0 reply-to (vmx0 198.51.100.1) inet proto tcp from any to 198.51.100.6 port = http flags S/SA keep state label "USER_RULE: Test123" 
  [ Evaluations: 73        Packets: 6         Bytes: 344         States: 1     ]
  [ Inserted: pid 64790 State Creations: 2     ]
: pfctl -vvss | grep -A4 198.51.100.6:80
all tcp 198.51.100.6:80 <- 198.51.100.142:53432       ESTABLISHED:ESTABLISHED
   [2154389371 + 64256] wscale 9  [3322164514 + 65537] wscale 7
   age 00:18:39, expires in 23:59:31, 2:1 pkts, 112:60 bytes, rule 115
   id: 0b24db6000000001 creatorid: 69dbf34f gateway: 198.51.100.1
   origif: vmx0
: pfctl -k label -k 'USER_RULE: Test123'
killed 1 states
: pfctl -vvss | grep -A4 198.51.100.6:80
:

The rule label must match exactly; it does not support partial, wildcard, or regex matching.
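
Because the match is exact and every rule sharing a label is affected, it can help to count how many loaded rules carry a label before killing by it. A minimal sketch, assuming rule lines look like the `pfctl -vvsr` output above; the function name `count_rules_with_label` is hypothetical, and it reads rule text on stdin so it can be tried on saved output:

```shell
#!/bin/sh
# Count the rule lines that carry exactly the given label.
# The closing quote in the pattern keeps "Test123" from matching "Test1234".
count_rules_with_label() {
    # $1: full label text, e.g. 'USER_RULE: Test123'
    # grep -c exits non-zero on a zero count, so mask that with "|| true".
    grep -cF -- "label \"$1\"" || true
}
```

Possible usage: `pfctl -sr | count_rules_with_label 'USER_RULE: Test123'`. A result above 1 would mean the kill also hits states created by other rules with that label.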

It may or may not be viable to have an icon on each rule row to do this. Users may not realize that with a generic (or empty) rule description, the action will also kill states for every other rule with that same label. Such an icon could at least be hidden for rules with an empty label, but that may or may not be sufficient.

As an alternative tactic, pf now also supports multiple labels per rule, and we could add the rule ID as another label (e.g. label ruleid:<num>), which would be more accurate than relying on the user-entered description. That also assumes the rule has an ID in its configuration, which we may need to check is always true.

The states display could have an input/button, perhaps in a collapsed advanced panel, to pick a label to kill from a drop-down list of all unique rule labels to avoid potential user input errors. That may not be viable since it wouldn't scale well, though. Systems with many rules may have problems rendering the box or finding/picking rules from the list.
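
For a sense of how long such a drop-down would get, the unique labels can be listed from the loaded ruleset. A rough sketch, assuming labels appear as `label "..."` in `pfctl -sr` output and contain no embedded double quotes; `unique_rule_labels` is a hypothetical helper name:

```shell
#!/bin/sh
# Print each distinct rule label once, one per line.
# Handles rules with several labels, since grep -o emits every match on a line.
unique_rule_labels() {
    grep -o 'label "[^"]*"' |
        sed -e 's/^label "//' -e 's/"$//' |
        LC_ALL=C sort -u
}
```

Possible usage: `pfctl -sr | unique_rule_labels | wc -l` gives the number of entries the list would need to hold.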

Note that this does not solve problems like #1947 since this only affects states created by the rule matching the label, not traffic which would match the rule.

If it's not viable to add in the GUI then we should at least note it somewhere in the docs.


Related issues

Related to Feature #12931: Retain knowledge of previous dynamic gateway IP address when interface is down (Resolved, Jim Pingle)
Related to Bug #8555: Selectively killing states on WAN failure (Duplicate, 06/06/2018)
Related to Feature #855: Ability to selectively kill states on gateway recovery (Resolved, Marcos M, 08/27/2010)
Related to Bug #12942: Code to kill states for old gateway when reconnecting an interface is incorrect (Resolved, Marcos M)
Related to Bug #13934: Killing states by gateway can miss some IPv6 outbound states (Closed, Marcos M)

Actions #1

Updated by Jim Pingle over 3 years ago

  • Target version set to 2.6.0
  • Plus Target Version set to 21.09
Actions #2

Updated by Jim Pingle over 3 years ago

Another random thought: it might be possible to leverage this to help with multi-wan (like #8555), since we could kill states for rule(s) using a gateway or group including a down gateway, along with the ID of outbound rule(s) on the failed WAN (automatic and also floating rules). Worth investigating, but may not pan out.

Actions #3

Updated by → luckman212 over 3 years ago

@Jim yes that would be a godsend for multiwan if it works out. I always dreamed of being able to kill specific states that were tagged with a certain label (e.g. SIP connections) during failback events, but the best I was able to do was cobble together hacky shell scripts involving cron, pfctl, grep & awk... this would be so much nicer. I hope it's in the cards.

Actions #4

Updated by Jim Pingle over 3 years ago

→ luckman212 wrote:

@Jim yes that would be a godsend for multiwan if it works out. I always dreamed of being able to kill specific states that were tagged with a certain label (e.g. SIP connections) during failback events, but the best I was able to do was cobble together hacky shell scripts involving cron, pfctl, grep & awk... this would be so much nicer. I hope it's in the cards.

Even if this doesn't work out like I'm hoping, you could script it more easily with a rule label like "SIP connections" to match what you want, and then run pfctl -k label -k "USER_RULE: SIP connections" to kill the connections matching that rule. Make sure to match in and out using the same label and it should catch them all.
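
A small wrapper along those lines, as a sketch only: `kill_states_by_label` and the `DRY_RUN` variable are made-up names, and the only real command involved is the `pfctl -k label -k` invocation quoted above. It prints the command by default so it can be reviewed before anything is killed:

```shell
#!/bin/sh
# Print (default) or run the pfctl command that kills states created by
# rules carrying exactly this label. Set DRY_RUN=0 to actually run it.
kill_states_by_label() {
    label=$1
    # An empty label is refused rather than passed on to pfctl.
    if [ -z "$label" ]; then
        echo "refusing to kill states: empty label" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'pfctl -k label -k "%s"\n' "$label"
    else
        pfctl -k label -k "$label"
    fi
}
```

Possible usage from a failback script: `DRY_RUN=0; kill_states_by_label 'USER_RULE: SIP connections'`. The label has to be the full text pf shows, including the `USER_RULE: ` prefix.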

Actions #5

Updated by Marcos M over 3 years ago

Note on "That also assumes the rule has an ID in its configuration, which we may need to check is always true."

This indeed should be taken into account. I've come across more than a handful of configurations where there existed rules without an ID, likely because the upgrade path for that was never hit.

Actions #6

Updated by Jim Pingle over 3 years ago

  • Plus Target Version changed from 21.09 to 22.01

Moving ahead, still needs more thought/planning about how best to approach this.

Actions #7

Updated by Jim Pingle about 3 years ago

  • Target version changed from 2.6.0 to CE-Next
  • Plus Target Version changed from 22.01 to 22.05
Actions #8

Updated by Jim Pingle almost 3 years ago

  • Assignee set to Jim Pingle
  • Target version changed from CE-Next to 2.7.0
Actions #10

Updated by Jim Pingle almost 3 years ago

  • Status changed from New to In Progress

Adding basic functions here is pretty straightforward. It's easy enough to add a means to kill states created by a rule, though it's a little counterintuitive.

Killing by tracker ID will kill the states created by the rule with that ID, which is expected, but there will nearly always be another state as the connection exits the firewall and there isn't a way to associate that and kill it, too. But killing the one may be good enough for now.

Same story for killing states created by policy routing rules using a given gateway and group. We can find and kill the states created by the rules with the gateway/group set on them but not the egress states. This may be good enough, though, since we can kill the egress states without much trouble as is (e.g. kill states on WANX when WANX goes down, then kill any states created by GW_WANX).

I'll have some test code for this soon, at least for the manual state killing parts.
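
To see which states a given rule created before killing anything, the `rule N` field in verbose state output can be matched. A minimal sketch, assuming the `pfctl -vvss` layout shown in the description (the "age" detail line ends with `rule <number>` and the next detail line starts with `id:`); `state_ids_for_rule` is a hypothetical name, and the number is the rule number pf prints, such as 115 in that example:

```shell
#!/bin/sh
# Print the state id of every state whose detail lines say "rule <N>".
# Reads `pfctl -vvss`-style text on stdin.
state_ids_for_rule() {
    awk -v rule="$1" '
        # The "age ..." detail line ends with ", rule <number>".
        /, rule [0-9]+[ \t]*$/ { match_rule = ($NF == rule) }
        # The following detail line carries the state id.
        $1 == "id:" && match_rule { print $2; match_rule = 0 }
    '
}
```

Possible usage: `pfctl -vvss | state_ids_for_rule 115`. This only finds the state created by the rule itself, so the second state created as the connection exits the firewall will not appear in the list.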

Actions #11

Updated by Jim Pingle almost 3 years ago

The more I consider how this might work, the less sure I am that the gateway part would be useful in a way most users would expect. Users would expect that it would kill any state using the gateway, even gateway groups, but it wouldn't be that precise. If a rule uses a gateway group it would have to kill any state using that group, not just states hitting that rule using a specific gateway inside the group. Granted, that's still better than killing all states everywhere on gateway failure, but it may require extra clarification in the GUI and/or docs.

Actions #12

Updated by Jim Pingle almost 3 years ago

Kristof let me know that we do also have pfctl -k gateway -k x.x.x.x which would fill the missing pieces in here. It's not in the man page or command help so I missed that it was available again.
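
Before killing by gateway, it may be useful to see how many states reference each gateway address. A rough sketch, assuming the `gateway:` field sits on the `id:` detail line of `pfctl -vvss` output as in the description; `states_per_gateway` is a hypothetical helper, and states without a `gateway:` field are simply not counted:

```shell
#!/bin/sh
# Summarize `pfctl -vvss`-style text on stdin as "<gateway> <state count>".
states_per_gateway() {
    awk '
        $1 == "id:" {
            # Find the "gateway:" keyword and count the address after it.
            for (i = 1; i < NF; i++)
                if ($i == "gateway:") count[$(i + 1)]++
        }
        END { for (gw in count) print gw, count[gw] }
    ' | LC_ALL=C sort
}
```

Possible usage: `pfctl -vvss | states_per_gateway`, then `pfctl -k gateway -k 198.51.100.1` for the address to clear, with the address taken from the listing.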

Actions #13

Updated by Jim Pingle almost 3 years ago

  • Subject changed from Utilize new ``pfctl`` ability to kill states by label to Utilize new ``pfctl`` abilities to kill states

Updating subject as this has evolved a bit to encompass both killing by label for rule IDs and killing by gateway.

Actions #14

Updated by Jim Pingle almost 3 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 100
Actions #15

Updated by Jim Pingle almost 3 years ago

These changes will be available in snapshots soon. The work grew a bit beyond the initial description, but it ended up better overall: the original approach had problems that the approach I ended up using mostly solves.

  • Added action on firewall rule list to kill states on an interface created by a specific rule
  • Added action on gateway status page to kill states created by policy routing rules using a specific gateway name
  • Added action on gateway status page to kill states using the default gateway (0.0.0.0 or ::); these options match states from rules that DO NOT use policy routing or reply-to
  • Added action on gateway status and gateway group pages to kill states by gateway IP address (catches route-to/policy routing and reply-to, both inbound and outbound)
  • Added action on gateway group status page to kill states created by policy routing rules using a specific gateway group name (catching anything that hits rules without route-to)
  • Changed global state killing option to be granular (none, all down, flush all)
  • Added per-gateway option to override global behavior (use default, do not kill, kill when down)
  • Improved logic for determining which gateways are considered in state killing behavior
  • Added logging of the action when killing states
  • Added upgrade code to convert the old setting to the new format

I started a forum thread with additional information and for feedback: https://forum.netgate.com/topic/170690/new-state-killing-mechanisms-12092

Actions #16

Updated by Jim Pingle almost 3 years ago

  • Related to Feature #12931: Retain knowledge of previous dynamic gateway IP address when interface is down added
Actions #17

Updated by Jim Pingle almost 3 years ago

  • Related to Bug #8555: Selectively killing states on WAN failure added
Actions #18

Updated by Jim Pingle almost 3 years ago

  • Related to Feature #855: Ability to selectively kill states on gateway recovery added
Actions #19

Updated by Jim Pingle almost 3 years ago

  • Related to Bug #12942: Code to kill states for old gateway when reconnecting an interface is incorrect added
Actions #20

Updated by Jim Pingle over 2 years ago

  • Status changed from Feedback to Closed

This has been working well for a while now. Any issues we hit from here can be addressed separately.

Actions #21

Updated by Marcos M almost 2 years ago

  • Related to Bug #13934: Killing states by gateway can miss some IPv6 outbound states added