Bug #7209

closed

Something is seriously wrong with firewall aliases

Added by Dmitry Kernel almost 8 years ago. Updated almost 4 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: Web Interface
Target version: -
Start date: 02/04/2017
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.3.2
Affected Architecture: amd64

Description

pfSense version is 2.3.2-p1.

Unbound host overrides used in FW aliases:
- server.home 192.168.201.1
-- nas.home alias
-- cloud.home alias
-- code.dev.home alias
irrelevant aliases omitted

Relevant firewall aliases
- NAS Host nas.home
- cloud Host cloud.home

1. Edited the [cloud] alias, which previously pointed to "cloud.server.home" (which no longer exists), changed it to cloud.home, and applied. Later I was denied access to cloud; I checked my rules and they were correct, however:

pfctl -t cloud -T show
EMPTY

[NAS] alias was recently edited prior to this for the same reason with no issues.

2. Edited [cloud], changed to "nas.home"

pfctl -t cloud -T show
192.168.201.1

As expected

3. Changed back to "cloud.home", applied

No visible change in pfctl output, IP correct

4. Changed to 192.168.201.2 IP, applied

pfctl -t cloud -T show
192.168.201.1
192.168.201.2

The old IP is still there? WTF?! This is a potential hole that could go unnoticed for a long time.
Of course, the alias is still meant to be single-host, and in the UI only 192.168.201.2 is listed.

5. Changed to "code.dev.home"

pfctl -t cloud -T show
192.168.201.2

Should be 192.168.201.1

6. Changed to "server.home"

pfctl -t cloud -T show
192.168.201.1
192.168.201.2

Same crap as in 4

7. Changed to 192.168.201.1

pfctl -t cloud -T show
EMPTY

8. Added alias description

pfctl -t cloud -T show
192.168.201.1

Finally correct

9. Created [test] alias, set it to "cloud.home"

pfctl -t test -T show
192.168.201.1

As expected

10. Edited [test], changed to IP 192.168.201.3, applied

pfctl -t test -T show
pfctl: Table does not exist.

However, in the UI it does exist

11. Changed to "code.dev.home"

pfctl -t test -T show
192.168.201.1

As expected

12. Changed to IP (tried different addresses in the same subnet)

pfctl -t test -T show
pfctl: Table does not exist.

Something is definitely broken here

Additional info
I have 5 FW aliases in total (6 counting [test]) defined in the UI: 2 of them are Networks, 1 is multi-host, and the 2 mentioned here are single-host.
Of course, every change I made was applied before checking results.

Please check and fix this ASAP, as this is a potential security hole.


Related issues

Related to Bug #12708: Alias with non-resolving FQDN entry breaks underlying PF table (Resolved, Reid Linnemann)

Actions #1

Updated by Jim Pingle almost 8 years ago

  • Status changed from New to Rejected
  • Priority changed from Very High to Normal

I can't reproduce this, it's possible it's a side effect of something else in your configuration. Even on 2.3.2_1 every change I make to the alias is reflected in the table contents as soon as the entry is applied, whether I add hosts, change hosts, etc. It's never empty and never contains outdated information.

Try it again on 2.4, it's possible you're hitting some other edge case we've already fixed as a part of a similar issue such as #6982

Actions #2

Updated by Dmitry Kernel almost 8 years ago

What? Rejected on the assumption that I possibly just screwed something up on my instance, or that it has possibly already been addressed in the dev branch?
Quite unexpected behaviour considering the nature of the bug and all those "most secure firewall" claims.

This is not some harmless issue that can be dismissed on assumptions; it is a potential hole in the firewall and must be thoroughly investigated until the root cause is known with certainty.

Ok, lazy dev, I did the debugging for you.

The WebGUI part is fine; the config is written out correctly. The actual issues lie in the PF reloading script and FilterDNS.

1. When an alias contains only FQDNs, the table definition in the generated rules looks like "table <alias> persist", and in this case old entries are NOT flushed during the pfctl rules reload.

$after_filter_configure_run[] = "/sbin/pfctl -T flush -t " . escapeshellarg($aliased['name']);
in filter_generate_aliases() is added only if !empty($aliased['address']), which is not the case here.

Even if this is a bug with $after_filter_configure_run, and the flush command is intended to be added for execution, a race condition would be possible between filterDNS reloading and the statements from $after_filter_configure being executed, as they are executed after filterDNS has already been HUP'ed/started.

man pf.conf(5) says:
The contents of a pre-existing table defined without a list of addresses to initialize it is not altered when pf.conf is loaded.
A table initialized with the empty list, { }, will be cleared on load.
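
The two pf.conf(5) sentences quoted above can be modelled in a few lines. This is a toy Python sketch, not pf code: `load_table` and its arguments are hypothetical names, and the "explicit list replaces the contents" branch is an assumption that generalises the empty-list rule.

```python
def load_table(existing, initializer=None):
    """Toy model of how loading pf.conf treats a pre-existing table.

    existing:    set of addresses currently in the table
    initializer: None for `table <t> persist` (no address list), or an
                 iterable of addresses for `table <t> { ... }`
    """
    if initializer is None:
        # No list given: the existing contents are left untouched.
        return set(existing)
    # A list was given (possibly empty): the contents become that list.
    return set(initializer)


before = {"192.168.201.4"}
print(sorted(load_table(before)))                     # ['192.168.201.4']
print(sorted(load_table(before, [])))                 # []
print(sorted(load_table(before, ["192.168.201.2"])))  # ['192.168.201.2']
```

The first call is the FQDN-only case described in point 1: the old address survives the reload because nothing in the loaded ruleset tells pf to clear it.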

Test case: 192.168.201.4 -> cloud.home
pfctl stmt: table <cloud> persist
$after_filter_configure_run: empty
Actual table contents before reload: 192.168.201.4
rules reload successful, $rules_loading==0, $rules_error empty [exec("/sbin/pfctl -o basic -f {$g['tmp_path']}/rules.debug 2>&1", $rules_error, $rules_loading);]
Actual table contents right after reload: 192.168.201.4
FilterDNS HUP-ed [sigkillbypid("{$g['varrun_path']}/filterdns.pid", "HUP");]
FCGI call to /etc/rc.filter_configure_sync done by check_reload_status daemon completed
SHELL# pfctl -t cloud -T show: 192.168.201.1, 192.168.201.4

192.168.201.4 is still here, an obvious bug.
And I don't see any commits fixing it in git repo, am I looking in the wrong place?

2. The second issue is FilterDNS sometimes failing to add its entries to the table. They are not added even on subsequent manual HUPs, until the FQDN is changed.
This could lead to an empty table in these cases:
- filterDNS removed the entry for the old FQDN, but failed to add the new one
- when changing an FQDN to its IP, the IP entry gets removed by filterDNS, and there is nothing to be added in its place

Is all of this already addressed in the dev branch?

3. The third issue is a fragile architecture based on assumptions. FilterDNS is HUP-ed (or restarted if dead), but not waited for, and the actual table contents are not checked: a plain blind assumption.
As such tasks are async anyway, filter_configure_sync() must wait some reasonable amount of time and re-check the table contents, retrying or reporting an error in case of a mismatch.
Ideally it should be event-driven, with filterDNS reporting back when it has done its job.
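
The "wait, re-check, report" idea in point 3 could look roughly like the following. It is a minimal Python sketch of the polling pattern only, not a patch for filter_configure_sync(); `wait_for_table`, `read_table` and the timeout values are hypothetical.

```python
import time


def wait_for_table(expected, read_table, timeout=5.0, interval=0.5,
                   sleep=time.sleep):
    """Poll read_table() until it equals `expected` or the timeout expires.

    read_table is any callable returning the current table contents.
    Returns (ok, last_seen) so the caller can log a mismatch instead of
    assuming that the reload worked.
    """
    expected = set(expected)
    waited = 0.0
    while True:
        seen = set(read_table())
        if seen == expected:
            return True, seen
        if waited >= timeout:
            return False, seen
        sleep(interval)
        waited += interval
```

On a firewall, `read_table` would wrap something like `pfctl -t cloud -T show`; passing it in as a callable keeps the retry logic testable without pf.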

As to my setup: it has nothing special and overall is pretty basic: only 2 irrelevant plugins (iperf & arping), no write_config hooks, no /usr/local/pkg/pf/ plugins.
I never touched configs/scripts by hand until now (except nmbclusters-related sysctls).

And here are my guesses for the cases from the original report:

1. cloud.server.home (nonexistent) -> cloud.home (both pointed/pointing to the same 192.168.201.1). Result: empty table
Probably a FilterDNS issue: it first removed the entry for the old domain name, but then failed to add the new one

4. cloud.home -> 192.168.201.2. Result: 192.168.201.1, 192.168.201.2
??? Now the alias is a plain IP, so the table should be completely re-initialized during reload and contain only 192.168.201.2 right after it
Maybe a late FilterDNS action? Or stale entries in its config? Couldn't reproduce so far, will do more tests later

5. 192.168.201.2 (actual contents 192.168.201.1, 192.168.201.2) -> code.dev.home. Result: 192.168.201.2
192.168.201.2 was not cleared out by "table <cloud> persist"; filterDNS probably removed 192.168.201.1, but then failed to add the entry with the same IP for code.dev.home

6. code.dev.home -> server.home. Result: 192.168.201.1, 192.168.201.2
192.168.201.2 is a stale entry from the previous cases, still not cleared out during rules reload; 192.168.201.1 was correctly added by filterDNS

7. server.home -> 192.168.201.1. Result: empty table
The stale entry was cleared out by "table <cloud> {ip_here}"
The correct entry for 192.168.201.1 was removed by filterDNS (as server.home resolves to that IP), resulting in an empty table

10 & 12. Alias [test] pointing to an IP. Result: Table does not exist.
Maybe due to the lack of "persist"? This alias wasn't used in rules, and so the table was deleted by the kernel
Not a big issue in this case, but inconsistent with tables containing FQDNs
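
The guesses for cases 6 and 7 can be replayed with a toy model. The Python sketch below only encodes the hypotheses stated above (a reload re-initialises the table when the alias holds literal IPs and leaves it alone otherwise; filterDNS removes the addresses of the old hostname, then adds those of the new one). `reload_rules` and `filterdns_pass` are hypothetical names, and neither function is taken from the pfSense or filterDNS sources.

```python
def reload_rules(table, ips):
    """Rules reload: literal IPs re-initialise the table; an FQDN-only
    alias (`table <t> persist`) leaves the contents untouched."""
    return set(ips) if ips else set(table)


def filterdns_pass(table, old_resolved, new_resolved):
    """Hypothesised filterDNS step: drop the addresses of the hostname it
    used to track, then add the addresses of the current hostname."""
    return (set(table) - set(old_resolved)) | set(new_resolved)


# Case 6: code.dev.home -> server.home, with 192.168.201.2 left over.
t = {"192.168.201.2"}
t = reload_rules(t, ips=[])                                  # FQDN only
t = filterdns_pass(t, ["192.168.201.1"], ["192.168.201.1"])
print(sorted(t))   # ['192.168.201.1', '192.168.201.2']

# Case 7: server.home -> literal 192.168.201.1.
t = reload_rules(t, ips=["192.168.201.1"])                   # re-initialised
t = filterdns_pass(t, ["192.168.201.1"], [])                 # nothing to add
print(sorted(t))   # []
```

The model reproduces the observed results for cases 6 and 7 but not for case 4, which matches the note above that case 4 is still unexplained.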

So the bug is real, not imaginary.
Please reopen this issue until you can confirm that all the issues mentioned have already been addressed and fixed in the dev branch.

Actions #3

Updated by Jim Pingle almost 8 years ago

Did you try it on 2.4 as requested?

Either you are leaving out a config state/step or it's not an issue we can reproduce here in a lab setup. This is the kind of problem you need to describe in detail on the forum for discussion before unilaterally declaring it as a bug. I tried making all of your stated changes three times and never had a problem in any state. The tables always contained the exact entries expected, no more, no less.

If someone can demonstrate that there is an issue on 2.4 that can be reproduced, with the exact steps involved in reproducing it, not half the info, then we can look deeper.

And please watch the attitude, getting upset about an invalid bug report and calling developers "lazy" is not going to help anyone, and does not make your case stronger.

Actions #4

Updated by Dmitry Kernel almost 8 years ago

No, I haven't tried it on 2.4 yet; however, I dug into the sources on 2.3.2_p1 and reproduced it step by step while collecting debugging info, which is a much better approach.

"Couldn't reproduce on 2.4" is not a 100% guarantee that the issue has been fixed; it may just be masked by some partially relevant changes while still not fixed at its root, and so may still reveal itself at a later point.
First, the root cause of the issue must be found on a system where the issue is reproducible, and only then is it possible to say for sure whether it was already fixed or not.

I found that root cause and gave it to you in my last post: I gave you an exact description of what happens, where it happens and why it happens, with references to the source code.
Should you need them, I can give you the exact line numbers where the buggy code resides. I'm almost sure my config is irrelevant here; however, I can give you whichever parts of the XML are of interest to you, no problem, just ask.

As to my attitude: my issue was not just considered invalid, it was considered so without even asking for more info and/or more in-depth tests.
That is what I do myself when I am unable to reproduce an issue reported against my own software, and that is what I expect from others when reporting more or less serious issues.
Personally, I don't really care what you do with this report. But your current behaviour makes pfSense itself, which I really love, a no-longer-trusted firewall platform for me, and I definitely don't like this transition.
This is what is upsetting me; you should show more love and attention to your own software. Aren't you interested in making it as rock-solid as possible, and even beyond?

Even now, after I posted an exact description of what is going wrong, where and why, with references to the relevant sources/function names, you are telling me to go and post on the forums.
May I ask, are you a developer or some sort of management person?

Did you try replacing a plain IP in a single-host alias with some FQDN (pointing to another IP address) in your tests? This case is 100% reproducible on 2.3.2_p1, and its cause is the pf table definition in the generated pf rules being "table <alias> persist" with no table-flushing statements following it. In that case PF will leave the existing table and its contents intact, and filterDNS a bit later will just add its own entry for the resolved FQDN, resulting in 2 entries in the table.

Here are the steps:

1. Create a new alias of type Host for a single plain IPv4 address, save
2. Create a FW rule using that alias, so it will be persisted by the kernel
3. Apply all changes
4. Check in the console with "pfctl -t your_alias_name -T show"; it should be correct
5. Edit that alias and replace the IP with some FQDN resolving to a different IP (to exactly follow my test, that FQDN should be an unbound (DNS resolver) Host Override, but that should be irrelevant)
6. Save and apply changes
7. Check the actual table again in the console; it will now contain either both the old & new addresses, or just the old one
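
The check in step 7 can be made mechanical by comparing the console output against what the alias should contain. A minimal Python sketch, assuming the output of `pfctl -t <table> -T show` is one address per line; `parse_pfctl_show` and `diff_table` are hypothetical helper names, and the sample text reuses the addresses from the earlier 192.168.201.4 -> cloud.home test case.

```python
def parse_pfctl_show(output):
    """Turn the text printed by `pfctl -t <table> -T show` into a set."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def diff_table(expected, actual):
    """Return (stale, missing): entries that should not be in the table,
    and entries that should be there but are not."""
    expected, actual = set(expected), set(actual)
    return sorted(actual - expected), sorted(expected - actual)


# Sample text standing in for the console output after step 7.
sample = "   192.168.201.1\n   192.168.201.4\n"
stale, missing = diff_table({"192.168.201.1"}, parse_pfctl_show(sample))
print("stale:", stale, "missing:", missing)
# stale: ['192.168.201.4'] missing: []
```

A non-empty `stale` list corresponds to the leftover-IP symptom, a non-empty `missing` list to the empty-table symptom.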

Actions #5

Updated by Dmitry Kernel almost 8 years ago

2.4-latest as of today, fresh install: confirmed NOT fixed. Just as I already said in one of my previous posts:

And I don't see any commits fixing it in git repo, am I looking in the wrong place?

Stale entries (IP -> FQDN for a different IP): consistently reproducible, more than 5 times in a row, 100% positive

FQDN -> one of its IPs -> empty table: consistently reproducible, more than 5 times in a row, 100% positive

Single-IP FQDN (unbound alias, in /etc/hosts) -> another same-IP FQDN (also in /etc/hosts) -> relevant entry missing in table / table empty: sometimes reproducible, got it at least once

Actions #6

Updated by Stuart Wyatt about 7 years ago

I think I'm seeing the same problem.

I had an alias that wouldn't update. It is an alias made up of a list of other aliases. One of those aliases had an FQDN that could no longer be resolved. That caused the aliases/tables using that alias to not update. The one with the FQDN was never used directly, so no table was ever created for it, making it hard to see the failure.

For me, the bug is that it fails silently. I couldn't find any logging to tell me that this alias/table wasn't updated.

Let me know if this should be a separate bug.
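
Since the failure described above is silent, one workaround is to check from outside which hostnames in an alias currently resolve. A minimal Python sketch, assuming the system resolver returns the same answers the firewall sees (which may not hold); `system_resolve` and `unresolvable` are hypothetical helpers and not part of pfSense.

```python
import socket


def system_resolve(host):
    """Resolve a hostname with the system resolver; [] if it fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def unresolvable(hosts, resolve=system_resolve):
    """Return the hostnames for which `resolve` yields no addresses."""
    return [h for h in hosts if not resolve(h)]
```

Calling `unresolvable(["nas.home", "cloud.server.home"])` on a machine using the firewall's resolver would list the names worth inspecting; the `resolve` parameter lets a different lookup function be swapped in.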

Actions #7

Updated by luckman212 over 5 years ago

I just hit this bug today on a fully updated 2.4.4-p3 firewall.

There was an IP Alias named "h_whitelist" containing just a single FQDN entry. I wanted to add a placeholder IP (2nd entry) to this alias, so I clicked Add and input 1.2.3.4. Clicked Save and Apply. The h_whitelist table did not get updated correctly, and was now empty. Checking /tmp/rules.debug it contained the following:

table <h_whitelist> persist
h_whitelist = "<h_whitelist>" 

So, an empty Table / broken firewall rule. I tried Status > Filter Reload but that did not help. What "fixed" it for me was editing the alias again, deleting the 1.2.3.4 entry, save & apply, then edit again, adding it back, save & apply. Not sure really what is going on here but there's definitely still a broken code path somewhere.
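
To spot this condition without opening the file by hand, the generated rules can be scanned for tables that are defined without an address list. A minimal Python sketch, assuming table definitions appear one per line as in the excerpt above; `tables_without_list` is a hypothetical helper, and the `lan_hosts` line in the sample is made up for contrast.

```python
import re

# Matches `table <name> ...`; group 2 is whatever follows the name.
TABLE_RE = re.compile(r"^\s*table\s+<([^>]+)>\s*(.*)$")


def tables_without_list(rules_text):
    """Names of tables defined with neither `{ ... }` nor a `file` source."""
    names = []
    for line in rules_text.splitlines():
        m = TABLE_RE.match(line)
        if not m:
            continue
        rest = m.group(2)
        if "{" not in rest and "file" not in rest.split():
            names.append(m.group(1))
    return names


sample = '''table <h_whitelist> persist
h_whitelist = "<h_whitelist>"
table <lan_hosts> { 192.168.201.2 }
'''
print(tables_without_list(sample))   # ['h_whitelist']
```

A table in this list is not necessarily broken, since FQDN-only aliases are expected to look this way, but an alias that has literal IPs in the UI and still shows up here matches the symptom reported in this comment.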

Actions #8

Updated by Dennis Neuhaeuser about 4 years ago

I think I probably had this issue today on a 2.4.4-p3 firewall.

I had an alias containing one FQDN (in first row) and about 30 single IPs.
This was working for months.
Suddenly after a reboot 5 of these IPs were missing in the filter table. (checked with "pfctl -t ALIAS -T show")

Filter Reload and Re-Save/Re-Apply of alias did not change anything.

Then I realized that the FQDN was pointing to one of the IPs also listed (so it was redundant).
My fix was to remove the FQDN line from alias, save and apply.

Actions #9

Updated by Chris Tsou almost 4 years ago

I can confirm that I have the same issue on 2.4.4-RELEASE-p1. Please reopen this.

Actions #10

Updated by Viktor Gurov almost 4 years ago

see #9296

Actions #11

Updated by Stuart Wyatt almost 4 years ago

This bug / #9296 was easily reproducible 3 years ago when I first hit it and still is today on 2.4.5-p1. Just make an alias with an unresolvable FQDN and an IP. Do some edits and the table is still missing entries. I'll let a Netgate dev have access to mine so they can look at a currently broken alias.

Actions #12

Updated by Viktor Gurov almost 3 years ago

  • Related to Bug #12708: Alias with non-resolving FQDN entry breaks underlying PF table added