What? Rejected based on the assumption that I possibly just screwed something up on my instance, or that it has possibly already been addressed in the dev branch?
That is quite unexpected, considering the nature of the bug and all those "most secure firewall" claims.
This is not some harmless issue that can be dismissed on assumptions. It is a potential hole in the firewall and must be investigated thoroughly until the root cause is established with certainty.
Ok, lazy dev, I did the debugging for you.
The WebGUI part is fine: the config is written out correctly. The actual issues lie in the PF reload script and in FilterDNS.
1. When an alias contains only FQDNs, the table definition in the generated rules is just "table <alias> persist", and in this case old entries are NOT flushed when pfctl reloads the rules.
$after_filter_configure_run[] = "/sbin/pfctl -T flush -t " . escapeshellarg($aliased['name']);
is added in filter_generate_aliases() only if !empty($aliased['address']), which is not the case here.
Even if this is a bug in how $after_filter_configure_run is populated, and the flush command was meant to be queued here too, a race condition would still be possible between FilterDNS reloading and the $after_filter_configure_run statements, because those are executed after FilterDNS has already been HUP'ed/started.
man pf.conf(5) says:
The contents of a pre-existing table defined without a list of addresses to initialize it is not altered when pf.conf is loaded.
A table initialized with the empty list, { }, will be cleared on load.
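To make the two man page rules and the ordering problem concrete, here is a toy Python model. It is not pf or pfSense code: `load_table_definition()`, `filterdns_update()` and `pfctl_flush()` are hypothetical stand-ins, and a table is just a set of address strings. It replays the test case below, then shows what an empty `{ }` list would do and what an unlucky "HUP first, flush second" ordering would do.

```python
from typing import Optional, Set


def load_table_definition(current: Set[str], init_list: Optional[Set[str]]) -> Set[str]:
    """Model of what loading pf.conf does to a pre-existing table.

    init_list=None  -> "table <t> persist" (no list): contents are not altered.
    init_list=set() -> "table <t> persist { }": table is cleared on load.
    any other set   -> table is re-initialized from that list.
    """
    if init_list is None:
        return set(current)
    return set(init_list)


def filterdns_update(table: Set[str], resolved: Set[str]) -> Set[str]:
    """Toy FilterDNS: adds whatever the FQDN currently resolves to."""
    return table | resolved


def pfctl_flush(_table: Set[str]) -> Set[str]:
    """Model of "pfctl -T flush -t <t>": removes every entry."""
    return set()


# Test case from this report: 192.168.201.4 -> cloud.home (resolves to .1)
before = {"192.168.201.4"}
after_reload = load_table_definition(before, None)  # "table <cloud> persist"
after_hup = filterdns_update(after_reload, {"192.168.201.1"})
print(sorted(after_hup))  # ['192.168.201.1', '192.168.201.4']: stale .4 survives

# Hypothetical alternative: emit an empty list so the load clears the table.
cleared = load_table_definition(before, set())  # "table <cloud> persist { }"
print(sorted(filterdns_update(cleared, {"192.168.201.1"})))  # ['192.168.201.1']

# Queued flush running after FilterDNS was already HUP'ed: its work is wiped.
racy = pfctl_flush(filterdns_update(after_reload, {"192.168.201.1"}))
print(sorted(racy))  # []
```

Because FilterDNS runs asynchronously, the real flush could land before or after its update; the model only shows the bad ordering.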
Test case: alias changed from 192.168.201.4 to cloud.home
pfctl stmt: table <cloud> persist
$after_filter_configure_run: empty
Actual table contents before reload: 192.168.201.4
rules reload successful, $rules_loading==0, $rules_error empty [exec("/sbin/pfctl -o basic -f {$g['tmp_path']}/rules.debug 2>&1", $rules_error, $rules_loading);]
Actual table contents right after reload: 192.168.201.4
FilterDNS HUP-ed [sigkillbypid("{$g['varrun_path']}/filterdns.pid", "HUP");]
FCGI call to /etc/rc.filter_configure_sync by the check_reload_status daemon completed
SHELL# pfctl -t cloud -T show: 192.168.201.1, 192.168.201.4
192.168.201.4 is still here, an obvious bug.
And I don't see any commits fixing it in the git repo. Am I looking in the wrong place?
2. The second issue is FilterDNS sometimes failing to add its entries to the table. They are not added even on subsequent manual HUPs, until the FQDN is changed.
This can lead to an empty table in cases such as:
- FilterDNS removed the entry for the old FQDN, but failed to add the new one
- when an FQDN is changed to its own IP, the IP entry gets removed by FilterDNS, and there is nothing to add in its place
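One way to avoid the second bullet (and case 7 further down) would be for whatever maintains the table to remember which source put each address there, and to delete an address only once no source still needs it. The Python sketch below is a hypothetical design, not how FilterDNS works today; `RefCountedTable`, `set_source()` and the source name "static" are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


class RefCountedTable:
    """Hypothetical table that tracks which sources contributed each address."""

    def __init__(self) -> None:
        self._owners: Dict[str, Set[str]] = defaultdict(set)

    def set_source(self, source: str, addrs: Set[str]) -> None:
        """Replace the addresses contributed by one source (static IP or FQDN)."""
        for addr in list(self._owners):
            if addr not in addrs:
                self._owners[addr].discard(source)
                if not self._owners[addr]:
                    del self._owners[addr]  # no source needs it any more
        for addr in addrs:
            self._owners[addr].add(source)

    def contents(self) -> Set[str]:
        return set(self._owners)


# Case 7: server.home (resolves to .1) -> static 192.168.201.1
t = RefCountedTable()
t.set_source("server.home", {"192.168.201.1"})
t.set_source("static", {"192.168.201.1"})  # alias now lists the IP itself
t.set_source("server.home", set())         # FQDN dropped from the alias
print(sorted(t.contents()))  # ['192.168.201.1']
```

Removing the FQDN's entry no longer deletes an address that the static part of the alias still contributes.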
Has all of this already been addressed in the dev branch?
3. The third issue is a fragile architecture based on assumptions. FilterDNS is HUP'ed (or restarted if dead), but nothing waits for it, and the actual table contents are never checked: a plain blind assumption.
Since such tasks are asynchronous anyway, filter_configure_sync() must wait a reasonable amount of time and re-check the table contents, retrying or reporting an error on mismatch.
Ideally it should be event-driven, with FilterDNS reporting back when it has finished its job.
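A minimal sketch of the wait-and-verify step suggested above, written in Python for illustration. `read_table` and `trigger_update` are hypothetical callables (in pfSense they would roughly correspond to reading `pfctl -t <alias> -T show` and HUP'ing FilterDNS); the timeout, poll interval and retry count are arbitrary assumptions, not tuned values.

```python
import time
from typing import Callable, Set


def wait_for_table(
    expected: Set[str],
    read_table: Callable[[], Set[str]],
    trigger_update: Callable[[], None],
    timeout: float = 5.0,
    interval: float = 0.5,
    retries: int = 2,
) -> bool:
    """Trigger an update, then poll until the table matches `expected`.

    Re-triggers up to `retries` times; returns False so the caller can
    report an error instead of assuming the update succeeded.
    """
    for _attempt in range(retries + 1):
        trigger_update()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if read_table() == expected:
                return True
            time.sleep(interval)
    return False


# Fake table that only becomes correct after the second trigger.
state: dict = {"table": {"192.168.201.4"}, "hups": 0}


def fake_read() -> Set[str]:
    return state["table"]


def fake_hup() -> None:
    state["hups"] += 1
    if state["hups"] >= 2:
        state["table"] = {"192.168.201.1"}


ok = wait_for_table({"192.168.201.1"}, fake_read, fake_hup, timeout=0.1, interval=0.01)
print(ok)  # True, after one retry
```

Polling is only a fallback; an event-driven reply from FilterDNS would remove the need to guess a timeout.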
As for my setup, there is nothing special about it and overall it is pretty basic: only 2 unrelated plugins (iperf and arping), no write_config hooks, no /usr/local/pkg/pf/ plugins.
I never touched configs or scripts by hand until now (except the nmbclusters-related sysctls).
And here are my guesses for the cases from the original report:
1. cloud.server.home (nonexistent) -> cloud.home (both pointed/pointing to the same 192.168.201.1). Result: empty table
Probably a FilterDNS issue: it first removed the entry for the old domain name, but then failed to add the new one.
4. cloud.home -> 192.168.201.2. Result: 192.168.201.1, 192.168.201.2
??? Now the alias is a plain IP, so the table should be completely re-initialized during the reload and contain only 192.168.201.2 right after it.
Maybe a late FilterDNS action? Or stale entries in its config? I couldn't reproduce it so far; I will do more tests later.
5. 192.168.201.2 (actual contents 192.168.201.1, 192.168.201.2) -> code.dev.home. Result: 192.168.201.2
192.168.201.2 was not cleared out by "table <cloud> persist"; FilterDNS probably removed 192.168.201.1, but then failed to add the entry with the same IP for code.dev.home.
6. code.dev.home -> server.home. Result: 192.168.201.1, 192.168.201.2
192.168.201.2 is a stale entry from the previous cases, still not cleared out during the rules reload; 192.168.201.1 was correctly added by FilterDNS.
7. server.home -> 192.168.201.1. Result: empty table
The stale entry was cleared out by "table <cloud> {ip_here}".
The correct entry for 192.168.201.1 was then removed by FilterDNS (as server.home resolves to that IP), resulting in an empty table.
10 & 12. Alias [test] pointing to an IP. Result: table does not exist.
Maybe due to the missing "persist"? This alias wasn't used in any rules, so the kernel deleted it.
Not a big issue in this case, but inconsistent with the tables containing FQDNs.
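The pf.conf(5) rule behind this guess fits in a few lines: a table is kept by the kernel only if it is marked persist or some rule still refers to it. This Python toy (hypothetical `surviving_tables()` helper, table names from this report) just restates that rule under the assumption, guessed above, that the IP-only alias is generated without persist.

```python
from typing import Dict, Set


def surviving_tables(defined: Dict[str, bool], referenced: Set[str]) -> Set[str]:
    """defined maps table name -> has 'persist'; referenced = names used by rules."""
    return {name for name, persist in defined.items() if persist or name in referenced}


# Assumption: FQDN alias gets "persist", IP-only alias does not.
tables = {"cloud": True, "test": False}
print(sorted(surviving_tables(tables, referenced=set())))  # ['cloud']
```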
So the bug is here and is not imaginary.
Please reopen this issue until you can confirm that all of the issues mentioned have been addressed and fixed in the dev branch.