Bug #9296

Rule / Alias FQDN-Resolution broken

Added by Ph. T 7 months ago. Updated 11 days ago.

Status: New
Priority: High
Assignee:
Category: Rules/NAT
Target version:
Start date: 01/30/2019
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.4.4_2
Affected Architecture: All

Description

When using FQDN aliases, each FQDN can effectively be used only once: if
the same FQDN appears in a second alias, the generated tables are incomplete.

No DNS-Server/Resolver on the firewall is used. External DNS
resolvers are configured.

Example:
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4, fqdn2, fqdn3

Generated tables are incomplete
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4 (the others are missing)

alias2 contains only fqdn4; fqdn2 and fqdn3 are missing.

This bug seems to have appeared with 2.4.4_p1 and still exists in 2.4.4_p2;
I am not sure whether this behavior is present in 2.4.4.

I am working on a minimal example, which I will provide.

Rule_Set.PNG (37.4 KB), Ruleset, Ph. T, 01/31/2019 05:29 AM
Alias_Configuration.PNG (24.2 KB), Alias-Configuration, Ph. T, 01/31/2019 05:29 AM
table_fqdn1.PNG (35.8 KB), table fqdn1, Ph. T, 01/31/2019 05:29 AM
table_fqdn2.PNG (49.6 KB), table fqdn2, Ph. T, 01/31/2019 05:29 AM

History

#1 Updated by Ph. T 7 months ago

I have now prepared a minimal example:

As you can see fqdn1 is missing the entry for one.one.one.one

Please FIX

#2 Updated by Eduard Rozenberg 7 months ago

I believe my issues may be related to this. We updated to 2.4.4 p2 on Jan 9, but only in the past few days have we seen the problems.

The firewalls at several sites are configured to allow remote access based on firewall aliases and rules using those aliases. The aliases sometimes contain a mix of IP addresses (/32), IP networks (/29, for example), and DNS names (something.company.com).

For the past few days, the pfSense firewalls at the various sites have all rejected my remote connection attempts, and the rejections are visible in the firewall logs.

It's...a problem.

#3 Updated by Jim Pingle 7 months ago

  • Category set to Rules/NAT
  • Assignee set to Luiz Souza
  • Target version set to 48
  • Affected Architecture set to All

#4 Updated by Robert Gijsen 6 months ago

2.4.4-RELEASE-p2, I've had this multiple times. At the moment I can even sort of reproduce it.
When adding hosts to an alias my AD DNS server logs:

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Rcv <pfsense IP> a463 Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Rcv <pfsense IP> 519a Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Snd <pfsense IP> a463 R Q [8081 DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Snd <pfsense IP> 519a R Q [8085 A DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

This is an external host, i.e. a DNS name that needs to be resolved externally by our DNS servers. That seems to work fine: the result gets sent back to pfSense. However, the host does NOT end up in the table for that alias. When I add another DNS name in the same domain, so hosted at the same DNS on the internet, that works fine. I tried others like www.tweakers.net, www.nos.nl or bbc.co.uk: I get the same success entries in my DNS debug log, and they DO end up in the alias table as well.

pfSense Resolver log:
Feb 18 12:47:14 filterdns Adding host <Host that gets added to the alias> (I just added that one in the alias)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <Host that gets added to the alias>
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <host that does NOT end up in table> (I just added that one in the alias as well)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: www.ict-net.nl

The host that does NOT end up in the table here has, by the way, been successfully added to some other aliases, where it works just as expected. But for this alias, the 'Adding host' line is missing from the pfSense log.

I tried creating a new alias with the same three hosts as in the alias I used above. Here NONE of them end up in the table, even after waiting for about 20 minutes, while in the alias used above two out of three (the same two every time, no matter what order I put them in) work. Then I added www.tweakers.net as another try, and that one gets in there immediately.
I then killed filterdns again, restarted it and poof: the tables immediately got filled as they should. So it seems filterdns is partially functional: some hosts get added, some don't. It could indeed happen when hosts already exist in a table somewhere; however, restarting filterdns at least populates them for a while.

Tell me what logs you need. As it seems I can now reproduce this at will (also on my second CARP / HA node, by the way), I can probably provide all the logs needed.

#5 Updated by Robert Gijsen 6 months ago

I've just downgraded a test-machine to 2.4.4 release, and that works fine. Keeping it there for a while.

#6 Updated by Eduard Rozenberg 6 months ago

Shortly after I posted my problem above 20 days ago, it started working again on its own.

Then today, it is again not working.

So it may be a sporadic issue with the alias resolution, that doesn't happen consistently. Have not been able to pin down the issue at all.

#7 Updated by Eduard Rozenberg 6 months ago

I can confirm my issue is the same as described by the other posters on this bug.

Logs show that filterdns claims to be doing the right thing. All expected alias entries (FQDNs, IPs, networks) show up in:
$ clog /var/log/resolver.log | grep "Adding Action"

But the alias table is incomplete, some IP addresses are missing:
$ pfctl -T show -t my_alias_name

There is no DNS resolution issue with any of the FQDNs: if I ping them from the firewall, their IP addresses are resolved.

Restarting the filter or re-saving the alias does not help.
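
The manual comparison above (what should be in the alias versus what pf actually holds) can be sketched as a small helper. This is only an illustration: the function name missing_from_table and both input files are hypothetical, and the list of expected addresses has to be collected by hand, for example from a DNS lookup of each FQDN.

```shell
#!/bin/sh
# missing_from_table EXPECTED_FILE TABLE_FILE
# Prints every entry listed in EXPECTED_FILE that is absent from TABLE_FILE.
# Both files hold one address or network per line; leading whitespace
# (as printed by "pfctl -T show") is ignored.
missing_from_table() {
    expected_sorted=$(mktemp "${TMPDIR:-/tmp}/expected.XXXXXX") || return 1
    table_sorted=$(mktemp "${TMPDIR:-/tmp}/table.XXXXXX") || return 1

    # Strip blanks and empty lines, then sort so comm(1) can compare them.
    tr -d ' \t' < "$1" | grep -v '^$' | LC_ALL=C sort -u > "$expected_sorted"
    tr -d ' \t' < "$2" | grep -v '^$' | LC_ALL=C sort -u > "$table_sorted"

    # -23 keeps only the lines unique to the first (expected) file.
    LC_ALL=C comm -23 "$expected_sorted" "$table_sorted"

    rm -f "$expected_sorted" "$table_sorted"
}
```

To use it, save the output of pfctl -T show -t my_alias_name to one file, list the addresses you expect in another, and pass both to the function. Any line it prints is an address missing from the table.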

#8 Updated by Eduard Rozenberg 6 months ago

I've also ruled out some other possibilities below -

Not the issue:
https://docs.netgate.com/pfsense/en/latest/firewall/thread-error-using-many-hostname-in-aliases.html
(I don't have a threads error in logs, and setting this tunable did not help)

Not the issue:
Mixing FQDNs and IPs - I tried creating a new alias containing only a single FQDN, one of those that don't work in the original alias. Still no luck.

#9 Updated by Jim Pingle 5 months ago

  • Target version changed from 48 to 2.5.0

#10 Updated by Azamat Khakimyanov 3 months ago

I see this behavior on 2.4.4_p2, on 2.4.5-dev and on 2.5.0-dev.
As a workaround, we can:
- run the 'pkill filterdns' command in the console
- then use /Status/Filter Reload to start the 'filterdns' service again

#11 Updated by Gavin Stewart 3 months ago

As a workaround I have installed the Cron package with the following additional entries:

*/15 * * * * root killall -9 filterdns; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1     
@reboot      root sleep 10; killall -9 filterdns; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1

#12 Updated by Robert Gijsen 3 months ago

I know it's targeted for 2.5.0, but still I'd like to inform people here that 2.4.4_3 does indeed NOT fix this, making it yet another update that kills main functionality. Be aware, and be extremely reluctant to update!

#13 Updated by Rudolf Mayerhofer 2 months ago

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

#14 Updated by Christoforos Tsoukaris 2 months ago

Rudolf Mayerhofer wrote:

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

Based on what Rudolf wrote, I changed the value of "Aliases Hostnames Resolve Interval" from empty to 300secs. That setting combined with a filter reload from "Status/Filter Reload" menu made my rules work as expected again.

I will continue testing and update this post if I find anything else.

#15 Updated by Gavin Stewart about 2 months ago

Christoforos Tsoukaris wrote:

Based on what Rudolf wrote, I changed the value of "Aliases Hostnames Resolve Interval" from empty to 300secs. That setting combined with a filter reload from "Status/Filter Reload" menu made my rules work as expected again.

I will continue testing and update this post if I find anything else.

If you look at the cron entries I have mentioned earlier (#9296#note-11), you will see that I have the interval (-i) set to 300. I am still seeing missing entries in the alias tables on occasion, which do get corrected when cron kills and restarts filterdns within the next 15 mins.

#16 Updated by Mark Monaghan about 2 months ago

The crontab entries as mentioned in #11 didn't work, as they just kept adding new filterdns processes, eventually causing the firewall to trigger CARP/HA, add high latency to VPN and internet traffic, and finally stop passing traffic altogether. I installed the cron GUI package (but you could just as easily edit /etc/crontab directly) and changed the lines to:

*/15 * * * *  root    /usr/bin/pkill -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1"; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1
@reboot       root    sleep 10; /usr/bin/pkill -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1"; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1

I've checked that these cron jobs now correctly kill the old processes before starting new ones, and that the filterdns process correctly does its job as long as the restarts are in place. (As noted in other comments on this bug, it still stops resolving after a while, although I've not had time to monitor the firewalls to find out exactly how long it takes before the process stops, so apologies for that.)

#17 Updated by Gavin Stewart about 2 months ago

Mark Monaghan wrote:

The crontab entries as mentioned in #11 didn't run as they were just keeping on adding new filterdns processes,

Very interesting. I'm not seeing that occur at all (and it was something I was monitoring closely when I set it up). I wonder what makes this operation different on your system ? Could you have possibly forgotten the "-9" argument to "killall" ?

#18 Updated by Mark Monaghan about 1 month ago

Gavin Stewart wrote:

Very interesting. I'm not seeing that occur at all (and it was something I was monitoring closely when I set it up). I wonder what makes this operation different on your system ? Could you have possibly forgotten the "-9" argument to "killall" ?

No, sorry, I copied and pasted the commands verbatim from here to ensure that I didn't make any errors when implementing them. The -9 was definitely in there. What I found was that if I ran them individually, or even as a grouped command set, from the CLI, they worked perfectly, but they failed to kill any processes when run via cron. This was all done and tested on 2.4.4-p3. I cannot comment on how this performed prior to this version, as it wasn't implemented on 2.4.4-p2 or lower, only after the system was upgraded to the latest stable version.

This is the reason I switched the cron job to pkill: nothing I tried would get killall or even kill to terminate the filterdns process via cron, but pkill worked, for reasons known only to the OS. However, that presented its own challenges. Unless I used the exact filter -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1" rather than pkill filterdns or pkill -f "filterdns", it saw filterdns in its own cron job and killed itself before it killed any filterdns processes. So until I put the long filter in place, I wasn't any further forward and still had filterdns processes stacking up, eventually causing the system to fall over. (I put this down to the system running out of open file handles, as it certainly never ran out of memory before the crashes I experienced started to happen.)
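
One way to sidestep the self-matching problem described above is to avoid pattern matching entirely and signal only the PID recorded in the pid file. The sketch below is an untested assumption, not something confirmed in this thread: it presumes that filterdns writes its own PID to the file passed with -p (/var/run/filterdns.pid in the cron lines above), and stop_by_pidfile is a hypothetical helper name.

```shell
#!/bin/sh
# stop_by_pidfile PIDFILE
# Sends SIGTERM to the PID stored in PIDFILE and waits up to about
# 5 seconds for it to exit. Returns 0 only if the process is gone.
stop_by_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")

    # Refuse anything that is not a plain decimal number.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    kill "$pid" 2>/dev/null || return 1

    # Poll with "kill -0", which only tests whether the PID still exists.
    i=0
    while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 5 ]; do
        sleep 1
        i=$((i + 1))
    done
    ! kill -0 "$pid" 2>/dev/null
}

# Possible cron-style use (paths and flags copied from the cron lines above):
#   stop_by_pidfile /var/run/filterdns.pid; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1
```

Note that this only stops the one process named in the pid file. If several filterdns processes have already stacked up, the others would still need to be cleaned up some other way.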

#19 Updated by Rudolf Mayerhofer about 1 month ago

Rudolf Mayerhofer wrote:

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

As a follow up: With 30 seconds resolve interval things are still working fine one month later without killing/restarting filterdns.

#20 Updated by Eduard Rozenberg 28 days ago

Rudolf Mayerhofer wrote:

As a follow up: With 30 seconds resolve interval things are still working fine one month later without killing/restarting filterdns.

Unfortunately this doesn't solve my problem. Same issues, regardless of the refresh value. Continues to make life difficult on a regular basis.

#21 Updated by Peter van der Kleij 25 days ago

I think I have a similar problem.
My inbound rule did not work with an FQDN in the alias (a whitelist for source addresses). The weird thing was that only one IP address in the alias (not the FQDN) did not work. Restarting servers, pfSense and so on made no difference.

When I set 'Aliases Hostnames Resolve Interval' to 300 seconds, it WORKS; when I leave it empty, it FAILS immediately.

2.4.4-RELEASE-p3, FreeBSD 11.2-RELEASE-p10

#22 Updated by Art Manion 17 days ago

Netgate SG-4860 running 2.4.4-RELEASE-p3 (amd64). At least twice I've experienced issues, which I assume involve filterdns, where aliases were updated via the GUI but did not make it into the pf table (and I believe the updates didn't land in /var/etc/filterdns.conf either, but I can't confirm).

The alias/table has 58 entries, mostly DNS names, some IP addresses, and some other pfSense aliases. The last time this problem happened, no DNS names, IP addresses, or pfSense aliases were added to the pf table. filterdns was running. killall -9 filterdns && /etc/rc.filter_configure fixed it.

#23 Updated by Justin J 13 days ago

Also experiencing this issue on 2.4.4-p2 and now 2.4.4-p3. If the FQDNs are removed, the table updates correctly. Because many cloud service provider clusters move and reallocate IPs, it is impractical to use individual host or CIDR lists, as these need updating every time something changes on the remote side. In my case I first noticed the issue after one entry was updated and became a duplicate of an IP in a CIDR that was already in the table. I've tried killing and restarting filterdns and changing the resolve interval to 30 seconds, but with no improvement. Sometimes the table would partially generate, other times it was completely empty.
Can this be escalated for a resolution before 2.5 as it breaks the core firewall functionality for those of us using a cloud or hybrid cloud setup?

#24 Updated by Robert Gijsen 13 days ago

I second Justin's message / question. pfSense is completely unusable after the initial 2.4.4 release. With filterdns not working properly, it fails as a firewall completely. That leaves us with the choice of updating pfSense to a completely useless, non-working version, or staying on 2.4.4 and leaving it vulnerable to now-known security issues. Both are unacceptable.

This should certainly get higher priority.

#25 Updated by Tom Hebert 13 days ago

Most of you are more experienced at this than me so please be tolerant if this is a dumb question.

I added a Firewall Alias containing a single FQDN, xxx.mydomain.com. A Diagnostics=>DNS Lookup for xxx.mydomain.com returns three addresses, one IPv4 and two IPv6 (received from DHCPv6 and RA). However when I inspect the alias's table using Diagnostics=>Tables, only the IPv4 address is listed. It follows that the rule referencing the alias isn't passing traffic to the IPv6 addresses. I would expect to see all three addresses in the table. Is that expectation correct or am I missing something?

For background, I am using resolver and there is a domain override in place for mydomain.com (TypeTransparent).

Am I experiencing this bug?

#26 Updated by Justin J 12 days ago

That sounds like it might be something else, Tom. Check your output from the CLI with: pfctl -T show -t ALIASNAME
If it's not there, try making a forum post, as the discussion here should be directly about the bug, not about diagnosing whether your setup has it.
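
When checking the pfctl output for a case like the one above, it can help to see at a glance whether any IPv6 entries made it into the table. A minimal sketch, assuming the table listing has one address or network per line; count_families is a hypothetical helper, not part of pfSense.

```shell
#!/bin/sh
# count_families
# Reads a table listing on stdin and prints how many entries are
# IPv4 and how many are IPv6. An entry containing ":" is counted as IPv6.
count_families() {
    awk '
        { gsub(/[ \t]/, "") }      # drop the indentation pfctl prints
        $0 == "" { next }          # skip empty lines
        /:/ { v6++; next }         # a colon only appears in IPv6 entries
        { v4++ }
        END { printf "ipv4=%d ipv6=%d\n", v4, v6 }
    '
}
```

It would be used as pfctl -T show -t ALIASNAME | count_families. A result of ipv6=0 for a name that has AAAA records matches the symptom described in #25.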

#27 Updated by Tom Hebert 11 days ago

Justin J: I took your advice and posted on the forum and was promptly referred back here. Here's the link in case you are inclined to read it. https://forum.netgate.com/topic/145553/firewall-alias-not-updating-table-correctly
