Bug #9296

Rule / Alias FQDN-Resolution broken

Added by Ph. T 9 months ago. Updated 6 days ago.

Status: New
Priority: High
Assignee:
Category: Aliases / Tables
Target version:
Start date: 01/30/2019
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.4.4_2
Affected Architecture: All

Description

When using FQDN aliases, each FQDN can effectively be used only once: if
the same FQDN appears in more than one alias, the generated tables are incomplete.

No DNS-Server/Resolver on the firewall is used. External DNS
resolvers are configured.

Example:
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4, fqdn2, fqdn3

Generated tables are incomplete
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4 (the others are missing)

alias2 contains only fqdn4; fqdn2 and fqdn3 are missing.

This bug seems to have appeared with 2.4.4_p1 and still exists in 2.4.4_p2;
I am not sure whether this behavior is present in 2.4.4.

I am working on a minimal example, which I will provide.

Rule_Set.PNG (37.4 KB), Ruleset, Ph. T, 01/31/2019 05:29 AM
Alias_Configuration.PNG (24.2 KB), Alias-Configuration, Ph. T, 01/31/2019 05:29 AM
table_fqdn1.PNG (35.8 KB), table fqdn1, Ph. T, 01/31/2019 05:29 AM
table_fqdn2.PNG (49.6 KB), table fqdn2, Ph. T, 01/31/2019 05:29 AM
191011_Tnk_config-pfSense.localdomain-20191011143458.xml (15.9 KB), Ph. T, 10/11/2019 09:40 AM

History

#1 Updated by Ph. T 9 months ago

I have now prepared a minimal example:

As you can see fqdn1 is missing the entry for one.one.one.one

Please FIX

#2 Updated by Eduard Rozenberg 9 months ago

I believe my issues may be related to this. We updated to 2.4.4 p2 on Jan 9, but only in the past few days have seen the problems.

The firewalls and sites at several locations are configured to allow remote access based on firewall aliases and rules using those aliases. The aliases sometimes contain a mix of IP addresses (/32), IP networks (/29, for example), and DNS names (something.company.com).

For the past few days, the pfSense firewalls at the various sites have all rejected my remote connection attempts, and the rejections are visible in the firewall logs.

It's...a problem.

#3 Updated by Jim Pingle 9 months ago

  • Category set to Rules / NAT
  • Assignee set to Luiz Souza
  • Target version set to 48
  • Affected Architecture set to All

#4 Updated by Robert Gijsen 8 months ago

2.4.4-RELEASE-p2, I've had this multiple times. At the moment I can even sort of reproduce it.
When adding hosts to an alias my AD DNS server logs:

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Rcv <pfsense IP> a463 Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Rcv <pfsense IP> 519a Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Snd <pfsense IP> a463 R Q [8081 DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Snd <pfsense IP> 519a R Q [8085 A DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

This is an external host, i.e. a DNS name that has to be resolved externally by our DNS servers. That seems to work fine, and the result gets sent back to pfSense. However, the host does NOT end up in the table for that alias. When I add another DNS name in the same domain, hosted on the same DNS servers on the internet, that works fine. I tried others like www.tweakers.net, www.nos.nl and bbc.co.uk: I get the same successful entries in my DNS debug log, and they DO end up in the alias table as well.

pfSense Resolver log:
Feb 18 12:47:14 filterdns Adding host <Host that gets added to the alias> (I just added that one in the alias)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <Host that gets added to the alias>
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <host that does NOT end up in table> (I just added that one in the alias as well)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: www.ict-net.nl

The host that does NOT end up in the table here is, by the way, successfully added to some other aliases, where it works just as expected. But for this alias, the 'Adding host' line is missing from the pfSense log.
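One way to look for this pattern across a whole log is sketched below: it lists hosts that show up in an "Adding Action" line but never in an "Adding host" line. This is only a rough aid under stated assumptions. The function name is made up for illustration, the hostname is assumed to be the last field on both kinds of line (as in the excerpt above), and a host that was added before the log wrapped will be listed as well, so a hit is a hint rather than proof.

```shell
# Print hosts that appear in an "Adding Action" line but never in an
# "Adding host" line. Reads filterdns log text on stdin.
# Assumption: the hostname is the last whitespace-separated field.
hosts_without_add() {
    awk '/Adding host /    { added[$NF] = 1 }
         /Adding Action: / { wanted[$NF] = 1 }
         END { for (h in wanted) if (!(h in added)) print h }' | sort
}
```

On the firewall this could be fed from the resolver log, for example with `clog /var/log/resolver.log | hosts_without_add`.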

I tried creating a new alias, with the same three hosts as in the alias I used above. Here NONE of them end up in the table, after waiting for about 20 minutes, while in the alias used above two out of three (and the same two every time, no matter what order I put them in) work. Then I added www.tweakers.net as another try, and that one gets in there immediately.
I again killed filterdns, restarted it and poof - the tables immediately got filled as they should. So it seems filterdns is partially functional - some hosts get added, some aren't. It could indeed be when hosts already exist in the table somewhere; however restarting filterdns at least populates them for a while.

Tell me which logs you need. Since it seems I can now reproduce this at will (also on my second CARP/HA node, by the way), I can probably provide all the logs needed.

#5 Updated by Robert Gijsen 8 months ago

I've just downgraded a test-machine to 2.4.4 release, and that works fine. Keeping it there for a while.

#6 Updated by Eduard Rozenberg 8 months ago

Shortly after I posted my problem above 20 days ago, it started working again on its own.

Then today, it is again not working.

So it may be a sporadic issue with the alias resolution, that doesn't happen consistently. Have not been able to pin down the issue at all.

#7 Updated by Eduard Rozenberg 8 months ago

I can confirm my issue is the same as described by the other posters on this bug.

Logs show that filterdns claims to be doing the right thing: all expected alias entries (FQDNs, IPs, networks) show up in:
$ clog /var/log/resolver.log | grep "Adding Action"

But the alias table is incomplete, some IP addresses are missing:
$ pfctl -T show -t my_alias_name

There is no DNS resolution issue with any of the FQDNs: if I ping the FQDNs from the firewall, their IP addresses are resolved.

Neither restarting the filter nor re-saving the alias helps.
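The comparison described above (what filterdns was told to fill versus what pf actually holds) can be sketched as a small script. It assumes the `pf <hostname> <table>` line format of /var/etc/filterdns.conf that appears in the excerpt posted in note #37. The function names and the PFCTL override are illustrative only and not part of pfSense. The sketch flags only completely empty tables, so a partially filled table still needs a manual look with pfctl.

```shell
# Print each distinct pf table name referenced by a filterdns config file.
# Assumption: lines have the form "pf <hostname> <table>".
list_tables() {
    awk '$1 == "pf" && NF >= 3 { print $3 }' "$1" | sort -u
}

# Print the tables from the config file that currently hold no addresses.
# PFCTL may be set to another command (for example a stub) for testing.
empty_tables() {
    list_tables "$1" | while read -r table; do
        # Count non-blank lines in the table listing; awk always exits 0.
        count=$(${PFCTL:-pfctl} -t "$table" -T show 2>/dev/null |
                awk 'NF { n++ } END { print n + 0 }')
        if [ "$count" -eq 0 ]; then
            echo "$table"
        fi
    done
}
```

On the firewall, `empty_tables /var/etc/filterdns.conf` would then print one table name per line for every alias table that filterdns should have populated but that is empty.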

#8 Updated by Eduard Rozenberg 8 months ago

I've also ruled out some other possibilities below -

Not the issue:
https://docs.netgate.com/pfsense/en/latest/firewall/thread-error-using-many-hostname-in-aliases.html
(I don't have a threads error in logs, and setting this tunable did not help)

Not the issue:
Mixing FQDNs and IPs - I tried creating a new alias with only a single FQDN, one of those that don't work in the original alias. Still no luck.

#9 Updated by Jim Pingle 7 months ago

  • Target version changed from 48 to 2.5.0

#10 Updated by Azamat Khakimyanov 5 months ago

I see this behavior on 2.4.4_p2, on 2.4.5-dev and on 2.5.0-dev.
As a workaround we can:
- run the 'pkill filterdns' command in the console
- then go to Status / Filter Reload to start the 'filterdns' service again

#11 Updated by Gavin Stewart 5 months ago

As a workaround I have installed the Cron package with the following additional entries:

*/15 * * * * root killall -9 filterdns; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1     
@reboot      root sleep 10; killall -9 filterdns; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1

#12 Updated by Robert Gijsen 5 months ago

I know it's targeted for 2.5.0, but still I'd like to inform people here that 2.4.4_3 does indeed NOT fix this, making it yet another update that kills main functionality. Be aware, and be extremely reluctant to update!

#13 Updated by Rudolf Mayerhofer 4 months ago

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

#14 Updated by Christoforos Tsoukaris 4 months ago

Rudolf Mayerhofer wrote:

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

Based on what Rudolf wrote, I changed the value of "Aliases Hostnames Resolve Interval" from empty to 300secs. That setting combined with a filter reload from "Status/Filter Reload" menu made my rules work as expected again.

I will continue testing and update this post if I find anything else.

#15 Updated by Gavin Stewart 4 months ago

Christoforos Tsoukaris wrote:

Based on what Rudolf wrote, I changed the value of "Aliases Hostnames Resolve Interval" from empty to 300secs. That setting combined with a filter reload from "Status/Filter Reload" menu made my rules work as expected again.

I will continue testing and update this post if I find anything else.

If you look at the cron entries I mentioned earlier (note #11), you will see that I have the interval (-i) set to 300. I am still seeing missing entries in the alias tables on occasion, which do get corrected when cron kills and restarts filterdns within the next 15 minutes.

#16 Updated by Mark Monaghan 4 months ago

The crontab entries mentioned in #11 didn't work for me: they just kept adding new filterdns processes, eventually causing the firewall to trigger CARP/HA, add high latency to VPN and internet traffic, and finally stop passing traffic altogether. I installed the Cron GUI package (but you could just as easily edit /etc/crontab directly) and changed the lines to:

*/15 * * * *  root    /usr/bin/pkill -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1"; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1
@reboot       root    sleep 10; /usr/bin/pkill -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1"; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1

I've checked that these cron jobs now correctly kill the old processes before starting new ones, and that the filterdns process is correctly doing its job as long as the restarts are in place. (As noted in other comments on this bug, it still stops resolving after a while. I've not had time to monitor the firewalls to find out exactly how long it takes before the process stops, so apologies for that.)

#17 Updated by Gavin Stewart 4 months ago

Mark Monaghan wrote:

The crontab entries mentioned in #11 didn't work for me: they just kept adding new filterdns processes,

Very interesting. I'm not seeing that occur at all (and it was something I was monitoring closely when I set it up). I wonder what makes this operation different on your system ? Could you have possibly forgotten the "-9" argument to "killall" ?

#18 Updated by Mark Monaghan 3 months ago

Gavin Stewart wrote:

Very interesting. I'm not seeing that occur at all (and it was something I was monitoring closely when I set it up). I wonder what makes this operation different on your system ? Could you have possibly forgotten the "-9" argument to "killall" ?

No, sorry, I copied and pasted the commands verbatim from here to ensure that I didn't make any errors when implementing them. The -9 was definitely in there. What I found was that if I ran them individually, or even as a grouped command set, from the CLI, they worked perfectly, but they failed to kill any processes when run via cron. This was all done and tested on 2.4.4-p3. I cannot comment on how this performed prior to this version, as it wasn't implemented on 2.4.4-p2 or lower, only after the system was upgraded to the latest stable version.

This is the reason I switched the cron job to pkill: nothing I tried would get killall or even kill to terminate the filterdns process via cron, but pkill worked, for reasons known only to the OS. However, that presented its own challenges. Unless I used the exact filter -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1" rather than pkill filterdns or pkill -f "filterdns", it was seeing filterdns in its own cron job and killing itself before it killed any filterdns processes. So until I put the long filter in place, I wasn't any further forward and still had filterdns processes stacking up, eventually causing the system to fall over. (I put this down to the system running out of open file handles, as it certainly never ran out of memory before the crashes that I experienced started to happen.)
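Matching on the command line is fragile here because the shell that cron starts for the job carries the same text in its own arguments. One way to avoid pattern matching altogether is to signal the PID recorded in the pidfile. The following is a minimal sketch of that alternative, not the approach used in the cron lines above. It assumes that the file given to filterdns with -p holds the PID of the running daemon, and the function name is made up for illustration.

```shell
# Stop the process whose PID is stored in a pidfile.
# Returns 0 if there was nothing to stop or the process exited,
# 1 if it is still alive after waiting about 5 seconds.
stop_by_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 0           # no pidfile: nothing to do
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 0 ;;            # empty or not a number
    esac
    # If the signal cannot be delivered, treat the process as not running.
    kill "$pid" 2>/dev/null || return 0
    i=0
    while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 5 ]; do
        sleep 1
        i=$((i + 1))
    done
    ! kill -0 "$pid" 2>/dev/null
}
```

A cron job could call `stop_by_pidfile /var/run/filterdns.pid` and then start filterdns again with the same command line as in the entries above. Stray filterdns processes that are not recorded in the pidfile would not be caught this way.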

#19 Updated by Rudolf Mayerhofer 3 months ago

Rudolf Mayerhofer wrote:

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

As a follow up: With 30 seconds resolve interval things are still working fine one month later without killing/restarting filterdns.

#20 Updated by Eduard Rozenberg 3 months ago

Rudolf Mayerhofer wrote:

As a follow up: With 30 seconds resolve interval things are still working fine one month later without killing/restarting filterdns.

Unfortunately this doesn't solve my problem. Same issues, regardless of the refresh value. Continues to make life difficult on a regular basis.

#21 Updated by Peter van der Kleij 3 months ago

I think I have a similar problem.
My inbound rule did not work with an FQDN in the alias (a whitelist for source addresses). The weird thing was that only one IP address in the alias (not the FQDN) did not work; restarting servers, pfSense and such did not help.

When I set 'Aliases Hostnames Resolve Interval' to 300 seconds, it WORKS; when I leave it empty, it FAILS immediately.

2.4.4-RELEASE-p3, FreeBSD 11.2-RELEASE-p10

#22 Updated by Art Manion 3 months ago

Netgate SG-4860 running 2.4.4-RELEASE-p3 (amd64). At least twice I've experienced issues, I assume involving filterdns, where aliases were updated via the GUI but did not make it into the pf table (and I believe the updates didn't land in /var/etc/filterdns.conf either, but I can't confirm).

The alias/table has 58 entries, mostly DNS names, some IP addresses, and some other pfSense aliases. The last time this problem happened, none of the DNS names, IP addresses, or pfSense aliases were added to the pf table. filterdns was running. killall -9 filterdns && /etc/rc.filter_configure fixed it.

#23 Updated by Justin J 2 months ago

Also experiencing this issue on 2.4.4-p2 and now 2.4.4-p3. If the FQDNs are removed, the table updates correctly. Because many cloud service providers move and reallocate cluster IPs, it is impractical to use individual host or CIDR lists, as they need updating every time something changes on the remote side. In my case I first noticed the issue after one entry was updated and became a duplicate of an IP in a CIDR that was already in the table. I've tried killing and restarting filterdns and changing the resolve interval to 30 seconds, but with no improvement. Sometimes the table would partially generate, other times it was completely empty.
Can this be escalated for a resolution before 2.5 as it breaks the core firewall functionality for those of us using a cloud or hybrid cloud setup?

#24 Updated by Robert Gijsen 2 months ago

I second Justin's message / question. pfSense is completely unusable after the initial 2.4.4 release. With filterdns not working properly, it fails as a firewall completely. That leaves us with the choice of updating pfSense to a version that does not work, or leaving it at 2.4.4 and vulnerable to now-known security issues. Both are unacceptable.

This should certainly get higher priority.

#25 Updated by Tom Hebert 2 months ago

Most of you are more experienced at this than me so please be tolerant if this is a dumb question.

I added a Firewall Alias containing a single FQDN, xxx.mydomain.com. A Diagnostics=>DNS Lookup for xxx.mydomain.com returns three addresses, one IPv4 and two IPv6 (received from DHCPv6 and RA). However when I inspect the alias's table using Diagnostics=>Tables, only the IPv4 address is listed. It follows that the rule referencing the alias isn't passing traffic to the IPv6 addresses. I would expect to see all three addresses in the table. Is that expectation correct or am I missing something?

For background, I am using resolver and there is a domain override in place for mydomain.com (TypeTransparent).

Am I experiencing this bug?

#26 Updated by Justin J 2 months ago

That sounds like it might be something else Tom. Check your output from the CLI with: pfctl -T show -t ALIASNAME
If it's not there try making a forum post as the discussion here should be directly about the bug, not diagnosing the possibility of your setup having it.

#27 Updated by Tom Hebert 2 months ago

Justin J: I took your advice and posted on the forum and was promptly referred back here. Here's the link in case you are inclined to read it. https://forum.netgate.com/topic/145553/firewall-alias-not-updating-table-correctly

#28 Updated by Jim Pingle about 2 months ago

  • Category changed from Rules / NAT to Aliases / Tables

#29 Updated by Robert Gijsen 27 days ago

It's been about 8 months now that we are unable to update / patch our firewalls because of this. Yeah I know, open source, contribute yourself if you don't like it, and so on. But still no update on this from pfSense team? By now we are seriously considering moving away. It's unacceptable that we have to run a firewall that's by now simply not secure anymore because of the now public security flaws.

Can we get an official status update on this? And why this doesn't get higher priority? Is there an official ETA for 2.5?

#30 Updated by John K 11 days ago

Robert Gijsen wrote:

It's been about 8 months now that we are unable to update / patch our firewalls because of this. [...] And why this doesn't get higher priority? Is there an official ETA for 2.5?

This issue is becoming a show stopper for us as well.

#31 Updated by Angel Briceño 7 days ago

Ph. T wrote:

When using FQDN aliases, each FQDN can effectively be used only once: if
the same FQDN appears in more than one alias, the generated tables are incomplete.

No DNS-Server/Resolver on the firewall is used. External DNS
resolvers are configured.

Example:
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4, fqdn2, fqdn3

Generated tables are incomplete
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4 (the others are missing)

alias2 contains only fqdn4; fqdn2 and fqdn3 are missing.

This bug seems to have appeared with 2.4.4_p1 and still exists in 2.4.4_p2;
I am not sure whether this behavior is present in 2.4.4.

I am working on a minimal example, which I will provide.

I had the same problem. The rules stack has a limit and this means that domain names cannot be resolved. For example:

Alias-1 => Network => "10.10.0.0/24" is not equivalent to "10.10.0.0-10.10.0.254"

While 10.10.0.0/24 is valid notation for a rule, 10.10.0.0-10.10.0.254 means that the entire range has to be listed address by address:
10.10.0.0
10.10.0.1
10.10.0.2
10.10.0.3
10.10.0.4
...
10.10.0.254

This range of IPs causes a problem for the "next" aliases in the system, and it is very possible that they cannot be resolved.

I have removed all gigantic ranges of IPs and the problem is solved.

This problem has already affected several cloud providers, so they mostly do not accept using FQDN aliases in their routing rules or ACLs.

#32 Updated by Gavin Stewart 7 days ago

Angel Briceño wrote:

I have removed all gigantic ranges of IPs and the problem is solved.

I have no ranges of IP addresses (only networks defined in CIDR notation), and the problem persists.

#33 Updated by Ph. T 7 days ago

I am very, very unhappy with the time it takes to deal with and fix this problem.
Is there any way to speed up the process? I can provide any additional info.

I think Angel has a completely different problem. The limit on table entries
was an issue some time ago due to big bogon tables. A fix might be to
increase the maximum number of table entries, or not to use the bogon table.

    System > Advanced >Firewall & NAT
    Firewall Maximum Table Entries 

We have set this value to 400000.

#34 Updated by Luiz Souza 6 days ago

Ph. T wrote:

I am very, very unhappy with the time it takes to deal with and fix this problem.
Is there any way to speed up the process? I can provide any additional info.

I think Angel has a completely different problem. The limit on table entries
was an issue some time ago due to big bogon tables. A fix might be to
increase the maximum number of table entries, or not to use the bogon table.
[...]
we have set this value to 400000.

Please provide the filterdns logs for your case, with debug enabled (-d20).

As strange as it may seem, this is proving to be difficult to reproduce reliably.

If you want to send the logs privately, please send them to luiz at netgate.com

Thanks.

#35 Updated by Jim Pingle 6 days ago

If anyone can come up with simple cases that reliably reproduce the problem, that would definitely help. That is, the smallest possible configuration that results in the problem happening. For example, an alias and firewall rule which exhibit the problem (or multiple aliases+rules) along with whatever other conditions are necessary, such as waiting specific amounts of time, or having invalid hostnames in the alias, etc. Please also include the log data mentioned above and the contents of /var/etc/filterdns.conf.

#36 Updated by Ph. T 6 days ago

I will provide the data / config.xml. I could also provide a VirtualBox pfSense installation
which shows this problem. I hope I can provide this today.

#37 Updated by Ph. T 6 days ago

I have tried to reproduce the issue. Unfortunately that was not possible. Now I just get completely empty tables.
I waited for the timeout to pass.

I used a 2.4.4_p3 image as the base. I turned off the DNS Forwarder and Resolver and am using an external resolver.

Steps to reproduce:

Start the machine:

Delete entry using_one_alias
Delete entry using_one_alias2

Add entry
using_one_alias host, fqdn_alias2
using_one_alias2 host, fqdn_alias1

Without killing the filterdns process, those tables remain empty.

[2.4.4-RELEASE][]/var/log: cat /var/etc/filterdns.conf
pf dns.google fqdn_alias1
pf one.one.one.one fqdn_alias1
pf one.one.one.one fqdn_alias2
pf one.one.one.one using_one_alias
pf dns.google using_one_alias2
pf one.one.one.one using_one_alias2

If you do a reboot the tables are populated.

But if you delete the aliases after a reboot and recreate them the tables are not filled until you restart filterdns.

#38 Updated by Ph. T 6 days ago

I see similar effects with the old config which i attached in January.
