Bug #9296

Rule / Alias FQDN-Resolution broken

Added by Ph. T 11 months ago. Updated 9 days ago.

Status: Feedback
Priority: High
Assignee:
Category: Aliases / Tables
Target version:
Start date: 01/30/2019
Due date:
% Done: 100%
Estimated time:
Affected Version: 2.4.4_2
Affected Architecture: All

Description

If you are using FQDN aliases, each FQDN can effectively be used only once: if
the same FQDN appears in a second alias, the generated tables are incomplete.

No DNS-Server/Resolver on the firewall is used. External DNS
resolvers are configured.

Example:
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4, fqdn2, fqdn3

Generated tables are incomplete
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4 (the others are missing)

alias2 contains only fqdn4; fqdn2 and fqdn3 are missing.

This bug seems to have appeared in 2.4.4_p1 and still exists in 2.4.4_p2;
I am not sure whether this behavior is present in 2.4.4.

I am working on a minimal example, which I will provide.

Rule_Set.PNG (37.4 KB) Ruleset, Ph. T, 01/31/2019 05:29 AM
Alias_Configuration.PNG (24.2 KB) Alias-Configuration, Ph. T, 01/31/2019 05:29 AM
table_fqdn1.PNG (35.8 KB) table fqdn1, Ph. T, 01/31/2019 05:29 AM
table_fqdn2.PNG (49.6 KB) table fqdn2, Ph. T, 01/31/2019 05:29 AM
191011_Tnk_config-pfSense.localdomain-20191011143458.xml (15.9 KB) Ph. T, 10/11/2019 09:40 AM
pfsense.png (39 KB) Art Manion, 10/31/2019 11:48 PM

History

#1 Updated by Ph. T 11 months ago

I have now prepared a minimal example:

As you can see fqdn1 is missing the entry for one.one.one.one

Please FIX

#2 Updated by Eduard Rozenberg 11 months ago

I believe my issues may be related to this. We updated to 2.4.4 p2 on Jan 9, but only in the past few days have seen the problems.

The firewalls and sites at several locations are configured to allow remote access based on firewall aliases and rules using those aliases. The aliases sometimes contain a mix of IP addresses (/32), IP networks (/29 for ex), and DNS names (something.company.com).

Since the past few days, the pfSense firewalls at the various sites all reject my remote connection attempts, and the rejections are visible in the firewall logs.

It's...a problem.

#3 Updated by Jim Pingle 11 months ago

  • Category set to Rules / NAT
  • Assignee set to Luiz Souza
  • Target version set to 48
  • Affected Architecture set to All

#4 Updated by Robert Gijsen 10 months ago

2.4.4-RELEASE-p2, I've had this multiple times. At the moment I can even sort of reproduce it.
When adding hosts to an alias my AD DNS server logs:

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Rcv <pfsense IP> a463 Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Rcv <pfsense IP> 519a Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Snd <pfsense IP> a463 R Q [8081 DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Snd <pfsense IP> 519a R Q [8085 A DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

This is an external host, i.e. a DNS name that needs to be resolved externally by our DNS servers. That seems to work fine: the result gets sent back to pfSense. However, the host does NOT end up in the table for that alias. When I add another DNS name in the same domain, hosted on the same DNS servers on the internet, that works fine. I tried others like www.tweakers.net, www.nos.nl or bbc.co.uk; I see the same successful entries in my DNS debug log, and they DO end up in the alias table as well.

pfSense Resolver log:
Feb 18 12:47:14 filterdns Adding host <Host that gets added to the alias> (I just added that one to the alias)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <Host that gets added to the alias>
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <host that does NOT end up in table> (I just added that one in the alias as well)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <existing host, which was already in the alias>

The host that does NOT end up in the table here has, by the way, been added successfully to some other aliases, where it works just as expected. But for this alias the 'Adding host' line is missing from the pfSense log.

I tried creating a new alias, with the same three hosts as in the alias I used above. Here NONE of them end up in the table, after waiting for about 20 minutes, while in the alias used above two out of three (and the same two every time, no matter what order I put them in) work. Then I added www.tweakers.net as another try, and that one gets in there immediately.
I again killed filterdns, restarted it and poof - the tables immediately got filled as they should. So it seems filterdns is partially functional - some hosts get added, some aren't. It could indeed be when hosts already exist in the table somewhere; however restarting filterdns at least populates them for a while.

Tell me which logs you need. As it seems I can now reproduce this at will (also on my second CARP / HA node, by the way), I can probably provide all the logs needed.

#5 Updated by Robert Gijsen 10 months ago

I've just downgraded a test-machine to 2.4.4 release, and that works fine. Keeping it there for a while.

#6 Updated by Eduard Rozenberg 10 months ago

Shortly after I posted my problem above 20 days ago, it started working again on its own.

Then today, it is again not working.

So it may be a sporadic issue with the alias resolution, that doesn't happen consistently. Have not been able to pin down the issue at all.

#7 Updated by Eduard Rozenberg 10 months ago

I can confirm my issue is the same as described by the other posters on this bug.

Logs show that filterdns claims to be doing the right thing - all expected alias entries (FQDN's, IP's, networks) show up in:
$ clog /var/log/resolver.log | grep "Adding Action"

But the alias table is incomplete, some IP addresses are missing:
$ pfctl -T show -t my_alias_name

There is no DNS resolution issue with any of the FQDN's - if I ping the FQDN's from the firewall their IP addresses are resolved.

Restarting the filter or re-saving the alias does not help.
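
The comparison described in this note (what filterdns claims to add versus what pfctl actually shows) can be scripted. The following is only a minimal sketch: missing_from_table is a hypothetical helper name, and it assumes the expected addresses have been written to one file and the output of pfctl -T show -t my_alias_name to another, one address per line.

```shell
#!/bin/sh
# Sketch: print addresses that are expected in a pf table but absent from it.
# missing_from_table is a hypothetical helper, not part of pfSense.
#   $1: file with the expected addresses, one per line
#   $2: file with the saved output of `pfctl -T show -t <table>`
missing_from_table() {
    exp_sorted=$(mktemp) || return 1
    act_sorted=$(mktemp) || { rm -f "$exp_sorted"; return 1; }

    # pfctl indents each address, so strip surrounding whitespace and
    # drop empty lines before sorting.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" |
        LC_ALL=C sort -u > "$exp_sorted"
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$2" |
        LC_ALL=C sort -u > "$act_sorted"

    # comm -23 keeps the lines that appear only in the first (expected) file.
    LC_ALL=C comm -23 "$exp_sorted" "$act_sorted"

    rm -f "$exp_sorted" "$act_sorted"
}
```

A possible use is to save the table with pfctl -T show -t my_alias_name > /tmp/actual.txt and then run missing_from_table /tmp/expected.txt /tmp/actual.txt. The comparison is purely textual, so FQDN entries have to be resolved to addresses before they go into the expected file.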

#8 Updated by Eduard Rozenberg 10 months ago

I've also ruled out some other possibilities below -

Not the issue:
https://docs.netgate.com/pfsense/en/latest/firewall/thread-error-using-many-hostname-in-aliases.html
(I don't have a threads error in logs, and setting this tunable did not help)

Not the issue:
Mixing FQDN's and IP's - I tried creating a new alias with only a single FQDN from the ones that don't work in the original alias. Still no luck.

#9 Updated by Jim Pingle 9 months ago

  • Target version changed from 48 to 2.5.0

#10 Updated by Azamat Khakimyanov 7 months ago

I see this behavior on 2.4.4_p2, on 2.4.5-dev and on 2.5.0-dev.
As a workaround:
- run 'pkill filterdns' in the console
- then use Status / Filter Reload to start the 'filterdns' service again

#11 Updated by Gavin Stewart 7 months ago

As a workaround I have installed the Cron package with the following additional entries:

*/15 * * * * root killall -9 filterdns; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1     
@reboot      root sleep 10; killall -9 filterdns; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1

#12 Updated by Robert Gijsen 7 months ago

I know it's targeted for 2.5.0, but still I'd like to inform people here that 2.4.4_3 does indeed NOT fix this, making it yet another update that kills main functionality. Be aware, and be extremely reluctant to update!

#13 Updated by Rudolf Mayerhofer 6 months ago

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

#14 Updated by Christoforos Tsoukaris 6 months ago

Rudolf Mayerhofer wrote:

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

Based on what Rudolf wrote, I changed the value of "Aliases Hostnames Resolve Interval" from empty to 300secs. That setting combined with a filter reload from "Status/Filter Reload" menu made my rules work as expected again.

I will continue testing and update this post if I find anything else.

#15 Updated by Gavin Stewart 6 months ago

Christoforos Tsoukaris wrote:

Based on what Rudolf wrote, I changed the value of "Aliases Hostnames Resolve Interval" from empty to 300secs. That setting combined with a filter reload from "Status/Filter Reload" menu made my rules work as expected again.

I will continue testing and update this post if I find anything else.

If you look at the cron entries I mentioned earlier (note #11), you will see that I have the interval (-i) set to 300. I am still seeing missing entries in the alias tables on occasion, which get corrected when cron kills and restarts filterdns within the next 15 minutes.

#16 Updated by Mark Monaghan 6 months ago

The crontab entries mentioned in #11 didn't work for me: they just kept adding new filterdns processes, eventually causing the firewall to trigger CARP/HA, add high latency to VPN and internet traffic, and finally stop passing traffic altogether. I installed the Cron GUI package (but you could just as easily edit /etc/crontab directly) and changed the lines to:

*/15 * * * *  root    /usr/bin/pkill -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1"; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1
@reboot       root    sleep 10; /usr/bin/pkill -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1"; sleep 2; /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1

I've checked that these cron jobs now correctly kill the old processes before starting new ones, and that the filterdns process correctly does its job as long as the restarts are in place. (As noted elsewhere in this bug, it still stops resolving after a while, although I've not had time to monitor the firewalls to find out exactly how long it takes before the process stops, so apologies for that.)

#17 Updated by Gavin Stewart 6 months ago

Mark Monaghan wrote:

The crontab entries mentioned in #11 didn't work for me: they just kept adding new filterdns processes,

Very interesting. I'm not seeing that occur at all (and it was something I was monitoring closely when I set it up). I wonder what makes this operation different on your system ? Could you have possibly forgotten the "-9" argument to "killall" ?

#18 Updated by Mark Monaghan 5 months ago

Gavin Stewart wrote:

Very interesting. I'm not seeing that occur at all (and it was something I was monitoring closely when I set it up). I wonder what makes this operation different on your system ? Could you have possibly forgotten the "-9" argument to "killall" ?

No, sorry, I copied and pasted the commands verbatim from here to ensure that I didn't make any errors when implementing them. The -9 was definitely in there. What I found was that if I ran them individually, or even as a grouped command set, from the CLI, they worked perfectly, but they failed to kill any processes when run via cron. This was all done and tested on 2.4.4-p3. I cannot comment on how this behaved prior to this version, as it wasn't implemented on 2.4.4-p2 or lower, only after the system was upgraded to the latest stable version.

This is the reason I switched the cron job to pkill: nothing I tried would get killall or even kill to terminate the filterdns process via cron, but pkill worked, for reasons known only to the OS. However, that presented its own challenges. Unless I used the exact filter -f "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1" rather than pkill filterdns or pkill -f "filterdns", pkill matched filterdns in its own cron job and killed itself before it killed any filterdns processes. So until I put the long filter in place, I wasn't any further forward and still had filterdns processes stacking up, eventually causing the system to fall over. (I put this down to the system running out of open file handles, as it certainly never ran out of memory before the crashes I experienced started to happen.)
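
One way to sidestep the self-matching problem with pkill -f might be to signal the PID recorded in the pid file instead of matching command lines. The sketch below is untested on pfSense: stop_by_pidfile is a hypothetical helper, and it assumes that the file passed to filterdns with -p really contains the daemon's PID.

```shell
#!/bin/sh
# Sketch: signal the filterdns daemon via its pid file instead of matching
# command lines with pkill -f (which can also match the cron job itself).
# Assumption: the file given to `filterdns -p` holds the daemon's PID.
# stop_by_pidfile is a hypothetical helper, not part of pfSense.
stop_by_pidfile() {
    pidfile=${1:-/var/run/filterdns.pid}
    [ -r "$pidfile" ] || return 1

    # Take the first line and strip any whitespace.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')

    # Refuse anything that is not a plain integer greater than 1, so a
    # damaged pid file cannot turn into `kill 0` or `kill 1`.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$pid" -gt 1 ] || return 1

    # Plain TERM; returns non-zero if no such process exists.
    kill "$pid"
}
```

In a cron entry this would take the place of the killall or pkill step, followed by the same sleep and filterdns start command used in the entries above. It does not protect against a stale pid file whose PID has since been reused by another process.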

#19 Updated by Rudolf Mayerhofer 5 months ago

Rudolf Mayerhofer wrote:

Setting "Aliases Hostnames Resolve Interval" to 30 seconds (which should be the minimum value) in System/Advanced/Firewall&NAT seems to work around the issue for me (which could be some kind of race condition in filterdns but that's just guessing on my side).

As a follow up: With 30 seconds resolve interval things are still working fine one month later without killing/restarting filterdns.

#20 Updated by Eduard Rozenberg 5 months ago

Rudolf Mayerhofer wrote:

As a follow up: With 30 seconds resolve interval things are still working fine one month later without killing/restarting filterdns.

Unfortunately this doesn't solve my problem. Same issues, regardless of the refresh value. Continues to make life difficult on a regular basis.

#21 Updated by Peter van der Kleij 5 months ago

I think I have a similar problem.
My inbound rule did not work with an FQDN in the alias (a whitelist for source addresses). The weird thing was that only one IP address in the alias (not the FQDN) did not work; restarting servers, pfSense and such did not help.

When I set 'Aliases Hostnames Resolve Interval' to 300 seconds, it WORKS; when I leave it empty, it FAILS immediately.

2.4.4-RELEASE-p3, FreeBSD 11.2-RELEASE-p10

#22 Updated by Art Manion 4 months ago

Netgate SG-4860 running 2.4.4-RELEASE-p3 (amd64). At least twice I've experienced issues, I assume involving filterdns, where aliases were updated via the GUI but did not make it into the pf table (and I believe the updates didn't land in /var/etc/filterdns.conf either, but I can't confirm).

The alias/table is 58 entries, mostly DNS names, some IP addresses, and some other pfSense aliases. The last time this problem happened neither DNS name, IP address, nor pfSense aliases were added to the pf table. filterdns was running. killall -9 filterdns && /etc/rc.filter_configure fixed it.

#23 Updated by Justin J 4 months ago

Also experiencing this issue on 2.4.4-p2 and now 2.4.4-p3. If FQDNs are removed, the table updates correctly. Due to the way many cloud service provider clusters move and reallocate IPs, it is impractical to use individual host or CIDR lists, as these require constant updating whenever a change is made on the remote server. In my case I first noted the issue after one entry was updated and became a duplicate of an IP in a CIDR that was already in the table. I've tried killing and restarting filterdns and changing the resolution interval to 30 seconds, but with no improvement. Sometimes the table would partially generate, other times it was completely empty.
Can this be escalated for a resolution before 2.5 as it breaks the core firewall functionality for those of us using a cloud or hybrid cloud setup?

#24 Updated by Robert Gijsen 4 months ago

I second Justin's message / question. pfSense has been completely unusable since the initial 2.4.4 release. With filterdns not working properly, it fails as a firewall completely. That leaves us with the choice of updating pfSense to a version that doesn't work, or leaving it at 2.4.4, vulnerable to now-known security issues. Both are unacceptable.

This should certainly get higher priority.

#25 Updated by Tom Hebert 4 months ago

Most of you are more experienced at this than me so please be tolerant if this is a dumb question.

I added a Firewall Alias containing a single FQDN, xxx.mydomain.com. A Diagnostics=>DNS Lookup for xxx.mydomain.com returns three addresses, one IPv4 and two IPv6 (received from DHCPv6 and RA). However when I inspect the alias's table using Diagnostics=>Tables, only the IPv4 address is listed. It follows that the rule referencing the alias isn't passing traffic to the IPv6 addresses. I would expect to see all three addresses in the table. Is that expectation correct or am I missing something?

For background, I am using resolver and there is a domain override in place for mydomain.com (TypeTransparent).

Am I experiencing this bug?

#26 Updated by Justin J 4 months ago

That sounds like it might be something else Tom. Check your output from the CLI with: pfctl -T show -t ALIASNAME
If it's not there try making a forum post as the discussion here should be directly about the bug, not diagnosing the possibility of your setup having it.

#27 Updated by Tom Hebert 4 months ago

Justin J: I took your advice and posted on the forum and was promptly referred back here. Here's the link in case you are inclined to read it. https://forum.netgate.com/topic/145553/firewall-alias-not-updating-table-correctly

#28 Updated by Jim Pingle 4 months ago

  • Category changed from Rules / NAT to Aliases / Tables

#29 Updated by Robert Gijsen 3 months ago

It's been about 8 months now that we are unable to update / patch our firewalls because of this. Yeah I know, open source, contribute yourself if you don't like it, and so on. But still no update on this from pfSense team? By now we are seriously considering moving away. It's unacceptable that we have to run a firewall that's by now simply not secure anymore because of the now public security flaws.

Can we get an official status update on this? And why this doesn't get higher priority? Is there an official ETA for 2.5?

#30 Updated by John K 2 months ago

Robert Gijsen wrote:

It's been about 8 months now that we are unable to update / patch our firewalls because of this. [...] And why this doesn't get higher priority? Is there an official ETA for 2.5?

This issue is becoming a show stopper for us as well.

#31 Updated by Angel Briceño 2 months ago

Ph. T wrote:

If you are using FQDN aliases, each FQDN can effectively be used only once: if
the same FQDN appears in a second alias, the generated tables are incomplete.

No DNS-Server/Resolver on the firewall is used. External DNS
resolvers are configured.

Example:
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4, fqdn2, fqdn3

Generated tables are incomplete
alias1 : fqdn1, fqdn2, fqdn3
alias2 : fqdn4 (the others are missing)

alias2 contains only fqdn4; fqdn2 and fqdn3 are missing.

This bug seems to have appeared in 2.4.4_p1 and still exists in 2.4.4_p2;
I am not sure whether this behavior is present in 2.4.4.

I am working on a minimal example, which I will provide.

I had the same problem. The rule stack has a size limit, and once it is reached domain names can no longer be resolved. For example:

Alias-1 => Network => "10.10.0.0/24" is not equivalent to "10.10.0.0-10.10.0.254".

While 10.10.0.0/24 is valid notation for a rule, 10.10.0.0-10.10.0.254 means that the entire range has to be listed:
10.10.0.0
10.10.0.1
10.10.0.2
10.10.0.3
10.10.0.4
...
10.10.0.254

This range of IPs causes a problem for the "next" aliases in the system, and it is very possible that they cannot be resolved.

I have removed all gigantic ranges of IPs and the problem is solved.

This problem has already affected several cloud providers, so they mostly do not accept using FQDN aliases in their routing rules or ACLs.

#32 Updated by Gavin Stewart 2 months ago

Angel Briceño wrote:

I have removed all gigantic ranges of IPs and the problem is solved.

I have no ranges of IP addresses (only networks defined in CIDR notation), and the problem persists.

#33 Updated by Ph. T 2 months ago

I am very, very unhappy with the time it is taking to deal with and fix this problem.
Is there any way to speed up the process? I can provide any additional info.

I think Angel has a completely different problem. The limit on table entries
was an issue some time ago because of big bogon tables. A fix might be to
increase the maximum number of table entries, or not to use the bogon table.

    System > Advanced > Firewall & NAT
    Firewall Maximum Table Entries

We have set this value to 400000.

#34 Updated by Luiz Souza 2 months ago

Ph. T wrote:

I am very, very unhappy with the time it is taking to deal with and fix this problem.
Is there any way to speed up the process? I can provide any additional info.

I think Angel has a completely different problem. The limit on table entries
was an issue some time ago because of big bogon tables. A fix might be to
increase the maximum number of table entries, or not to use the bogon table.
[...]
We have set this value to 400000.

Please, provide the filterdns logs for your case, with debug enabled (-d20).

As strange as it may seem, this is proving to be difficult to reproduce reliably.

If you want to send the logs privately, please send it to luiz at netgate.com

Thanks.

#35 Updated by Jim Pingle 2 months ago

If anyone can come up with simple cases that reliably reproduce the problem, that would definitely help. That is, the smallest possible configuration that results in the problem happening. For example, an alias and firewall rule which exhibit the problem (or multiple aliases+rules) along with whatever other conditions are necessary, such as waiting specific amounts of time, or having invalid hostnames in the alias, etc. Along with log data mentioned above and the contents of /var/etc/filterdns.conf.

#36 Updated by Ph. T 2 months ago

I will provide the data / config.xml. I could also provide a VirtualBox pfSense installation
that shows this problem. I hope I can provide this today.

#37 Updated by Ph. T 2 months ago

I have tried to reproduce the issue. Unfortunately that was not possible. Now I just get completely empty tables.
I have waited for the timeout.

I used a 2.4.4_p3 image as the base. I turned off the DNS Forwarder and Resolver and am using an external resolver.

Steps to reproduce:

Start the machine:

Delete entry using_one_alias
Delete entry using_one_alias2

Add entry
using_one_alias host, fqdn_alias2
using_one_alias2 host, fqdn_alias1

Without killing the filterdns process those tables remain empty.

[2.4.4-RELEASE][]/var/log: cat /var/etc/filterdns.conf
pf dns.google fqdn_alias1
pf one.one.one.one fqdn_alias1
pf one.one.one.one fqdn_alias2
pf one.one.one.one using_one_alias
pf dns.google using_one_alias2
pf one.one.one.one using_one_alias2

If you do a reboot the tables are populated.

But if you delete the aliases after a reboot and recreate them the tables are not filled until you restart filterdns.
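
Since the reports in this thread centre on hostnames that are referenced by more than one table, it may help to list those hostnames directly from the configuration. This is a sketch only: shared_hosts is a hypothetical helper, and it assumes the "pf <hostname> <table>" line format shown in the filterdns.conf listing above; lines of any other shape are ignored.

```shell
#!/bin/sh
# Sketch: list entries that filterdns.conf assigns to more than one pf table.
# Assumption: lines look like `pf <hostname> <table>`, as in the listing above.
# shared_hosts is a hypothetical helper, not part of pfSense.
# Reads the configuration on stdin; prints "<hostname> <table1>,<table2>,...".
shared_hosts() {
    awk '
        $1 == "pf" && NF >= 3 {
            key = $2 SUBSEP $3
            if (key in seen) next          # ignore exact duplicate lines
            seen[key] = 1
            count[$2]++
            if (tables[$2] == "")
                tables[$2] = $3
            else
                tables[$2] = tables[$2] "," $3
        }
        END {
            for (h in count)
                if (count[h] > 1) print h, tables[h]
        }
    ' | LC_ALL=C sort
}
```

It would be run as shared_hosts < /var/etc/filterdns.conf. Any hostname it prints is a candidate for the behaviour described in this bug, which is a hint for where to look rather than proof.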

#38 Updated by Ph. T 2 months ago

I see similar effects with the old config which i attached in January.

#39 Updated by John K about 2 months ago

Jim Pingle wrote:

If anyone can come up with simple cases that reliably reproduce the problem [...]

What's the status here? Has Netgate been able to reproduce this issue?

#40 Updated by Jim Pingle about 2 months ago

John K wrote:

What's the status here? Has Netgate been able to reproduce this issue?

Not that I have seen yet. We still need to find a combination of settings that reliably and repeatedly reproduces the issue.

#41 Updated by Vinicius DellAglio about 2 months ago

Jim Pingle wrote:

John K wrote:

What's the status here? Has Netgate been able to reproduce this issue?

Not that I have seen yet. We still need to find a combination of settings that reliably and repeatedly reproduces the issue.

I just installed a brand new pfsense box and once I created an alias with an FQDN it didn't work, when I checked it on TABLES I only had the entries' list above the fqdn entry, the fqdn itself and everything else were missing.

Once I deleted the FQDN entry and hit filter reload, the alias was correct.

#42 Updated by John K about 1 month ago

Vinicius DellAglio wrote:

I just installed a brand new pfsense box and once I created an alias with an FQDN it didn't work, when I checked it on TABLES I only had the entries' list above the fqdn entry, the fqdn itself and everything else were missing.

Once I deleted the FQDN entry and hit filter reload, the alias was correct.

Is the FQDN a CNAME, A record, etc?

#43 Updated by Art Manion about 1 month ago

Art Manion wrote:

Netgate SG-4860 running 2.4.4-RELEASE-p3 (amd64). At least twice I've experienced issues, I assume involving filterdns, where aliases were updated via the GUI but did not make it into the pf table (and I believe the updates didn't land in /var/etc/filterdns.conf either, but I can't confirm).

The alias/table is 58 entries, mostly DNS names, some IP addresses, and some other pfSense aliases. The last time this problem happened neither DNS name, IP address, nor pfSense aliases were added to the pf table. filterdns was running. killall -9 filterdns && /etc/rc.filter_configure fixed it.

Update:

In the GUI, there is an alias named yoke11 that contains the IP address 192.168.1.211. There is another alias, internet_allowed_LAN, that contains yoke11. Weeks after this configuration was made (including multiple save/apply actions from the GUI), 192.168.1.211 is not in the internet_allowed_LAN pf table but is in /var/etc/filterdns.conf. This vaguely points to a problem between filterdns.conf and the actual pf table.

I tried once to reproduce this behavior with clean/new/test aliases and I did not observe the problem.

pfctl -t internet_allowed_LAN -T test 192.168.1.211
0/1 addresses match.

grep '192.168.1.211' /var/etc/filterdns.conf
pf 192.168.1.211 internet_allowed_LAN

killall -9 filterdns
/etc/rc.filter_configure

pfctl -t internet_allowed_LAN -T test 192.168.1.211
1/1 addresses match.

I also observe that aliases do not become pf tables until the alias is used in a firewall rule. I believe this is expected behavior, just noting it.
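
The manual check above (grep the address in filterdns.conf, then pfctl -T test against the table) could be repeated for every literal IPv4 entry. The sketch below makes two assumptions that should be verified before relying on it: that configuration lines have the form "pf <entry> <table>", and that pfctl -T test exits with a non-zero status when the address does not match. check_conf_ips and the PFCTL override variable are hypothetical names.

```shell
#!/bin/sh
# Sketch: for every literal IPv4 entry in filterdns.conf-style input, ask pf
# whether the address is actually in the table it is configured for.
# Assumptions: lines look like `pf <entry> <table>`, and `pfctl -T test`
# exits non-zero when the address does not match the table.
# check_conf_ips and the PFCTL override are hypothetical, for illustration.
PFCTL=${PFCTL:-pfctl}

check_conf_ips() {
    # Keep only entries that look like dotted-quad addresses; hostnames
    # are skipped because pf tables hold their resolved addresses instead.
    awk '$1 == "pf" && $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $2, $3 }' |
    while read -r addr table; do
        if ! "$PFCTL" -t "$table" -T test "$addr" </dev/null >/dev/null 2>&1; then
            echo "$addr is in filterdns.conf but not in pf table $table"
        fi
    done
}
```

It would be run as check_conf_ips < /var/etc/filterdns.conf. A table that does not exist at all (for example because the alias is not used in any rule) would also be reported, since pfctl fails in that case too.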

#44 Updated by Gavin Stewart about 1 month ago

Jim Pingle wrote:

John K wrote:

What's the status here? Has Netgate been able to reproduce this issue?

Not that I have seen yet. We still need to find a combination of settings that reliably and repeatedly reproduces the issue.

I now have a minimal and repeatable set of steps to reproduce this.

This has been verified in a VirtualBox VM with the following configuration:
  • New VM for FreeBSD 64-bit, accepting all defaults
  • First network adapter stays NAT (used as WAN in pfSense)
  • Add second network adapter as host-only (used as LAN in pfSense) to access web interface from host
  • pfSense-CE-2.4.4-RELEASE-p3-amd64.iso.gz, accepting all defaults
  • configure LAN IP address as needed at pfSense console to match host-only network settings
Procedure to reproduce FQDN name resolution failure in Alias tables:
  1. Firewall -> Aliases -> Import
  2. Alias Name "TEST1", import entire alias list below, Save
  3. Diagnostics -> Tables -> "TEST1"
    + Note approx 59 entries, specifically the 192.168.x.x addresses.
    + (It may take a couple of page reloads for all the addresses to resolve.)
  4. Firewall -> Aliases -> Import
  5. Alias Name "TEST2", import entire alias list below, Save
  6. Diagnostics -> Tables -> "TEST2"
    + Note that table is never fully populated (even if waiting longer than the 5 minute filterdns interval).
    + Note that 192.168.x.x addresses do not all appear.
  7. Kill the filterdns process at the console, and restart it manually:
    /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
    + Note that "TEST2" table is now properly populated.

List of aliases to import:

mail.bigpond.com
speedtest.telstra.com
mirror.internode.on.net
192.168.1.1
mirror.aarnet.edu.au
account.mojang.com
authserver.mojang.com
192.168.2.2
mc.hypixel.net
sessionserver.mojang.com
launchermeta.mojang.com
192.168.3.3
apps.yourtown.com.au
obook4.oxforddigital.com.au
www.oxforddigital.com.au
192.168.4.4
www.distance.vic.edu.au
lms.decvonline.vic.edu.au
connect.vic.edu.au
192.168.5.5
us.mineplex.com
stileapp.com
stileeducation.com
192.168.6.6

#45 Updated by John K 25 days ago

Jim Pingle wrote:

John K wrote:

What's the status here? Has Netgate been able to reproduce this issue?

Not that I have seen yet. We still need to find a combination of settings that reliably and repeatedly reproduces the issue.

Given Gavin Stewart's steps above, has Netgate finally been able to reproduce this issue now?

#46 Updated by Gavin Stewart 18 days ago

Gavin Stewart wrote:

I now have a minimal and repeatable set of steps to reproduce this.

Actually, I have revised the list of aliases to just one single host, still repeatable:

mail.bigpond.com

As per my previous instructions, with a clean install, and adding two host aliases for mail.bigpond.com (TEST1 and TEST2, applying changes between them), this is the entire clog output from /var/log/resolver.log (filterdns with -d 20):

Nov 27 06:18:13 pfSense filterdns:  Adding Action: pf table: TEST1 host: mail.bigpond.com
Nov 27 06:18:13 pfSense filterdns:      Adding host mail.bigpond.com
Nov 27 06:18:13 pfSense filterdns: Creating a new thread for action type: pf table: TEST1 hostname: mail.bigpond.com
Nov 27 06:18:13 pfSense filterdns: Creating a new thread for host mail.bigpond.com
Nov 27 06:18:20 pfSense filterdns:      found address 203.36.137.241 for host mail.bigpond.com
Nov 27 06:18:20 pfSense filterdns:          adding address 203.36.137.241 for host mail.bigpond.com
Nov 27 06:18:20 pfSense filterdns: Change detected on host: mail.bigpond.com
Nov 27 06:18:20 pfSense filterdns:  Awaking from the sleep for type: pf table: TEST1 hostname: mail.bigpond.com
Nov 27 06:18:20 pfSense filterdns:      Added pf address, table: TEST1 host: mail.bigpond.com address: 203.36.137.241
Nov 27 06:18:20 pfSense filterdns:  Updated pf table TEST1 host: mail.bigpond.com error: 0
Nov 27 06:18:24 pfSense filterdns: Received signal Hangup(1).
Nov 27 06:18:24 pfSense filterdns: merge_config: configuration reload
Nov 27 06:18:24 pfSense filterdns: Copied 1 actions to old
Nov 27 06:18:24 pfSense filterdns:  Adding Action: pf table: TEST1 host: mail.bigpond.com
Nov 27 06:18:24 pfSense filterdns:  Adding Action: pf table: TEST2 host: mail.bigpond.com
Nov 27 06:18:24 pfSense filterdns: Copied 2 actions to new
Nov 27 06:18:24 pfSense filterdns: Cleaning up action type: pf table: TEST1 hostname: mail.bigpond.com
Nov 27 06:18:24 pfSense filterdns: Loaded actions: 1 old and 1 new = 2 total
Nov 27 06:18:24 pfSense filterdns: Cleaning up previous actions
Nov 27 06:18:24 pfSense filterdns: Creating a new thread for action type: pf table: TEST2 hostname: mail.bigpond.com
Nov 27 06:18:24 pfSense filterdns:  Awaking from the sleep for hostname mail.bigpond.com (2)
Nov 27 06:18:24 pfSense filterdns:      found address 203.36.137.241 for host mail.bigpond.com
Nov 27 06:23:23 pfSense filterdns:  Awaking from the sleep for hostname mail.bigpond.com (2)
Nov 27 06:23:24 pfSense filterdns:      found address 203.36.137.241 for host mail.bigpond.com

[2.4.4-RELEASE][root@pfSense.localdomain]/root: cat /var/etc/filterdns.conf
pf mail.bigpond.com TEST1
pf mail.bigpond.com TEST2
[2.4.4-RELEASE][root@pfSense.localdomain]/root: pfctl -T show -t TEST1
   203.36.137.241
[2.4.4-RELEASE][root@pfSense.localdomain]/root: pfctl -T show -t TEST2
[2.4.4-RELEASE][root@pfSense.localdomain]/root: 

As can be seen, filterdns doesn't ever add the resolved address to the second table, unless filterdns is killed and restarted, resulting in this addition to the log:

Nov 27 06:26:10 pfSense filterdns: Received signal Terminated(15).
Nov 27 06:26:10 pfSense filterdns: Cleaning up action type: pf table: TEST2 hostname: mail.bigpond.com
Nov 27 06:26:10 pfSense filterdns: Cleaning up action type: pf table: TEST1 hostname: mail.bigpond.com
Nov 27 06:26:10 pfSense filterdns: Waiting 2 seconds for threads to finish
Nov 27 06:26:10 pfSense filterdns:  Awaking from the sleep for hostname mail.bigpond.com (0)
Nov 27 06:26:10 pfSense filterdns: Cleaning up hostname mail.bigpond.com
Nov 27 06:26:10 pfSense filterdns:          removing address 203.36.137.241 from host mail.bigpond.com
Nov 27 06:26:35 pfSense filterdns:  Adding Action: pf table: TEST1 host: mail.bigpond.com
Nov 27 06:26:35 pfSense filterdns:      Adding host mail.bigpond.com
Nov 27 06:26:35 pfSense filterdns:  Adding Action: pf table: TEST2 host: mail.bigpond.com
Nov 27 06:26:35 pfSense filterdns: Creating a new thread for action type: pf table: TEST1 hostname: mail.bigpond.com
Nov 27 06:26:35 pfSense filterdns: Creating a new thread for action type: pf table: TEST2 hostname: mail.bigpond.com
Nov 27 06:26:35 pfSense filterdns: Creating a new thread for host mail.bigpond.com
Nov 27 06:26:36 pfSense filterdns:      found address 203.36.137.241 for host mail.bigpond.com
Nov 27 06:26:36 pfSense filterdns:          adding address 203.36.137.241 for host mail.bigpond.com
Nov 27 06:26:36 pfSense filterdns: Change detected on host: mail.bigpond.com
Nov 27 06:26:36 pfSense filterdns:  Awaking from the sleep for type: pf table: TEST1 hostname: mail.bigpond.com
Nov 27 06:26:36 pfSense filterdns:      Added pf address, table: TEST1 host: mail.bigpond.com address: 203.36.137.241
Nov 27 06:26:36 pfSense filterdns:  Updated pf table TEST1 host: mail.bigpond.com error: 0
Nov 27 06:26:36 pfSense filterdns:  Awaking from the sleep for type: pf table: TEST2 hostname: mail.bigpond.com
Nov 27 06:26:36 pfSense filterdns:      Added pf address, table: TEST2 host: mail.bigpond.com address: 203.36.137.241
Nov 27 06:26:36 pfSense filterdns:  Updated pf table TEST2 host: mail.bigpond.com error: 0
[2.4.4-RELEASE][root@pfSense.localdomain]/root: cat /var/etc/filterdns.conf
pf mail.bigpond.com TEST1
pf mail.bigpond.com TEST2
[2.4.4-RELEASE][root@pfSense.localdomain]/root: pfctl -T show -t TEST1
   203.36.137.241
[2.4.4-RELEASE][root@pfSense.localdomain]/root: pfctl -T show -t TEST2
   203.36.137.241
[2.4.4-RELEASE][root@pfSense.localdomain]/root: 
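
To spot the symptom shown above without checking each table by hand, one option is to list every table named in filterdns.conf that pf reports as empty. This is a minimal sketch, assuming the `pf <hostname> <table>` line format seen in the output above. The function name `empty_filterdns_tables` and its two optional arguments are hypothetical and not part of pfSense or filterdns.

```shell
# List pf tables referenced in filterdns.conf that currently hold no addresses.
# Usage: empty_filterdns_tables [conf_file] [pfctl_command]
# Both arguments are illustrative conveniences (assumptions); the defaults
# match the path and command used in the output above.
empty_filterdns_tables() {
    conf="${1:-/var/etc/filterdns.conf}"
    pfctl_cmd="${2:-pfctl}"

    # Lines look like "pf <hostname> <table>"; field 3 is the table name.
    awk '$1 == "pf" { print $3 }' "$conf" | sort -u |
    while read -r table; do
        # "pfctl -T show -t <table>" prints one address per line,
        # so empty output means the table has no entries.
        if [ -z "$("$pfctl_cmd" -T show -t "$table" 2>/dev/null </dev/null)" ]; then
            echo "$table"
        fi
    done
}
```

Calling `empty_filterdns_tables` with no arguments in the broken state above should report TEST2 only, and nothing once both tables are populated. An empty table can also simply mean the hostname did not resolve, so a hit is a hint to look at the filterdns log rather than proof of this bug.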

#47 Updated by Gavin Stewart 17 days ago

I have a fix for this, and have created a pull request.

https://github.com/pfsense/FreeBSD-ports/pull/714

#48 Updated by Jim Pingle 17 days ago

  • Status changed from New to Pull Request Review

#49 Updated by Luiz Souza 16 days ago

  • Status changed from Pull Request Review to Feedback
  • % Done changed from 0 to 100

A fix based on Gavin's PR was committed, please let me know if the problem persists.

Thanks

#50 Updated by Robert Gijsen 12 days ago

Luiz Souza wrote:

A fix based on Gavin's PR was committed, please let me know if the problem persists.

Thanks

Maybe a stupid question, but since I obviously don't have git or build tools available on pfSense, how can we test this? Does that mean we'd need to install the 2.5 nightly? Side question: is there any ETA on 2.5 RTM?

#51 Updated by Christian Ullrich 12 days ago

Robert Gijsen wrote:

Maybe a stupid question, but since I obviously don't have git or build tools available on pfSense, how can we test this? Does that mean we'd need to install the 2.5 nightly? Side question: is there any ETA on 2.5 RTM?

In a (very large) nutshell:

$ git clone https://github.com/pfsense/freebsd-ports pfsense-ports
$ git clone -b RELENG_2_4_4 https://github.com/pfsense/freebsd-src pfsense-src
# poudriere ports -cp pfsense -m null -M $PWD/pfsense-ports
# poudriere jail -cj pfsense244 -m src=$PWD/pfsense-src -b
# echo ALLOW_UNSUPPORTED_SYSTEM=1 >> /usr/local/etc/poudriere.d/pfsense-make.conf
# poudriere bulk -j pfsense244 -p pfsense -z default net/filterdns
$ scp /usr/local/poudriere/data/packages/pfsense244-pfsense-default/All/filterdns-2.0_3.txz $WHEREVER
$ ssh $WHEREVER pkg install -f filterdns-2.0_3.txz

The ALLOW_UNSUPPORTED_SYSTEM line is only necessary if the poudriere bulk step that follows it fails without it (as it does on a FreeBSD 12 build system).

#52 Updated by Christian Ullrich 12 days ago

Luiz Souza wrote:

A fix based on Gavin's PR was committed, please let me know if the problem persists.

Confirmed. With rebuilt filterdns-2.0_3 on pfSense 2.4.4-p3, the tables are now populated correctly.

#53 Updated by Luiz Souza 12 days ago

  • Status changed from Feedback to Resolved

#54 Updated by Jim Pingle 11 days ago

  • Target version changed from 2.5.0 to 2.4.5

#55 Updated by Robert Gijsen 11 days ago

Luiz Souza wrote:

A fix based on Gavin's PR was committed, please let me know if the problem persists.

Thanks

I have compiled the package for our test environment (huge thanks to Christian Ullrich for the info, I couldn't have done that without his help), and so far the tables are populated as they should be.

#56 Updated by Jim Pingle 9 days ago

  • Status changed from Resolved to Feedback

Needs to be checked and/or tested again on 2.4.5 snapshots.
