Bug #8758
closedfilterdns stops working on a regular basis.
Added by Pi Ba about 6 years ago. Updated over 5 years ago.
100%
Description
filterdns stops working on a regular basis.
Ive found that filterdns stops working after a short while (a few days..)
Yes ive got a few dns names that dont properly resolve in there.. but that shouldn't matter for the names that do resolve right?
The process does keep 'running', but doesn't log anymore to the dnsresolver log..
Is there any way to query it directly for what it might be doing / waiting for?
What would help find the cause of such a issue?
Files
resolver.txt (500 KB) resolver.txt | khaled osama, 10/24/2018 01:31 AM |
Updated by Jim Pingle about 6 years ago
- Priority changed from Normal to High
I had this happen on my edge firewall which is running a snapshot from earlier this week. The pf tables with entries managed by filterdns
were empty when it failed, even entries that did not require DNS resolution. There were no log messages from filterdns until it was killed and restarted, at which point it was functioning as expected.
The filterdns
process was idle and not doing anything. Running truss -fp
on its PID showed no activity.
When it was stuck:
root 91126 0.0 0.2 101592 8936 - Is 14:10 0:09.41 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1
After killing it manually and restarting:
root 6915 0.0 0.2 33880 8060 - Ss 09:47 0:00.06 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1
Checking it randomly after, the main process was always in the "sleep" state (Ss
), I didn't see the main process in the idle state (Is
) again, but if you show all threads (ps uxHaww | grep filterdns
) the individual threads are in the idle state.
Updated by → luckman212 about 6 years ago
Would this affect IP alias lists getting "stuck" and not updating? I ran into this last week - editing an Alias, added a few IPs and everything saved and looked good in the GUI, but when checking the actual pf tables they still had the old IPs. Only a reboot fixed it.
Was just wondering if this was the same issue...
Updated by Jim Pingle about 6 years ago
That is a possible side effect. You can kill the filterdns
daemon and then trigger a filter reload from Status > Filter Reload to restart it. Much faster than a reboot.
Updated by → luckman212 about 6 years ago
Hmm. In that case, how about we add some tests to the alias edit php functions to query pfctl directly after a save and if it finds that the tables are not updating, then kill and restart the daemon automatically?
Updated by Jim Pingle about 6 years ago
That's a rather ugly hack/kludge and it would be better to find out why it's failing and fix it directly.
Updated by → luckman212 about 6 years ago
Agree it's less than ideal. But until a proper fix can be found a kludge might be safer than nothing, since aliases not updating could leave large security holes open.
I'm not sure where the source code for filterdns.c even is. It looks like it used to be in pfsense-tools but that's been archived and made private. Where to even look?
Updated by Jim Pingle about 6 years ago
It is at https://github.com/pfsense/FreeBSD-ports/blob/devel/net/filterdns/files/filterdns.c but Renato is already looking at this currently so it will be fixed properly in this release.
Updated by Renato Botelho about 6 years ago
Jim Pingle wrote:
I had this happen on my edge firewall which is running a snapshot from earlier this week. The pf tables with entries managed by
filterdns
were empty when it failed, even entries that did not require DNS resolution. There were no log messages from filterdns until it was killed and restarted, at which point it was functioning as expected.The
filterdns
process was idle and not doing anything. Runningtruss -fp
on its PID showed no activity.When it was stuck:
[...]
After killing it manually and restarting:
[...]Checking it randomly after, the main process was always in the "sleep" state (
Ss
), I didn't see the main process in the idle state (Is
) again, but if you show all threads (ps uxHaww | grep filterdns
) the individual threads are in the idle state.
The I state indicates it's sleeping for over 20 seconds and per-se is not the problem because filterdns threads sleep for 1 minute so it will stay as S in the first 20 seconds and then move to I
Updated by Renato Botelho about 6 years ago
- Status changed from 13 to In Progress
Updated by Anonymous about 6 years ago
- Assignee deleted (
Renato Botelho) - Target version changed from 2.4.4 to 48
Currently unable to reproduce
Updated by → luckman212 about 6 years ago
I have definitely hit this one - yes it is hard to reproduce. But, if I hit it again, is it worth sending any sort of truss output or anything else to help track it down?
Updated by Renato Botelho about 6 years ago
Luke Hamburg wrote:
I have definitely hit this one - yes it is hard to reproduce. But, if I hit it again, is it worth sending any sort of truss output or anything else to help track it down?
Can you patch your local system the places that start filterdns and replace '-d 1' by '-d 10' to increase debug level? then if it fails it would be good to check the logs from /var/log/resolver.log
Updated by → luckman212 about 6 years ago
Ok, I've done that. For anyone else who wants an easy way, I made a patch that you can add via System Patches:
https://github.com/luckman212/pfsense/commit/72834bf677bdbd1cf78f6772b79abe4b3eaa8235
Don't forget to kill & restart filterdns after applying, or simply reboot.
To follow the resolver logs in realtime, you can use:
clog -f /var/log/resolver.log
Also, I don't see it linked here so I want to add, this seems very related to https://redmine.pfsense.org/issues/8001 which was reported on the forums here: https://forum.netgate.com/topic/124467/filterdns-stops-working
Might literally be the same issue.
Updated by khaled osama almost 6 years ago
- File resolver.txt resolver.txt added
Dear All
i am affected with same problem
it happens every day approx.
i must kill filterdns service and restart to make worked again
i increased the debug level for filterdns
and i attached the resolver log file
best regards,
i linked the same post from netgate forum
https://forum.netgate.com/topic/124467/filterdns-stops-working/36
Updated by Luiz Souza almost 6 years ago
- Status changed from In Progress to Feedback
This issue was one of fixes included in the new filterdns (version 2.0).
If you still have issues, please let us know.
Thanks!
Updated by Renato Botelho almost 6 years ago
- Target version changed from 48 to 2.4.4-p1
Updated by Luiz Souza almost 6 years ago
- Status changed from Feedback to Closed
Updated by Rudolf Mayerhofer over 5 years ago
Just ran into this on 2.4.4-p2 with a not updating alias table:
[2.4.4-RELEASE][root@fw2]/root: ps aux | grep filterdns
root 96457 0.0 0.0 49776 9336 - Is 10Jan19 0:10.67 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
root 2975 0.0 0.0 6564 2468 0 S+ 12:54 0:00.00 grep filterdns
After killall -9 filterdns and applying changes to an alias filterdns started working properly again:
[2.4.4-RELEASE][root@fw2]/root: ps aux | grep filterdns
root 36300 67.0 0.0 47984 8016 - Ss 12:54 0:00.01 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
root 39464 0.0 0.0 6564 2468 0 S+ 12:54 0:00.00 grep filterdns
Unfortunately, at this point in time I do not have any more debug output available. I will continue to monitor this and add more information as it will be available to me.
Updated by Robert Gijsen over 5 years ago
2.4.4-RELEASE-p2, I've had this multiple times. At the moment I can even sort of reproduce it.
When adding hosts to an alias my AD DNS server logs:
2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Rcv <pfsense IP> a463 Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)
2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Rcv <pfsense IP> 519a Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)
2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Snd <pfsense IP> a463 R Q [8081 DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)
2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Snd <pfsense IP> 519a R Q [8085 A DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)
This is an external host, i.e. a DNS that needs to be externaly resolved by our DNS servers. That seems to work fine result gets send back to pfSense. However the host does NOT end up in the table for that alias. When I add another DNS, same domain, so hosted at the same DNS on internet, that works fine. I tried others like www.tweakers.net, www.nos.nl or bbc.co.uk I have the same success loggings in my DNS debug log, and they DO end up in the alias table as well. At first I though the issue was with hosts that are already in a table somewhere, but that doesn't seem to be the case. Most internal names I tried now don't end up in that table either.
pfSense Resolver log:
Feb 18 12:47:14 filterdns Adding host <Host that gets added to the alias> (I just added that one in the alias)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <Host that gets added to the alias>
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <host that does NOT end up in table> (I just added that one in the alias as well)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: www.ict-net.nl
The host that does NOT end up in table here, is by the way successfully added to some other aliasses, where it works just as expected. But for this alias I am missing the 'Adding host' in the pfSense log.
I tried creating a new alias, with the same three hosts as in the alias I used above. Here NONE of them end up in the table, after waiting for about 20 minutes, while in the alias used above two out of three (and the same two every time, no matter what order I put them in) work. Then I added www.tweakers.net as another try, and that one gets in there immediately.
I again killed filterdns, restarted it and poof - the tables immediately got filled as they should. So it seems filterdns is partially functional - some hosts get added, some aren't.
Tell me what loggings you need. As it seems I can now reproduce this at will (also on my second carp / HA node by the way) I can probably give all needed logs.
Updated by Jim Pingle over 5 years ago
This is a closed/resolved issue. If you have problems with filterdns, they are likely already covered by #9296 -- add your notes there.