Project

General

Profile

Bug #8758

filterdns stops working on a regular basis.

Added by Pi Ba about 1 year ago. Updated 8 months ago.

Status:
Closed
Priority:
High
Assignee:
Category:
FilterDNS
Target version:
Start date:
08/06/2018
Due date:
% Done:

100%

Estimated time:
Affected Version:
2.4.3_1
Affected Architecture:
All

Description

filterdns stops working on a regular basis.

Ive found that filterdns stops working after a short while (a few days..)

Yes ive got a few dns names that dont properly resolve in there.. but that shouldn't matter for the names that do resolve right?
The process does keep 'running', but doesn't log anymore to the dnsresolver log..
Is there any way to query it directly for what it might be doing / waiting for?

What would help find the cause of such a issue?

resolver.txt (500 KB) resolver.txt khaled osama, 10/24/2018 01:31 AM

History

#1 Updated by Jim Pingle about 1 year ago

  • Priority changed from Normal to High

I had this happen on my edge firewall which is running a snapshot from earlier this week. The pf tables with entries managed by filterdns were empty when it failed, even entries that did not require DNS resolution. There were no log messages from filterdns until it was killed and restarted, at which point it was functioning as expected.

The filterdns process was idle and not doing anything. Running truss -fp on its PID showed no activity.

When it was stuck:

root    91126   0.0  0.2 101592  8936  -  Is   14:10      0:09.41 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1

After killing it manually and restarting:
root     6915   0.0  0.2 33880  8060  -  Ss   09:47      0:00.06 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 30 -c /var/etc/filterdns.conf -d 1

Checking it randomly after, the main process was always in the "sleep" state (Ss), I didn't see the main process in the idle state (Is) again, but if you show all threads (ps uxHaww | grep filterdns) the individual threads are in the idle state.

#2 Updated by Steve Beaver about 1 year ago

  • Assignee set to Renato Botelho

#3 Updated by Steve Beaver about 1 year ago

  • Status changed from New to This Sprint

#4 Updated by Luke Hamburg about 1 year ago

Would this affect IP alias lists getting "stuck" and not updating? I ran into this last week - editing an Alias, added a few IPs and everything saved and looked good in the GUI, but when checking the actual pf tables they still had the old IPs. Only a reboot fixed it.

Was just wondering if this was the same issue...

#5 Updated by Jim Pingle about 1 year ago

That is a possible side effect. You can kill the filterdns daemon and then trigger a filter reload from Status > Filter Reload to restart it. Much faster than a reboot.

#6 Updated by Luke Hamburg about 1 year ago

Hmm. In that case, how about we add some tests to the alias edit php functions to query pfctl directly after a save and if it finds that the tables are not updating, then kill and restart the daemon automatically?

#7 Updated by Jim Pingle about 1 year ago

That's a rather ugly hack/kludge and it would be better to find out why it's failing and fix it directly.

#8 Updated by Luke Hamburg about 1 year ago

Agree it's less than ideal. But until a proper fix can be found a kludge might be safer than nothing, since aliases not updating could leave large security holes open.

I'm not sure where the source code for filterdns.c even is. It looks like it used to be in pfsense-tools but that's been archived and made private. Where to even look?

#9 Updated by Jim Pingle about 1 year ago

It is at https://github.com/pfsense/FreeBSD-ports/blob/devel/net/filterdns/files/filterdns.c but Renato is already looking at this currently so it will be fixed properly in this release.

#10 Updated by Luke Hamburg about 1 year ago

Sounds good!

#11 Updated by Renato Botelho about 1 year ago

Jim Pingle wrote:

I had this happen on my edge firewall which is running a snapshot from earlier this week. The pf tables with entries managed by filterdns were empty when it failed, even entries that did not require DNS resolution. There were no log messages from filterdns until it was killed and restarted, at which point it was functioning as expected.

The filterdns process was idle and not doing anything. Running truss -fp on its PID showed no activity.

When it was stuck:
[...]
After killing it manually and restarting:
[...]

Checking it randomly after, the main process was always in the "sleep" state (Ss), I didn't see the main process in the idle state (Is) again, but if you show all threads (ps uxHaww | grep filterdns) the individual threads are in the idle state.

The I state indicates it's sleeping for over 20 seconds and per-se is not the problem because filterdns threads sleep for 1 minute so it will stay as S in the first 20 seconds and then move to I

#12 Updated by Renato Botelho about 1 year ago

  • Status changed from This Sprint to In Progress

#13 Updated by Steve Beaver about 1 year ago

  • Assignee deleted (Renato Botelho)
  • Target version changed from 2.4.4 to 48

Currently unable to reproduce

#14 Updated by Luke Hamburg about 1 year ago

I have definitely hit this one - yes it is hard to reproduce. But, if I hit it again, is it worth sending any sort of truss output or anything else to help track it down?

#15 Updated by Renato Botelho about 1 year ago

Luke Hamburg wrote:

I have definitely hit this one - yes it is hard to reproduce. But, if I hit it again, is it worth sending any sort of truss output or anything else to help track it down?

Can you patch your local system the places that start filterdns and replace '-d 1' by '-d 10' to increase debug level? then if it fails it would be good to check the logs from /var/log/resolver.log

#16 Updated by Luke Hamburg about 1 year ago

Ok, I've done that. For anyone else who wants an easy way, I made a patch that you can add via System Patches:
https://github.com/luckman212/pfsense/commit/72834bf677bdbd1cf78f6772b79abe4b3eaa8235

Don't forget to kill & restart filterdns after applying, or simply reboot.
To follow the resolver logs in realtime, you can use:

clog -f /var/log/resolver.log

Also, I don't see it linked here so I want to add, this seems very related to https://redmine.pfsense.org/issues/8001 which was reported on the forums here: https://forum.netgate.com/topic/124467/filterdns-stops-working

Might literally be the same issue.

#17 Updated by khaled osama 12 months ago

Dear All

i am affected with same problem
it happens every day approx.
i must kill filterdns service and restart to make worked again
i increased the debug level for filterdns
and i attached the resolver log file
best regards,

i linked the same post from netgate forum
https://forum.netgate.com/topic/124467/filterdns-stops-working/36

#18 Updated by Luiz Souza 12 months ago

  • Assignee set to Luiz Souza

#19 Updated by Luiz Souza 12 months ago

  • Status changed from In Progress to Feedback

This issue was one of fixes included in the new filterdns (version 2.0).

If you still have issues, please let us know.

Thanks!

#20 Updated by Luiz Souza 12 months ago

  • % Done changed from 0 to 100

#21 Updated by Renato Botelho 12 months ago

  • Target version changed from 48 to 2.4.4-p1

#22 Updated by Luiz Souza 11 months ago

  • Status changed from Feedback to Closed

#23 Updated by Rudolf Mayerhofer 9 months ago

Just ran into this on 2.4.4-p2 with a not updating alias table:

[2.4.4-RELEASE][root@fw2]/root: ps aux | grep filterdns
root 96457 0.0 0.0 49776 9336 - Is 10Jan19 0:10.67 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
root 2975 0.0 0.0 6564 2468 0 S+ 12:54 0:00.00 grep filterdns

After killall -9 filterdns and applying changes to an alias filterdns started working properly again:

[2.4.4-RELEASE][root@fw2]/root: ps aux | grep filterdns
root 36300 67.0 0.0 47984 8016 - Ss 12:54 0:00.01 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
root 39464 0.0 0.0 6564 2468 0 S+ 12:54 0:00.00 grep filterdns

Unfortunately, at this point in time I do not have any more debug output available. I will continue to monitor this and add more information as it will be available to me.

#24 Updated by Robert Gijsen 8 months ago

2.4.4-RELEASE-p2, I've had this multiple times. At the moment I can even sort of reproduce it.
When adding hosts to an alias my AD DNS server logs:

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Rcv <pfsense IP> a463 Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Rcv <pfsense IP> 519a Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Snd <pfsense IP> a463 R Q [8081 DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Snd <pfsense IP> 519a R Q [8085 A DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

This is an external host, i.e. a DNS that needs to be externaly resolved by our DNS servers. That seems to work fine result gets send back to pfSense. However the host does NOT end up in the table for that alias. When I add another DNS, same domain, so hosted at the same DNS on internet, that works fine. I tried others like www.tweakers.net, www.nos.nl or bbc.co.uk I have the same success loggings in my DNS debug log, and they DO end up in the alias table as well. At first I though the issue was with hosts that are already in a table somewhere, but that doesn't seem to be the case. Most internal names I tried now don't end up in that table either.

pfSense Resolver log:
Feb 18 12:47:14 filterdns Adding host <Host that gets added to the alias> (I just added that one in the alias)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <Host that gets added to the alias>
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <host that does NOT end up in table> (I just added that one in the alias as well)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: www.ict-net.nl

The host that does NOT end up in table here, is by the way successfully added to some other aliasses, where it works just as expected. But for this alias I am missing the 'Adding host' in the pfSense log.

I tried creating a new alias, with the same three hosts as in the alias I used above. Here NONE of them end up in the table, after waiting for about 20 minutes, while in the alias used above two out of three (and the same two every time, no matter what order I put them in) work. Then I added www.tweakers.net as another try, and that one gets in there immediately.
I again killed filterdns, restarted it and poof - the tables immediately got filled as they should. So it seems filterdns is partially functional - some hosts get added, some aren't.

Tell me what loggings you need. As it seems I can now reproduce this at will (also on my second carp / HA node by the way) I can probably give all needed logs.

#25 Updated by Jim Pingle 8 months ago

This is a closed/resolved issue. If you have problems with filterdns, they are likely already covered by #9296 -- add your notes there.

Also available in: Atom PDF