Bug #5413

open

Incorrect Handling of Unbound Resolver [service restarts, cache loss, DNS service interruption]

Added by ky41083 - over 8 years ago. Updated 2 months ago.

Status: Confirmed
Priority: High
Category: DNS Resolver
Target version:
Start date: 11/10/2015
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: All
Affected Architecture:

Description

The right way to handle local DNS changes, for Unbound at least, would basically be to do the opposite of what is being done now. Rather than write to the config files and bounce the service, you would use unbound-control to tell Unbound about the local DNS changes.

Discussion here: https://forum.pfsense.org/index.php?topic=89589.0
Full rough draft solution here: https://forum.pfsense.org/index.php?topic=89589.msg568043#msg568043

Quick and dirty rough draft summary... doubt code syntax is even completely right (if I had more time it would be, leave it up to who codes it), but this method is the only right one. The only other solution would be to remove Unbound completely and replace it with something else (please don't, it works very well when used correctly).

Functions like this:
$unbound_entries .= "local-data: \"{$host['fqdn']} {$type} {$host['ipaddr']}\"\n";

Should be changed to something like this (one command per line, so that appended commands do not run together):
$unbound_cmd .= "unbound-control local_data {$host['fqdn']} {$type} {$host['ipaddr']}\n";
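
For illustration only, here is a minimal sketch of the same idea in Python (pfSense itself is PHP, so this is not drop-in code). The dictionary keys mirror the $host array above, and the config path is the one passed to unbound-control later in this issue. The helper names record_type and local_data_command are hypothetical.

```python
import ipaddress

# Config path as used with unbound-control elsewhere in this issue; adjust if yours differs.
UNBOUND_CONF = "/var/unbound/unbound.conf"


def record_type(ipaddr):
    """Return "A" for an IPv4 address and "AAAA" for an IPv6 address."""
    return "A" if ipaddress.ip_address(ipaddr).version == 4 else "AAAA"


def local_data_command(host):
    """Build the argv list for `unbound-control local_data <name> <type> <addr>`."""
    return [
        "unbound-control", "-c", UNBOUND_CONF,
        "local_data", host["fqdn"], record_type(host["ipaddr"]), host["ipaddr"],
    ]


# Hypothetical example hosts.
hosts = [
    {"fqdn": "nas.example.lan", "ipaddr": "192.168.12.10"},
    {"fqdn": "nas.example.lan", "ipaddr": "fd00::10"},
]
commands = [local_data_command(h) for h in hosts]
for argv in commands:
    # On a live system this would be: subprocess.run(argv, check=True)
    print(" ".join(argv))
```

Building an argv list per record, rather than one long shell string, avoids both the run-together problem and shell quoting issues with host names.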

And NEVER EVER bounce the Unbound service. Ever. It is completely unnecessary.

Initial service start / user initiated service restart should probably use the same unbound-control calls for managing all local DNS entries, to prevent both modifying Unbound config files and calling unbound-control to do the same exact thing. Plus it's cleaner, now we don't have 2 code paths to maintain (config files & unbound-control), and we don't use more RAM to store unneeded config file entries.

Additional implementation considerations can be found in the cited post above.


Files

resolver.log (500 KB) Dmitriy K, 05/30/2017 03:30 PM

Related issues

Related to Bug #11553: Unbound does not restart properly sometimes when DHCP Registration is enabled (Duplicate, 02/26/2021)
Related to Bug #10624: Memory leak in Unbound with Python module and DHCP lease registration active (Resolved, Christian McDonald)
Related to Regression #11316: Unbound crashes with signal 11 when reloading (Resolved, Christian McDonald)

Actions #1

Updated by Chris Buechler over 8 years ago

  • Category set to DNS Resolver
  • Status changed from New to Confirmed
  • Priority changed from Normal to High
Actions #2

Updated by ky41083 - over 8 years ago

Can someone please add "DNS service interruption" to the portion of the title in brackets? It is also a main symptom, but I spaced on it when I created the issue. TY

Actions #3

Updated by David Wood over 8 years ago

I know a bug report is not really the place for arguing about the merits of a solution, but I respectfully maintain some of my caution in https://forum.pfsense.org/index.php?topic=89589.msg568394#msg568394 , especially in relation to ky41083's assertion that there is no need ever to restart Unbound.

Changes in local-data can be handled via unbound-control as ky41083 says - though the inability to remove just the A or AAAA record will likely require some care, as both can exist for the same local host and there is no guaranteed temporal relationship between changes in A and AAAA. In particular, DHCP and DHCPv6 are entirely separate and not synchronous.
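
A rough sketch of the care this requires, assuming the caller already knows which records are registered for the name: because local_data_remove drops every record at a name, removing only the A record means removing the name and then re-adding whatever should survive. The function name remove_one_type and the example host are hypothetical.

```python
def remove_one_type(name, current, drop_type):
    """Plan unbound-control calls that drop only `drop_type` records for `name`.

    `current` is a list of (rtype, rdata) tuples presently registered for `name`.
    local_data_remove removes all records at the name, so the survivors
    have to be re-added afterwards.
    """
    keep = [(rtype, rdata) for (rtype, rdata) in current if rtype != drop_type]
    plan = [["unbound-control", "local_data_remove", name]]
    plan += [["unbound-control", "local_data", name, rtype, rdata] for (rtype, rdata) in keep]
    return plan


# Hypothetical host with both an IPv4 and an IPv6 lease; only the IPv4 lease expired.
plan = remove_one_type(
    "laptop.example.lan",
    [("A", "192.168.12.50"), ("AAAA", "fd00::50")],
    "A",
)
for argv in plan:
    print(" ".join(argv))
```

Note that there is a short window between the remove and the re-add in which the AAAA record is also absent.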

Unlike ky41083, I cannot see any alternative to restarting Unbound if there are configuration changes made to Unbound beyond changes to local data, as SIGHUP and unbound-control reload unfortunately amount to a reload at present (i.e. cache flush and re-read of the configuration files). unbound-control does not allow for on-the-fly reconfiguration of all aspects of Unbound. This is why I suggested a diff based approach on the forums as one possibility.

These are, of course, implementation points. I agree that Unbound should be reconfigured on-the-fly whenever possible. In time, I hope that Unbound will get saner SIGHUP handling, but this will likely be a lot of work.

Unfortunately, I have no time to work on this issue at present.

Actions #4

Updated by Jim Thompson about 8 years ago

  • Subject changed from Incorrect Handling of Unbound Resolver [service restarts, cache loss] to Incorrect Handling of Unbound Resolver [service restarts, cache loss, DNS service interruption]
Actions #5

Updated by Jim Thompson about 8 years ago

  • Assignee set to Renato Botelho
Actions #6

Updated by robi robi about 8 years ago

I had to go back to DNS Forwarder (dnsmasq) because of this.
In my case, Unbound was restarting itself every 2 seconds, and clients were complaining about having no internet access.

Please provide a quick fix: which lines should be modified on a running system to restore Unbound functionality?
Thank God dnsmasq is still in there and only a checkbox away.

Actions #7

Updated by Chris Buechler about 8 years ago

robi: your issue almost certainly has a different root cause; this issue just exacerbates it. Post a new thread on the forum describing what you're seeing, with your system logs from the time.

Actions #8

Updated by ky41083 - almost 8 years ago

No, arguing about the merits of a solution is not what Redmine bug reports are for, so I won't be, as it just makes more work for the people actually trying to fix the bug. I wouldn't normally even post the following here, but someone clearly felt the need to re-post issues here that are already posted and addressed in the forum.

Your concerns addressed, David:

https://forum.pfsense.org/index.php?topic=89589.msg568708#msg568708
https://forum.pfsense.org/index.php?topic=89589.msg607906#msg607906

I completely welcome any further discussion, in the forum. That place where people who "unfortunately, have no time to work on this issue at present" can constructively hash out finer implementation details.

Actions #9

Updated by Anonymous over 7 years ago

Is there any update on when this might get worked on? It has been almost a year now.

Actions #10

Updated by ky41083 - over 7 years ago

If the devs won't / can't answer you, I will. Due to changes in 2.3 (I tested with 2.3.2p1), restarting the Unbound service to pick up any live DNS changes has been rendered completely and wholly UNNECESSARY. My guess is nobody has figured this out yet because all the kill / restart Unbound code is still present and used.

All you have to do now (this didn't fully work on 2.2.6, which required a reboot to pick up some live DNS changes, like static DHCP) is disable all the code points that restart Unbound, and poof, everything just works, without so much as poking Unbound itself. Yes, I have a patch. Yes, I will post it soon. I just updated from 2.2.6 a few days ago and am still getting things sorted...

Actions #11

Updated by BBcan177 . over 7 years ago

Some users have also reported issues with the Unbound Resolver and pfBlockerNG DNSBL. I am not able to reproduce, but some users are reporting that Unbound Fails to reload after IP Interface changes when DNSBL is enabled. I assume this is due to the fact that DNSBL adds an include file which might take a little longer to load:

server:include: /var/unbound/pfb_dnsbl.conf

The pfBlockerNG DNSBL Reload code here as reference:
https://github.com/pfsense/FreeBSD-ports/blob/devel/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc#L1402

Actions #12

Updated by ky41083 - over 7 years ago

Actions #13

Updated by ky41083 - over 7 years ago

BBcan177 . wrote:

Some users have also reported issues with the Unbound Resolver and pfBlockerNG DNSBL. I am not able to reproduce, but some users are reporting that Unbound Fails to reload after IP Interface changes when DNSBL is enabled. I assume this is due to the fact that DNSBL adds an include file which might take a little longer to load:

server:include: /var/unbound/pfb_dnsbl.conf

The pfBlockerNG DNSBL Reload code here as reference:
https://github.com/pfsense/FreeBSD-ports/blob/devel/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc#L1402

I would guess that the Unbound full cache dump / load process adds FAR more delay than the additional include file ever could. Just using the GUI options for Unbound, you can easily end up with a multi-gigabyte DNS cache. The pfSenseless devs have discussed this as a way to keep the Unbound cache when reloading it, and even they know better than to do this.

pfBlockerNG DNSBL should absolutely not be doing it either. You should open a bug citing the poor cache handling choice.

Actions #14

Updated by Anonymous over 7 years ago

The patch you posted only prevents Unbound from being restarted by performing GUI actions, not automatically when a new DHCP lease is granted. This is because Unbound is restarted directly by "dhcpleases" in that case. If the "dhcpleases" configuration is modified such that it does not restart Unbound, Unbound never picks up any changes to its configuration.

Actions #15

Updated by ky41083 - over 7 years ago

With the patch above applied, and "Register DHCP leases in the DNS Resolver" enabled, the Unbound service does not restart. Ever. Verified by following:

Via SSH:

unbound-control -c /var/unbound/unbound.conf stats_noreset

--- snip ---
time.now=1480555628.497528
time.up=683456.981686
time.elapsed=683456.981686
--- snip ---

Dashboard displays:
Uptime 7 Days 21 Hours 54 Minutes 12 Seconds

All DHCP leases are fully DNS resolvable via Unbound, regardless of whether they were issued 7 seconds or 7 days ago.

Any questions?
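
The same check can be scripted. The following is a small sketch, not part of any pfSense tooling, that parses the key=value lines printed by stats_noreset and formats time.up in the style of the dashboard; parse_stats and format_uptime are hypothetical names, and the sample text is the snippet quoted above.

```python
def parse_stats(text):
    """Parse `key=value` lines (as printed by stats_noreset) into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '=' such as "--- snip ---"
            stats[key.strip()] = value.strip()
    return stats


def format_uptime(seconds):
    """Format a number of seconds as days, hours, minutes and seconds."""
    total = int(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} Days {hours} Hours {minutes} Minutes {secs} Seconds"


sample = """--- snip ---
time.now=1480555628.497528
time.up=683456.981686
time.elapsed=683456.981686
--- snip ---"""

uptime = format_uptime(float(parse_stats(sample)["time.up"]))
print(uptime)  # 7 Days 21 Hours 50 Minutes 56 Seconds
```

The result is a few minutes less than the dashboard value quoted above, which is consistent with the dashboard having been read shortly after the command was run.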

Actions #16

Updated by ky41083 - over 7 years ago

Michael Marley wrote:

Unbound is restarted directly by "dhcpleases"

Please post a Github link to the file + line you are referring to.

Everything I've looked at, and all the behavior I am seeing, says that anything starting / stopping Unbound works by calling functions in either services.inc or unbound.inc.

Actions #17

Updated by Anonymous almost 7 years ago

Hi all

I'm facing the same issue on our pfSense boxes.

We're using Unbound and have configured the DHCP server to update Unbound.
Each time a new device (workstation, laptop, smartphone or tablet) requests an IP, Unbound is restarted.
All our systems use the pfSense Unbound resolver, and quite frequently Continuous Integration jobs fail when they try to resolve a name.

What about including ky41083's patch?

Actions #18

Updated by Dmitriy K almost 7 years ago

Sadly, I've faced the same problem with Unbound. This issue forced me to use RAM disks. I hope there will be a fix in near future.

Actions #19

Updated by Jason NA over 4 years ago

Is there any update here? Apparently there has been a fix available for over 2 years?

Actions #20

Updated by Nick B about 4 years ago

This is a big problem when using pfBlockerNG and also registering DHCP leases in the resolver, as it causes Unbound to reload frequently. Because of the large pfBlockerNG database, Unbound can take ~20 seconds to reload, causing disruptions for all clients. Can the patch here get included?

Actions #21

Updated by Jim Pingle about 4 years ago

There is no patch here to apply. There are some general theories and wishes, but no code. If someone wants to take it on, we'd be more than happy to review a pull request to make the recommended changes.

Actions #22

Updated by Alexander Berkes about 4 years ago

Hi all,

I have been looking at this issue for the last few days, because I am affected myself and would like this annoying bug to be fixed.

One of the biggest problems is dhcpleases sending a SIGHUP to the unbound process instead of using unbound-control to update the dhcp leases state.
This is especially noticed when unbound is used in conjunction with pfBlockerNG and huge dns-blocklists are in use.

Call me a Google noob, but somehow I couldn't find any source code for dhcpleases other than:
https://github.com/unexpectedBy/pfsense-tools/blob/master/pfPorts/dhcpleases/files/dhcpleases.c

Sadly, this is not the current version of dhcpleases.

Anyhow, I took the above mentioned code of dhcpleases as a basis and wrote a diff based solution with unbound-control.
First tests show me that it is basically doing what it should.

Could anybody point me in the right direction if there is a repository containing the latest version of dhcpleases?

Regards

Actions #23

Updated by Renato Botelho about 4 years ago

Alexander Berkes wrote:

Hi all,

I have been looking at this issue for the last few days, because I am affected myself and would like this annoying bug to be fixed.

One of the biggest problems is dhcpleases sending a SIGHUP to the unbound process instead of using unbound-control to update the dhcp leases state.
This is especially noticed when unbound is used in conjunction with pfBlockerNG and huge dns-blocklists are in use.

Call me a Google noob, but somehow I couldn't find any source code for dhcpleases other than:
https://github.com/unexpectedBy/pfsense-tools/blob/master/pfPorts/dhcpleases/files/dhcpleases.c

Sadly, this is not the current version of dhcpleases.

Anyhow, I took the above mentioned code of dhcpleases as a basis and wrote a diff based solution with unbound-control.
First tests show me that it is basically doing what it should.

Could anybody point me in the right direction if there is a repository containing the latest version of dhcpleases?

Regards

https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c

Actions #24

Updated by Alexander Berkes about 4 years ago

Renato Botelho wrote:

https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c

Thanks for the link.

I made a fork and added my changes:
https://github.com/n3bul4/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c

First of all, this is just a proof of concept. I only made a few tests so far, which look good to me. Any help is appreciated.

The way this works is by making a diff (with the standard diff command) between the current dhcp.leases file and the corresponding unbound view (called dhcpleases).
Obsolete DNS entries get removed and new ones are added (via unbound-control).
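
To make the idea concrete, here is a set-based sketch in Python. It is not the C implementation from the fork (which shells out to the standard diff command); it only illustrates the same remove-obsolete / add-new logic. It assumes removal happens per name, so any record that should survive under a removed name is scheduled for re-adding. The function name diff_records and all sample records are hypothetical.

```python
def diff_records(current, desired):
    """Compare two collections of (name, rtype, rdata) records.

    Returns (names_to_remove, records_to_add). Removal is per name,
    so every desired record under a removed name must be added again.
    """
    current, desired = set(current), set(desired)
    stale_names = {name for (name, _rtype, _rdata) in current - desired}
    to_add = (desired - current) | {rec for rec in desired if rec[0] in stale_names}
    return sorted(stale_names), sorted(to_add)


# Records currently loaded in the view (hypothetical).
in_view = [
    ("old.example.lan", "A", "192.168.12.20"),
    ("nas.example.lan", "A", "192.168.12.10"),
    ("laptop.example.lan", "A", "192.168.12.50"),
    ("laptop.example.lan", "AAAA", "fd00::50"),
]
# Records derived from the current leases file (hypothetical).
from_leases = [
    ("nas.example.lan", "A", "192.168.12.10"),
    ("laptop.example.lan", "AAAA", "fd00::50"),
    ("phone.example.lan", "A", "192.168.13.7"),
]

names_to_remove, records_to_add = diff_records(in_view, from_leases)
print(names_to_remove)
print(records_to_add)
```

Unchanged entries (nas.example.lan here) are never touched, which is the point of the diff approach: Unbound keeps running and keeps its cache.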

For this to work, a few changes to unbound.conf are needed:

1.) unbound needs to be configured with a view named "dhcpleases"

view:
    name: dhcpleases

2.) every interface, where clients should "see" the hostnames needs to have an access-control-view directive:

server:
    access-control-view: 192.168.12.0/24 dhcpleases
    access-control-view: 192.168.13.0/24 dhcpleases
    access-control-view: 192.168.14.0/24 dhcpleases
    .
    .
    .

The above would map all clients in the networks 192.168.12.0/24, 192.168.13.0/24 and 192.168.14.0/24 (netmask 255.255.255.0) to the dhcpleases view.

3.) The include directive for dhcpleases_entries.conf in unbound.conf has to be removed / commented out

# dhcp lease entries
#include: /var/unbound/dhcpleases_entries.conf
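
For setups with many interfaces, the fragments from points 1 and 2 could be generated rather than typed by hand. A minimal sketch, assuming the networks are known as CIDR strings; view_config is a hypothetical helper and the output is plain unbound.conf text in the layout shown above.

```python
import ipaddress


def view_config(view_name, networks):
    """Render the `view:` clause and the matching access-control-view lines."""
    lines = ["view:", f"    name: {view_name}", "", "server:"]
    for net in networks:
        # ip_network() raises ValueError on malformed input or set host bits.
        cidr = ipaddress.ip_network(net)
        lines.append(f"    access-control-view: {cidr} {view_name}")
    return "\n".join(lines) + "\n"


config_text = view_config(
    "dhcpleases",
    ["192.168.12.0/24", "192.168.13.0/24", "192.168.14.0/24"],
)
print(config_text)
```

Validating each network with the ipaddress module catches typos before they reach the resolver configuration.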

As a quick test, points 1 and 2 can be added to the "Custom options" section of the DNS Resolver configuration in the pfSense web GUI.
For point 3, one would have to edit the function unbound_generate_config_text in /etc/inc/unbound.inc:
search for "# dhcp lease entries" and comment out the line below it, as shown in point 3.
Compile dhcpleases, copy it to your pfsense box.
Make a backup of /usr/local/sbin/dhcpleases.
Kill running dhcpleases process.
Move compiled dhcpleases version to /usr/local/sbin/dhcpleases.
Restart unbound.

Final notes:

The code creates 2 files in /tmp:
/tmp/dhcpleases.sort
/tmp/dhcpleases.diff

New DNS entries are added with a bulk operation (view_local_datas), whereas delete operations are done one by one.
This is due to lack of a bulk remove operation from views in unbound-control, which would be a nice feature request.
So keep in mind, that DNS delete operations will take longer than adding DNS entries.
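
The asymmetry can be sketched as follows. This is an illustration in Python, not the code from the fork: all additions go to a single view_local_datas call that reads one record per line from stdin, while every removal needs its own view_local_data_remove call. The function plan_calls and the sample data are hypothetical, and the -c config path option is omitted for brevity.

```python
def plan_calls(view, names_to_remove, records_to_add):
    """Return a list of (argv, stdin_text) pairs for unbound-control.

    Removals: one call per name. Additions: a single bulk call fed via stdin.
    """
    calls = []
    for name in names_to_remove:
        calls.append((["unbound-control", "view_local_data_remove", view, name], None))
    if records_to_add:
        payload = "".join(f"{rr}\n" for rr in records_to_add)
        calls.append((["unbound-control", "view_local_datas", view], payload))
    return calls


calls = plan_calls(
    "dhcpleases",
    ["old.example.lan", "gone.example.lan"],
    ["phone.example.lan A 192.168.13.7", "tv.example.lan A 192.168.14.9"],
)
for argv, stdin_text in calls:
    # On a live system: subprocess.run(argv, input=stdin_text, text=True, check=True)
    print(" ".join(argv), "(bulk via stdin)" if stdin_text else "")
```

With two removals and two additions this produces three calls: the number of processes grows with the removals, but stays at one for the additions.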

dhcpleases logs information about added / removed entries to system.log

Any thoughts and help are welcome!

Actions #25

Updated by Renato Botelho about 4 years ago

Alexander Berkes wrote:

Renato Botelho wrote:

https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c

Thanks for the link.

I made a fork and added my changes:
https://github.com/n3bul4/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c

First of all, this is just a proof of concept. I only made a few tests so far, which look good to me. Any help is appreciated.

The way this works is by making a diff (with the standard diff command) between the current dhcp.leases file and the corresponding unbound view (called dhcpleases).
Obsolete DNS entries get removed and new ones are added (via unbound-control).

For this to work, a few changes to unbound.conf are needed:

1.) unbound needs to be configured with a view named "dhcpleases"

[...]

2.) every interface, where clients should "see" the hostnames needs to have an access-control-view directive:

[...]

The above would map all clients in the networks 192.168.12.0/24, 192.168.13.0/24 and 192.168.14.0/24 (netmask 255.255.255.0) to the dhcpleases view.

3.) The include directive for dhcpleases_entries.conf in unbound.conf has to be removed / commented out
[...]

As a quick test, points 1 and 2 can be added to the "Custom options" section of the DNS Resolver configuration in the pfSense web GUI.
For point 3, one would have to edit the function unbound_generate_config_text in /etc/inc/unbound.inc:
search for "# dhcp lease entries" and comment out the line below it, as shown in point 3.
Compile dhcpleases, copy it to your pfsense box.
Make a backup of /usr/local/sbin/dhcpleases.
Kill running dhcpleases process.
Move compiled dhcpleases version to /usr/local/sbin/dhcpleases.
Restart unbound.

Final notes:

The code creates 2 files in /tmp:
/tmp/dhcpleases.sort
/tmp/dhcpleases.diff

New DNS entries are added with a bulk operation (view_local_datas), whereas delete operations are done one by one.
This is due to lack of a bulk remove operation from views in unbound-control, which would be a nice feature request.
So keep in mind, that DNS delete operations will take longer than adding DNS entries.

dhcpleases logs information about added / removed entries to system.log

Any thoughts and help are welcome!

Please submit the proposed changes as a Pull Request against the FreeBSD-ports repository - https://docs.netgate.com/pfsense/en/latest/development/submitting-a-pull-request-via-github.html

This way developers can review what you have changed.

Actions #27

Updated by Brittney Lars almost 4 years ago

How are other people dealing with this issue or working around it? For me it's causing such frequent internet outages (dns outages are internet outages for clients) on my network that I am starting to consider looking for an alternative.

Actions #28

Updated by Nick B almost 4 years ago

Brittney Lars wrote:

How are other people dealing with this issue or working around it? For me it's causing such frequent internet outages (dns outages are internet outages for clients) on my network that I am starting to consider looking for an alternative.

I have disabled registering DHCP leases in the resolver. Even though I rely on that feature, I'd rather lose local hostname resolution than have frequent DNS interruptions.

Actions #29

Updated by Brittney Lars almost 4 years ago

Thanks @Nick B this workaround works well.

Actions #30

Updated by Raffi T over 3 years ago

Thanks for all the work on this. It seems like there is progress made on that git page, but the status of this bug is still 0%. Is that correct? What's the status of the code review?

Thanks,
Raffi

Actions #31

Updated by Jim Pingle about 3 years ago

  • Related to Bug #11553: Unbound does not restart properly sometimes when DHCP Registration is enabled added
Actions #32

Updated by Jim Pingle about 3 years ago

  • Related to Bug #10624: Memory leak in Unbound with Python module and DHCP lease registration active added
Actions #33

Updated by Jim Pingle about 3 years ago

Actions #34

Updated by Jim Pingle about 3 years ago

  • Target version set to CE-Next
Actions #35

Updated by Jesse Adelman over 2 years ago

Howdy. Netgate customer here. Hoping that this 'high priority' 6 year old bug gets some love from Netgate-employed devs, or the excellent volunteer devs of pfSense. While the workarounds are nice to see, I think not having to worry about an unnecessary HUP and restart of Unbound would be great, especially as I don't want to degrade other services' functionality just to implement the workarounds. Thanks. Cheers.

Actions #36

Updated by steven warner about 2 years ago

Jiggling the handle on this one again. Just tracked another user complaint down to this issue - the outage that occurs during the restart. Would really like to see this fixed.

Actions #37

Updated by Dennis Adler almost 2 years ago

Curious Netgate customer wondering if the fix posted by Alexander Berkes 2 years ago (or any other fix) is in the works? I see Jim set some change to CE-Next a year ago, but this is still a problem with Unbound in Python Mode + pfBlockerNG. Would be nice to know current status, thanks!

Actions #38

Updated by R D almost 2 years ago

Paying Netgate customer here. Am actively running into this problem (showing in the form of periodic DNS resolution errors that are causing issues) and would love to see this fixed!

Actions #39

Updated by Renato Botelho almost 2 years ago

  • Assignee deleted (Renato Botelho)
Actions #40

Updated by Dennis Adler almost 2 years ago

Hey Netgate,

What happened to this fix... I see that the 22.05 beta is out and this bug is still set to CE-NEXT and without an owner. From this bug log it looks like the proposed fix has just been sitting for 2 years or so. CAN YOU PLEASE let us know the status???? Thanks!

Actions #41

Updated by O E almost 2 years ago

Hey Netgate - I get the feeling this affects far more customers than you think.
Can this be assigned to someone to actually look towards a solution??

Actions #42

Updated by Vaidotas Butkus over 1 year ago

Any progress on this? It causes lots of other DNS Resolver issues, not just short interruptions.
On 22.01 the DNS Resolver worked just fine; after the upgrade to 22.05 it started breaking again. Not to mention the bad things that were in 2.5 CE, which required downgrading the Unbound version.

Actions #43

Updated by Per-Arne Hellarvik over 1 year ago

Netgate 3100 user here, running 22.05, upgraded from 22.01 - Same problem: DNS interruptions. Can this issue get some love from devs?

Actions #44

Updated by Dennis Adler over 1 year ago

Hello Netgate Folk,

What if you created a version with this fix that could be applied with the Patch tool? I know I'd be happy to help test it (after September 6, when my current project wraps)!!

What say you??

Actions #45

Updated by Jeff Kuehl over 1 year ago

I agree, I'll test too

Actions #46

Updated by David Reitz over 1 year ago

Jeff Kuehl wrote in #note-45:

I agree, I'll test too

Count me in as well. I'd be happy to test a patch!

Actions #47

Updated by Christian McDonald about 1 year ago

  • Assignee set to Christian McDonald

Taking this one on as I'm now quite familiar with Unbound in pfSense

Actions #48

Updated by Michael Kolassa about 1 year ago

Christian McDonald wrote in #note-47:

Taking this one on as I'm now quite familiar with Unbound in pfSense

Sorry for the ping, but given how close we potentially are to the next pfSense+ release, have you had any luck with the dhcpleases/unbound issue that exists with the "Register DHCP leases in the DNS Resolver" option checked?

I would love to re-enable this feature, but the frequent complete DNS outages, which last until I manually restart Unbound, are quite annoying. For the record, I don't run pfBlockerNG, so there are no 'large lists' that would be causing overhead.

Actions #49

Updated by Christian McDonald about 1 year ago

Unbound reloads are faster now when Python mode is enabled. I eliminated the expensive task of reloading the entire python interpreter every time Unbound is reloaded.

I am also exploring the new cache retention option introduced in 1.7.1 which we are now on.

Beyond that, I'm still actively working on this and weighing the pros and cons of the next steps. It's no secret that ISC DHCPD is dead. I'm exploring Kea as well. Kea has a much better extension model that makes integrations a bit cleaner (and can run in-process instead of requiring a sidecar daemon like dhcpleases).

Unbound... Unbound was never designed to be an authoritative resolver, but we use it as such for the local domain. A better model might be to run a separate internal DNS server that supports DDNS out of the box and just forward these lookups, thus eliminating the need for a custom integration, service reloads, etc.

It's an ongoing problem that I'm working on.

Actions #50

Updated by Michael Kolassa about 1 year ago

Christian McDonald wrote in #note-49:

Unbound reloads are faster now when Python mode is enabled. I eliminated the expensive task of reloading the entire python interpreter every time Unbound is reloaded.

I am also exploring the new cache retention option introduced in 1.7.1 which we are now on.

Beyond that, I'm still actively working on this and weighing the pros and cons of the next steps. It's no secret that ISC DHCPD is dead. I'm exploring Kea as well. Kea has a much better extension model that makes integrations a bit cleaner (and can run in-process instead of requiring a sidecar daemon like dhcpleases).

Unbound... Unbound was never designed to be an authoritative resolver, but we use it as such for the local domain. A better model might be to run a separate internal DNS server that supports DDNS out of the box and just forward these lookups, thus eliminating the need for a custom integration, service reloads, etc.

It's an ongoing problem that I'm working on.

Thank you for the update and all of your hard work so far! This ticket has been a thorn in an otherwise phenomenal product, and it's great to see someone really deep diving into it after all these years.

Actions #51

Updated by Allistah F about 1 year ago

Hi there, I just wanted to say thanks for all the time and work that is going into this fix. It's really a problem when there are so many devices on the network and unbound is being constantly restarted - it causes a lot of delays that really shouldn't happen. Does anyone have any idea when this would be fixed?

Actions #52

Updated by Mark Abram 9 months ago

I had high hopes that we might see the fix in the latest version (23.05). Do we have a road map, or at least a time frame, for when we can expect this to be resolved? As it is now, any time a DHCP client connects or reconnects to the network, the DNS Resolver is killed and restarted.

Actions #53

Updated by Christian McDonald 9 months ago

It is actively being worked on.

Actions #54

Updated by Dennis Adler 5 months ago

Hello Christian, any updates on your progress? Thanks!

Actions #55

Updated by Dennis Adler 2 months ago

A question for you, Christian. Does the DHCP change to KEA's code mean this is no longer a problem? Or are the notification methods to Unbound still going to cause crashes? If you think it has fixed this, I am willing to put the time in to test it out. Let me know!
