Bug #5413
closed
Reduce disruptions when changing DNS records from DHCP leases in Unbound
Added by ky41083 - about 9 years ago. Updated about 1 month ago.
% Done: 100%
Description
The right way to handle local DNS changes, for Unbound at least, would basically be to do the opposite of what is being done now. Rather than write to the config files and bounce the service, you would use unbound-control to tell Unbound about the local DNS changes.
Discussion here: https://forum.pfsense.org/index.php?topic=89589.0
Full rough draft solution here: https://forum.pfsense.org/index.php?topic=89589.msg568043#msg568043
Quick and dirty rough-draft summary. I doubt the code syntax is even completely right (with more time it would be; I leave that to whoever codes it), but this method is the only right one. The only other solution would be to remove Unbound completely and replace it with something else (please don't, it works very well when used correctly).
Functions like this:
$unbound_entries .= "local-data: \"{$host['fqdn']} {$type} {$host['ipaddr']}\"\n";
Should be changed to something like this:
$unbound_cmd .= "unbound-control local_data {$host['fqdn']} {$type} {$host['ipaddr']}";
And NEVER EVER bounce the Unbound service. Ever. It is completely unnecessary.
Initial service start / user initiated service restart should probably use the same unbound-control calls for managing all local DNS entries, to prevent both modifying Unbound config files and calling unbound-control to do the same exact thing. Plus it's cleaner, now we don't have 2 code paths to maintain (config files & unbound-control), and we don't use more RAM to store unneeded config file entries.
Additional implementation considerations can be found in the cited post above.
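The proposal in the description can be illustrated with a minimal sketch. It is written in Python rather than the PHP of the snippets above, and it only builds the argument list for one `unbound-control local_data` call; nothing is executed. The helper names `record_type` and `local_data_argv` are hypothetical, and the config path is an assumption taken from the `stats_noreset` example later in this thread.

```python
import ipaddress
from typing import Dict, List

# Assumed config path; it matches the unbound-control example later in this thread.
UNBOUND_CONF = "/var/unbound/unbound.conf"


def record_type(ipaddr: str) -> str:
    """Pick the record type from the address family: A for IPv4, AAAA for IPv6."""
    return "AAAA" if ipaddress.ip_address(ipaddr).version == 6 else "A"


def local_data_argv(host: Dict[str, str]) -> List[str]:
    """Build the argv for one `unbound-control local_data` call.

    `host` mirrors the $host array in the PHP snippet (keys `fqdn` and `ipaddr`).
    """
    return [
        "unbound-control", "-c", UNBOUND_CONF,
        "local_data", host["fqdn"], record_type(host["ipaddr"]), host["ipaddr"],
    ]


if __name__ == "__main__":
    print(local_data_argv({"fqdn": "laptop.example.lan", "ipaddr": "192.168.12.10"}))
```

Returning a list of arguments rather than a shell string means the result could be handed to something like `subprocess.run` without shell quoting concerns for hostnames taken from DHCP clients.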
Files
resolver.log (500 KB) | Dmitriy K, 05/30/2017 03:30 PM
Updated by Chris Buechler about 9 years ago
- Category set to DNS Resolver
- Status changed from New to Confirmed
- Priority changed from Normal to High
Updated by ky41083 - about 9 years ago
Can someone please add "DNS service interruption" to the portion of the title in brackets? It is also a main symptom, but I spaced on it when I created the issue. TY
Updated by David Wood almost 9 years ago
I know a bug report is not really the place for arguing about the merits of a solution, but I respectfully maintain some of my caution in https://forum.pfsense.org/index.php?topic=89589.msg568394#msg568394 , especially in relation to ky41083's assertion that there is no need ever to restart Unbound.
Changes in local-data can be handled via unbound-control as ky41083 says - though the inability to remove just the A or AAAA record will likely require some care, as both can exist for the same local host and there is no guaranteed temporal relationship between changes in A and AAAA. In particular, DHCP and DHCPv6 are entirely separate and not synchronous.
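The A/AAAA concern can be made concrete with a small sketch. It assumes the caller tracks which records currently exist for a name, and it relies on `unbound-control local_data_remove` dropping all records at that name. The function `drop_one_family` is hypothetical; it only returns the unbound-control sub-commands that would be needed and does not run them.

```python
from typing import List, Set, Tuple

Record = Tuple[str, str]  # (record type, address), e.g. ("A", "192.168.12.10")


def drop_one_family(name: str, current: Set[Record], rtype: str) -> List[List[str]]:
    """Return unbound-control sub-commands that remove only `rtype` for `name`.

    local_data_remove deletes every record at the name, so records of the
    other types are re-added right after the removal.
    """
    keep = sorted(r for r in current if r[0] != rtype)
    cmds: List[List[str]] = [["local_data_remove", name]]
    cmds += [["local_data", name, t, addr] for t, addr in keep]
    return cmds


if __name__ == "__main__":
    records = {("A", "192.168.12.10"), ("AAAA", "fd00::10")}
    for cmd in drop_one_family("laptop.example.lan", records, "A"):
        print(" ".join(cmd))
```

Between the removal and the re-add there is a short window in which the name has no records at all, which is part of the care this comment asks for.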
Unlike ky41083, I cannot see any alternative to restarting Unbound if there are configuration changes made to Unbound beyond changes to local data, as SIGHUP and unbound-control reload unfortunately amount to a reload at present (i.e. cache flush and re-read of the configuration files). unbound-control does not allow for on-the-fly reconfiguration of all aspects of Unbound. This is why I suggested a diff based approach on the forums as one possibility.
These are, of course, implementation points. I agree that Unbound should be reconfigured on-the-fly whenever possible. In time, I hope that Unbound will get saner SIGHUP handling, but this will likely be a lot of work.
Unfortunately, I have no time to work on this issue at present.
Updated by Jim Thompson almost 9 years ago
- Subject changed from Incorrect Handling of Unbound Resolver [service restarts, cache loss] to Incorrect Handling of Unbound Resolver [service restarts, cache loss, DNS service interruption]
Updated by robi robi over 8 years ago
I had to go back to DNS Forwarder (dnsmasq) because of this.
In my case, unbound was restarting itself every 2 seconds, and clients were complaining about no internet access.
Please provide a quick fix: which lines should be modified on a running system to restore Unbound functionality?
Thank God dnsmasq is still in there and only a checkbox away.
Updated by Chris Buechler over 8 years ago
robi: your issue is almost certainly a different root cause, this issue just exacerbates it. Post a new thread on the forum describing what you're seeing, with your system logs from the time.
Updated by ky41083 - over 8 years ago
No, arguing about the merits of a solution is not what Redmine bug reports are for, so I won't be, as it just makes more work for the people actually trying to fix the bug. I wouldn't normally even post the following here, but someone clearly felt the need to re-post issues here that are already posted and addressed in the forum.
Your concerns addressed, David:
https://forum.pfsense.org/index.php?topic=89589.msg568708#msg568708
https://forum.pfsense.org/index.php?topic=89589.msg607906#msg607906
I completely welcome any further discussion, in the forum. That place where people who "unfortunately, have no time to work on this issue at present" can constructively hash out finer implementation details.
Updated by Anonymous about 8 years ago
Is there any update on when this might get worked on? It has been almost a year now.
Updated by ky41083 - about 8 years ago
If the devs won't / can't answer you, I will. Due to changes in 2.3 (I tested with 2.3.2p1), restarting the Unbound service to pick up any live DNS changes has been rendered completely and wholly UNNECESSARY. My guess is nobody's figured this out yet because all the kill / restart Unbound code is still present and used.
All you have to do now (this didn't fully work on 2.2.6, which required a reboot to pick up some live DNS changes, like static DHCP) is disable all the code points that restart Unbound, and poof, everything just works, without so much as poking Unbound itself. Yes, I have a patch. Yes, I will post it soon. I just updated from 2.2.6 a few days ago and am still getting things sorted...
Updated by BBcan177 . about 8 years ago
Some users have also reported issues with the Unbound Resolver and pfBlockerNG DNSBL. I am not able to reproduce, but some users are reporting that Unbound Fails to reload after IP Interface changes when DNSBL is enabled. I assume this is due to the fact that DNSBL adds an include file which might take a little longer to load:
server:include: /var/unbound/pfb_dnsbl.conf
The pfBlockerNG DNSBL Reload code here as reference:
https://github.com/pfsense/FreeBSD-ports/blob/devel/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc#L1402
Updated by ky41083 - about 8 years ago
Patch Posted: https://forum.pfsense.org/index.php?topic=119467.0
Updated by ky41083 - about 8 years ago
BBcan177 . wrote:
Some users have also reported issues with the Unbound Resolver and pfBlockerNG DNSBL. I am not able to reproduce, but some users are reporting that Unbound Fails to reload after IP Interface changes when DNSBL is enabled. I assume this is due to the fact that DNSBL adds an include file which might take a little longer to load:
server:include: /var/unbound/pfb_dnsbl.conf
The pfBlockerNG DNSBL Reload code here as reference:
https://github.com/pfsense/FreeBSD-ports/blob/devel/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc#L1402
I would guess that the Unbound full cache dump / load process adds FAR more delay than the additional include file ever could. Just using the GUI options for Unbound, you can easily end up with a multi-gigabyte DNS cache. The pfSenseless devs have discussed this as a way to keep the Unbound cache when reloading it, and even they know better than to do this.
pfBlockerNG DNSBL should absolutely not be doing it either. You should open a bug citing the poor cache handling choice.
Updated by Anonymous about 8 years ago
The patch you posted only prevents Unbound from being restarted by performing GUI actions, not automatically when a new DHCP lease is granted. This is because Unbound is restarted directly by "dhcpleases" in that case. If the "dhcpleases" configuration is modified such that it does not restart Unbound, Unbound never picks up any changes to its configuration.
Updated by ky41083 - almost 8 years ago
With the patch above applied, and "Register DHCP leases in the DNS Resolver" enabled, the Unbound service does not restart. Ever. Verified by following:
Via SSH:
unbound-control -c /var/unbound/unbound.conf stats_noreset
--- snip ---
time.now=1480555628.497528
time.up=683456.981686
time.elapsed=683456.981686
--- snip ---
Dashboard displays:
Uptime 7 Days 21 Hours 54 Minutes 12 Seconds
All DHCP leases are fully DNS resolvable via Unbound, regardless of whether they were issued 7 seconds or 7 days ago.
Any questions?
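The uptime check above can be reproduced with a short sketch that parses the `key=value` lines printed by `stats_noreset` and converts `time.up` into the format the dashboard uses. The helper names `parse_stats` and `format_uptime` are hypothetical; the sample text is the snippet pasted in this comment.

```python
from typing import Dict


def parse_stats(text: str) -> Dict[str, str]:
    """Parse key=value lines from unbound-control stats output; other lines are skipped."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def format_uptime(seconds: float) -> str:
    """Render a number of seconds as days, hours, minutes and seconds."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} Days {hours} Hours {minutes} Minutes {secs} Seconds"


SAMPLE = """--- snip ---
time.now=1480555628.497528
time.up=683456.981686
time.elapsed=683456.981686
--- snip ---"""

if __name__ == "__main__":
    print(format_uptime(float(parse_stats(SAMPLE)["time.up"])))
    # prints "7 Days 21 Hours 50 Minutes 56 Seconds"
```

That is consistent with the dashboard value quoted above if the dashboard was read a few minutes after the command was run.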
Updated by ky41083 - almost 8 years ago
Michael Marley wrote:
Unbound is restarted directly by "dhcpleases"
Please post a Github link to the file + line you are referring to.
Everything I've looked at, and all the behavior I am seeing, says that anything starting / stopping Unbound works by calling functions in either services.inc or unbound.inc.
Updated by Anonymous over 7 years ago
Hi all
I'm facing the same issue on our pfSense boxes.
We're using Unbound and have configured the DHCP server to update Unbound.
Each time a new device (workstation, laptop, smartphone or tablet) requests an IP, Unbound is restarted.
All our systems use pfSense Unbound, and quite frequently Continuous Integration jobs fail when they try to resolve a name.
What about including ky41083's patch?
Updated by Dmitriy K over 7 years ago
- File resolver.log added
Sadly, I've faced the same problem with Unbound. This issue forced me to use RAM disks. I hope there will be a fix in near future.
Updated by Jason NA almost 5 years ago
Is there any update here? Apparently there has been a fix available for over 2 years?
Updated by Nick B almost 5 years ago
This is a big problem when using pfBlockerNG while also registering DHCP leases in the resolver, as it causes Unbound to reload frequently. Because of the large pfBlockerNG database, Unbound can take ~20 seconds to reload, causing disruptions for all clients. Can the patch here get included?
Updated by Jim Pingle almost 5 years ago
There is no patch here to apply. There are some general theories and wishes, but no code. If someone wants to take it on, we'd be more than happy to review a pull request to make the recommended changes.
Updated by Alexander Berkes almost 5 years ago
Hi all,
I have been looking at this issue for the last few days, because I am affected myself and would like this annoying bug to be fixed.
One of the biggest problems is dhcpleases sending a SIGHUP to the unbound process instead of using unbound-control to update the DHCP lease state.
This is especially noticeable when unbound is used in conjunction with pfBlockerNG and huge DNS blocklists are in use.
Call me a google noob, but somehow I couldn't find anything else regarding source-code of dhcpleases than:
https://github.com/unexpectedBy/pfsense-tools/blob/master/pfPorts/dhcpleases/files/dhcpleases.c
Sadly, this is not the current version of dhcpleases.
Anyhow, I took the above mentioned code of dhcpleases as a basis and wrote a diff based solution with unbound-control.
First tests show me that it is basically doing what it should.
Could anybody point me in the right direction if there is any repository containing the latest version of dhcpleases?
Regards
Updated by Renato Botelho almost 5 years ago
Alexander Berkes wrote:
Hi all,
I have been looking at this issue for the last few days, because I am affected myself and would like this annoying bug to be fixed.
One of the biggest problems is dhcpleases sending a SIGHUP to the unbound process instead of using unbound-control to update the DHCP lease state.
This is especially noticeable when unbound is used in conjunction with pfBlockerNG and huge DNS blocklists are in use.
Call me a google noob, but somehow I couldn't find anything else regarding source-code of dhcpleases than:
https://github.com/unexpectedBy/pfsense-tools/blob/master/pfPorts/dhcpleases/files/dhcpleases.c
Sadly, this is not the current version of dhcpleases.
Anyhow, I took the above mentioned code of dhcpleases as a basis and wrote a diff based solution with unbound-control.
First tests show me that it is basically doing what it should.
Could anybody point me in the right direction if there is any repository containing the latest version of dhcpleases?
Regards
https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c
Updated by Alexander Berkes almost 5 years ago
Renato Botelho wrote:
https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c
Thanks for the link.
I made a fork and added my changes:
https://github.com/n3bul4/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c
First of all, this is just a proof of concept. I only made a few tests so far, which look good to me. Any help is appreciated.
The way this works is by making a diff (with the standard diff command) between the current dhcp.leases file and the corresponding unbound view (called dhcpleases).
Obsolete DNS entries get removed and new ones are added (via unbound-control).
For this to work, a few changes to unbound.conf are needed:
1.) unbound needs to be configured with a view named "dhcpleases"
view:
name: dhcpleases
2.) every interface, where clients should "see" the hostnames needs to have an access-control-view directive:
server:
access-control-view: 192.168.12.0/24 dhcpleases
access-control-view: 192.168.13.0/24 dhcpleases
access-control-view: 192.168.14.0/24 dhcpleases
.
.
.
The above would add access for all clients in the networks 192.168.12.0/24 through 192.168.14.0/24 (netmask 255.255.255.0).
3.) The include directive for dhcpleases_entries.conf in unbound.conf has to be removed / commented out
# dhcp lease entries
#include: /var/unbound/dhcpleases_entries.conf
For a quick test, points 1 and 2 can be added to the "Custom options" section in the pfSense DNS Resolver configuration web GUI.
For point 3 to work, one would have to edit the function unbound_generate_config_text in /etc/inc/unbound.inc.
Simply search for "# dhcp lease entries" and comment out the line below as shown in point 3.
Compile dhcpleases, copy it to your pfsense box.
Make a backup of /usr/local/sbin/dhcpleases.
Kill running dhcpleases process.
Move compiled dhcpleases version to /usr/local/sbin/dhcpleases.
Restart unbound.
Final notes:
The code creates 2 files in /tmp:
/tmp/dhcpleases.sort
/tmp/dhcpleases.diff
New DNS entries are added with a bulk operation (view_local_datas), whereas delete operations are done one by one.
This is due to lack of a bulk remove operation from views in unbound-control, which would be a nice feature request.
So keep in mind, that DNS delete operations will take longer than adding DNS entries.
dhcpleases logs information about added / removed entries to system.log
Any thoughts and help are welcome!
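The diff-based idea described in this comment can be sketched as follows. This is not the C code from the fork: it is a hypothetical Python illustration that uses set differences in place of the external diff command, and it only plans the unbound-control operations without executing them. The record string format and the function name `plan_updates` are assumptions for the example. Because removing a name from a view drops every record at that name, any wanted record under a removed name is re-added in the bulk step.

```python
from typing import List, Set, Tuple

VIEW = "dhcpleases"  # view name taken from the comment above


def plan_updates(leases: Set[str], in_view: Set[str]) -> Tuple[List[List[str]], str]:
    """Plan view updates from two sets of RR strings such as 'a.lan. IN A 192.168.12.10'.

    Returns (remove_cmds, bulk_input): removals are issued one name at a time,
    while additions are collected into one block for a bulk view_local_datas call.
    """
    obsolete = in_view - leases
    remove_names = {rr.split()[0] for rr in obsolete}
    # Re-add wanted records under removed names, plus records that are simply new.
    to_add = sorted(
        rr for rr in leases
        if rr not in in_view or rr.split()[0] in remove_names
    )
    remove_cmds = [["view_local_data_remove", VIEW, name] for name in sorted(remove_names)]
    bulk_input = "".join(rr + "\n" for rr in to_add)
    return remove_cmds, bulk_input


if __name__ == "__main__":
    wanted = {"a.lan. IN A 192.168.12.10", "b.lan. IN A 192.168.12.11"}
    present = {"a.lan. IN A 192.168.12.9", "c.lan. IN A 192.168.12.12"}
    removals, additions = plan_updates(wanted, present)
    print(removals)
    print(additions, end="")
```

The removals would need to run before the bulk add, since a later removal by name would also delete a freshly added record for the same host.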
Updated by Renato Botelho almost 5 years ago
Alexander Berkes wrote:
Renato Botelho wrote:
https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c
Thanks for the link.
I made a fork and added my changes:
https://github.com/n3bul4/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c
First of all, this is just a proof of concept. I only made a few tests so far, which look good to me. Any help is appreciated.
The way this works is by making a diff (with the standard diff command) between the current dhcp.leases file and the corresponding unbound view (called dhcpleases).
Obsolete DNS entries get removed and new ones are added (via unbound-control).
For this to work, a few changes to unbound.conf are needed:
1.) unbound needs to be configured with a view named "dhcpleases"
[...]
2.) every interface, where clients should "see" the hostnames needs to have an access-control-view directive:
[...]
The above would add access for all clients in the networks 192.168.12.0-192.168.14.0 with a netmask 255.255.255.0.
3.) The include directive for dhcpleases_entries.conf in unbound.conf has to be removed / commented out
[...]
For a quick shot, points 1 and 2 can be added to the "Custom options" section in the pfsense DNS Resolver configuration webgui.
For point 3 to work, one would have to edit /etc/inc/unbound.inc. Function unbound_generate_config_text.
Simply search for "# dhcp lease entries" and comment out the line below as shown in point 3.
Compile dhcpleases, copy it to your pfsense box.
Make a backup of /usr/local/sbin/dhcpleases.
Kill running dhcpleases process.
Move compiled dhcpleases version to /usr/local/sbin/dhcpleases.
Restart unbound.
Final notes:
The code creates 2 files in /tmp:
/tmp/dhcpleases.sort
/tmp/dhcpleases.diff
New DNS entries are added with a bulk operation (view_local_datas), whereas delete operations are done one by one.
This is due to lack of a bulk remove operation from views in unbound-control, which would be a nice feature request.
So keep in mind, that DNS delete operations will take longer than adding DNS entries.
dhcpleases logs information about added / removed entries to system.log
Any thoughts and help are welcome!
Please submit the proposed changes as a Pull Request against the FreeBSD-ports repository: https://docs.netgate.com/pfsense/en/latest/development/submitting-a-pull-request-via-github.html
This way developers can review what you have changed.
Updated by Alexander Berkes almost 5 years ago
Updated by Brittney Lars over 4 years ago
How are other people dealing with this issue or working around it? For me it's causing such frequent internet outages (dns outages are internet outages for clients) on my network that I am starting to consider looking for an alternative.
Updated by Nick B over 4 years ago
Brittney Lars wrote:
How are other people dealing with this issue or working around it? For me it's causing such frequent internet outages (dns outages are internet outages for clients) on my network that I am starting to consider looking for an alternative.
I have disabled registering DHCP leases in the resolver. Even though I rely on that feature I'd rather not have frequent DNS interruptions than local hostname resolutions.
Updated by Brittney Lars over 4 years ago
Thanks @Nick B this workaround works well.
Updated by Raffi T almost 4 years ago
Thanks for all the work on this. It seems like there is progress made on that git page, but the status of this bug is still 0%. Is that correct? What's the status of the code review?
Thanks,
Raffi
Updated by Jim Pingle over 3 years ago
- Related to Bug #11553: Unbound does not restart properly sometimes when DHCP Registration is enabled added
Updated by Jim Pingle over 3 years ago
- Related to Bug #10624: Memory leak in Unbound with Python module and DHCP lease registration active added
Updated by Jim Pingle over 3 years ago
- Related to Regression #11316: Unbound crashes with signal 11 when reloading added
Updated by Jesse Adelman almost 3 years ago
Howdy. Netgate customer here. Hoping that this 'high priority' 6 year old bug gets some love from Netgate-employed devs, or the excellent volunteer devs of pfSense. While the workarounds are nice to see, I think not having to worry about an unnecessary HUP and restart of Unbound would be great, especially as I don't want to degrade other services' functionality just to implement the workarounds. Thanks. Cheers.
Updated by steven warner almost 3 years ago
Jiggling the handle on this one again. Just tracked another user complaint down to this issue: the outage that occurs during the restart. Would really like to see this fixed.
Updated by Dennis Adler over 2 years ago
Curious Netgate customer wondering if the fix posted by Alexander Berkes 2 years ago (or any other fix) is in the works? I see Jim set some change to CE-Next a year ago, but this is still a problem with Unbound in Python Mode + pfBlockerNG. Would be nice to know current status, thanks!
Updated by R D over 2 years ago
Paying Netgate customer here. Am actively running into this problem (showing in the form of periodic DNS resolution errors that are causing issues) and would love to see this fixed!
Updated by Dennis Adler over 2 years ago
Hey Netgate,
What happened to this fix... I see that the 22.05 beta is out and this bug is still set to CE-NEXT and without an owner. From this bug log it looks like the proposed fix has just been sitting for 2 years or so. CAN YOU PLEASE let us know the status???? Thanks!
Updated by O E over 2 years ago
Hey Netgate - I get the feeling this affects far more customers than you think.
Can this be assigned to someone to actually look towards a solution??
Updated by Vaidotas Butkus over 2 years ago
Any progress on this? It causes lots of other DNS resolver issues, not just short interruptions.
On 22.01 the DNS resolver worked just fine; after the upgrade to 22.05 it started breaking again. Not to mention the bad things that were in 2.5 CE, which required downgrading the Unbound version.
Updated by Per-Arne Hellarvik over 2 years ago
Netgate 3100 user here, running 22.05, upgraded from 22.01 - Same problem: DNS interruptions. Can this issue get some love from devs?
Updated by Dennis Adler over 2 years ago
Hello Netgate Folk,
What if you created a version with this fix that could be applied with the Patch tool? I know I'd be happy to help test it (after September 6, when my current project wraps)!!
What say you??
Updated by David Reitz almost 2 years ago
Jeff Kuehl wrote in #note-45:
I agree, I'll test too
Count me in as well. I'd be happy to test a patch!
Updated by Christian McDonald almost 2 years ago
- Assignee set to Christian McDonald
Taking this one on as I'm now quite familiar with Unbound in pfSense
Updated by Michael Kolassa almost 2 years ago
Christian McDonald wrote in #note-47:
Taking this one on as I'm now quite familiar with Unbound in pfSense
Sorry for the ping, but given how close we potentially are to the next pfSense+ release, have you had any luck with the dhcpleases/unbound issue that exists with the "Register DHCP leases in the DNS Resolver" option checked?
I would love to re-enable this feature, but the frequent complete DNS outages that last until I manually restart unbound are quite annoying. For the record, I don't run pfBlockerNG, so there are no 'large lists' that would be causing overhead.
Updated by Christian McDonald almost 2 years ago
Unbound reloads are faster now when Python mode is enabled. I eliminated the expensive task of reloading the entire python interpreter every time Unbound is reloaded.
I am also exploring the new cache retention option introduced in 1.7.1 which we are now on.
Beyond that, I'm still actively working on this and weighing the pros and cons of the next steps. It's no secret that ISC DHCPD is dead. I'm exploring Kea as well. Kea has a much better extension model that makes integrations a bit cleaner (and can run in-process instead of requiring a sidecar daemon like dhcpleases).
Unbound... Unbound was never designed to be an authoritative resolver, but we use it as such for the local domain. A better model might be to run a separate internal DNS server that supports DDNS out of the box and just forward these lookups, thus eliminating the need for a custom integration, service reloads, etc.
It's an ongoing problem that I'm working on.
Updated by Michael Kolassa almost 2 years ago
Christian McDonald wrote in #note-49:
Unbound reloads are faster now when Python mode is enabled. I eliminated the expensive task of reloading the entire python interpreter every time Unbound is reloaded.
I am also exploring the new cache retention option introduced in 1.7.1 which we are now on.
Beyond that, I'm still actively working on this and weighing the pros and cons of the next steps. It's no secret that ISC DHCPD is dead. I'm exploring Kea as well. Kea has a much better extension model that makes integrations a bit cleaner (and can run in-process instead of requiring a sidecar daemon like dhcpleases).
Unbound... Unbound was never designed to be an authoritative resolver, but we use it as such for the local domain. A better model might be to run a separate internal DNS server that supports DDNS out of the box and just forward these lookups, thus eliminating the need for a custom integration, service reloads, etc.
It's an ongoing problem that I'm working on.
Thank you for the update and all of your hard work so far! This ticket has been a thorn in an otherwise phenomenal product, and it's great to see someone really deep diving into it after all these years.
Updated by Allistah F over 1 year ago
Hi there, I just wanted to say thanks for all the time and work that is going into this fix. It's really a problem when there are so many devices on the network and unbound is being constantly restarted - it causes a lot of delays that really shouldn't happen. Does anyone have any idea when this would be fixed?
Updated by Mark Abram over 1 year ago
I had high hopes that we might see the fix in the latest version (23.05). Do we have a road map, or at least a time frame, for when we can expect this to be resolved? As it is now, any time a DHCP client connects or reconnects to the network, the DNS Resolver is killed and restarted.
Updated by Dennis Adler about 1 year ago
Hello Christian, any updates on your progress? Thanks!
Updated by Dennis Adler 10 months ago
A question for you, Christian. Does the DHCP change to KEA's code mean this is no longer a problem? Or are the notification methods to Unbound still going to cause crashes? If you think it has fixed this, I am willing to put the time in to test it out. Let me know!
Updated by Jim Pingle 4 months ago
- Related to Feature #15651: Kea DNS Resolver (Unbound) Integration (IPv4 and IPv6) added
Updated by Christian McDonald 3 months ago
- Status changed from Confirmed to Feedback
- Plus Target Version set to 24.08
We now have a brand new integration with Kea that solves all of these issues and more. We now support both DHCPv4 and v6 DNS registration with unbound. Additionally, it is now possible to turn on registration globally for all subnets or override the registration policy per subnet. This allows for both per-subnet opt-in (aka global disable policy) or opt-out (aka global enable policy). It's very flexible.
The domain name that is appended to each lease hostname for registration is selected according to the following rules (first match wins):
1. Does the DHCP response contain a domain-name option? If so, use it.
2. If no domain-name option is present in the response, or we are handling a DHCPv6 lease, check if the response contains a search-domain option. If so, use the first one, as multiple could be specified.
3. Use the system domain name.
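The three rules above can be sketched as a small function. This is only an illustration of the stated selection order, not the actual Kea integration code; the function name `registration_domain` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def registration_domain(
    domain_name: Optional[str],
    search_domains: Sequence[str],
    system_domain: str,
    is_dhcpv6: bool = False,
) -> str:
    """Pick the domain appended to a lease hostname (first match wins)."""
    # Rule 1: a domain-name option in the response, DHCPv4 only.
    if not is_dhcpv6 and domain_name:
        return domain_name
    # Rule 2: the first entry of a search-domain option, if one is present.
    if search_domains:
        return search_domains[0]
    # Rule 3: fall back to the system domain name.
    return system_domain


if __name__ == "__main__":
    print(registration_domain("office.example", ["a.example"], "home.arpa"))
    print(registration_domain(None, ["a.example", "b.example"], "home.arpa", is_dhcpv6=True))
    print(registration_domain(None, [], "home.arpa"))
```

With these hypothetical inputs, the three calls select `office.example`, `a.example` and `home.arpa` respectively.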
Updated by Jim Pingle about 2 months ago
- Subject changed from Incorrect Handling of Unbound Resolver [service restarts, cache loss, DNS service interruption] to Reduce disruptions when changing DNS records from DHCP leases in Unbound
- Status changed from Feedback to Resolved
- % Done changed from 0 to 100
This has been working well in snapshots. Records are updated on-the-go, no restarts.
Updated by Michael Damsgaard about 2 months ago
uh uh uh !!
MUST HAVE THIS FIX !
Please please please provide patch, or a URL for the snapshot mentioned. I will test, and I would pay good money to be doing so yesterday 😁
Updated by Jim Pingle about 1 month ago
- Plus Target Version changed from 24.08 to 24.11