Bug #4396

Lengthy unbound outage during restart when adding static DHCP leases

Added by Elliott Quarles about 5 years ago. Updated over 4 years ago.

Status:
Duplicate
Priority:
High
Category:
DNS Resolver
Target version:
-
Start date:
02/09/2015
Due date:
% Done:

0%

Estimated time:
Affected Version:
Affected Architecture:

Description

When updating static DHCP leases, the call to services_unbound_configure on the services_dhcp page triggers a full rebuild of the unbound configuration and a restart. Host resolution fails while unbound is offline, which in my environment takes 20+ seconds. DNS shouldn't be down for that long, especially when adding a static entry. Even with a CARP setup, both unbound services go down at nearly the same time.

I corrected this in my environment by changing the kill - config - start procedure of services_unbound_configure() in services.inc: I removed the kill and called unbound_control("reload") instead of sync_unbound_service().

Both adds and deletes work fine from services_dhcp, and the reload now happens in less than 1 second with no perceivable loss of service.

Perhaps a modification of the unbound restart/reload functions would allow us to avoid the kill-and-rebuild, kitchen-sink call of unbound where it's not necessary.
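
To illustrate the idea above, here is a rough sketch in Python (not the PHP used in services.inc) of preferring a reload and only falling back to a full restart. The function name apply_unbound_config, the restart_cmd parameter, and the injectable run callable are all hypothetical and exist only for this example; the only real command assumed is `unbound-control -c <config> reload`.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command (list of strings) and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def apply_unbound_config(
    config_path: str,
    restart_cmd: Sequence[str],
    run: Runner = _run,
) -> str:
    """Try a reload first; fall back to a full restart if the reload fails.

    restart_cmd is a placeholder for whatever full stop/start procedure
    the system uses; it is an assumption of this sketch, not a pfSense API.
    Returns "reload" or "restart" to say which path was taken.
    """
    reload_cmd = ["unbound-control", "-c", config_path, "reload"]
    if run(reload_cmd) == 0:
        return "reload"
    # The reload was refused or failed, so do the heavier restart.
    run(restart_cmd)
    return "restart"
```

A caller that only changed static leases would take the "reload" path in the normal case, and the kill/rebuild path would be reserved for when the reload fails.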

History

#1 Updated by Elliott Quarles about 5 years ago

Update:
Affected version: 2.2 Release

#2 Updated by Chris Buechler about 5 years ago

  • Category set to DNS Resolver
  • Assignee set to Chris Buechler
  • Priority changed from Normal to High
  • Affected Version set to 2.2

#3 Updated by Chris Buechler over 4 years ago

  • Status changed from New to Confirmed
  • Target version set to 2.3

The root problem is that unbound's reload functions (-HUP, unbound-control reload) actually stop, then start unbound, which is broken. That can take some time with certain configs, though it's a fraction of a second for what the majority use. dhcpleases updates trigger a -HUP, so they cause brief DNS resolution outages while it's doing a 'reload' (which is really a stop/start).

That's something we'll need to pursue upstream with unbound.
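
For anyone wanting to put a number on the outage for their own config, one possible approach is to probe the resolver at a fixed interval during a reload and compute the longest run of failures. The sketch below only does the computation; the helper name longest_outage and the (timestamp, ok) sample format are assumptions made for this example, and how the samples are collected is left to the reader.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest failure window, in seconds.

    samples is a time-ordered sequence of (timestamp, ok) pairs, where ok
    is True if a DNS probe at that time succeeded. A window runs from the
    first failed probe to the next successful one (or to the last sample
    if resolution never recovered).
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and outage_start is None:
            outage_start = ts  # an outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)  # an outage ends
            outage_start = None
    if outage_start is not None and last_ts is not None:
        # Still failing at the end of the sample set.
        longest = max(longest, last_ts - outage_start)
    return longest
```

The measured window is bounded by the probe interval, so a short interval is needed to resolve sub-second outages.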

#4 Updated by Chris Buechler over 4 years ago

  • Status changed from Confirmed to Duplicate
  • Target version deleted (2.3)
  • Affected Version deleted (2.2)

Closing this in favor of #5413, which has a better explanation of the root cause.
