Project

General

Profile

Bug #10342

Unbound domain overrides stop resolving periodically. They only resume after the service has been restarted.

Added by Charles Hamilton 8 months ago. Updated 8 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
DNS Resolver
Target version:
-
Start date:
03/13/2020
Due date:
% Done:

0%

Estimated time:
Affected Version:
Affected Architecture:
All

Description

Here is the original description of the problem: https://forum.netgate.com/topic/150898/unbound-domain-overrides-stop-resolving-periodically-they-only-resume-after-the-service-has-been-restarted) What follows is going to be mostly identical to my original post on the Netgate forums.

I've opened a ticket with Netgate and was directed to submit a bug request here.


All my deployments are mostly the same:

I've struggled with this for months now and none of the suggestions I've seen on the net have resolved the issue. I've been using dnsmasq on all of my deployments but decided to switch to unbound so that I could use DNSBL via pfBlockerNG. Dnsmasq has worked flawlessly, but I've had trouble with unbound with one feature that pretty much determines whether or not it can be used in my environment. I'm attaching a diagram, and sanitized versions of unbound.conf, domainoverrides.conf, and log excerpts.

Log excerpt # 1

This configuration works for a little while, but then it stops for some reason and never recovers until after the service has been restarted. I've collected some logs and it looks like at some point, after the forwarding breaks, queries that are supposed to be forwarded are instead resolved normally (via root servers):


Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: using localzone domain1.local. transparent
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: answer from the cache failed
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: udp request from ip4 10.90.90.111 port 59010 (len 16) 
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: mesh_run: start
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: validator operate: query _ldap._tcp.pdc._msdcs.domain1.local. SRV IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: validator: pass to next module
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: mesh_run: validator module exit state is module_wait_module
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: process_request: new external request event
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: iter_handle processing q with state INIT REQUEST STATE
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: resolving _ldap._tcp.pdc._msdcs.domain1.local. SRV IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: request has dependency depth of 0
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: msg from cache lookup ;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 0 ;; flags: qr ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 0  ;; QUESTION SECTION: _ldap._tcp.pdc._msdcs.domain1.local.#011IN#011A  ;; ANSWER SECTION:  ;; AUTHORITY SECTION: .#0113435#011IN#011SOA#011a.root-servers.net. nstld.verisign-grs.com. 2020022700 1800 900 604800 86400 .#01186235#011IN#011RRSIG#011SOA 8 0 86400 20200311050000 20200227040000 33853 . OIjAHBoPtbQNEIlBT6wi25DJE5WetrGNX0litLaS21LI8Q/Z91uXPLiZMyhybW/16UPY24PnqGGpWTvdb5mli3QLiw2CkbWMmOgvAQ058IpteZ0VewAapQg7BFA+BRopbcpPBgD5ksl7/ZmB3xOUJHvWjUV4P5/ofoMyKrjh7Wi/tskRAdKzSgNUNwUvlhDNAT6s9coVgAHj+9GNvNa+JAoMh40Q0P+hS+Vn+98QElM9XdSQGMbY6s/v5V7vDBiYTPcILUj+Fmr0NSSy3sq3Go+EaeG6ye2DXnp/xn1CVu2VAndRG4mn9Zcw9/dCZ7xz/QN1CG4lfkRO89PdjN5TOA== ;{id = 33853} loans.#01186235#011IN#011NSEC#011locker. NS DS RRSIG NSEC loans.#01186235#011IN#011RRSIG#011NSEC 8 1 86400 20200311050000 20200227040000 33853 . TieEfn+f2pKQ/uc8OMZiP4eWYxZPZPPPOmbdMKxBZCZPGXCkboJ33390QgkCbS/iKb6L8Y1wLeFKAk4fcj5M85d+I45A
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: msg ttl is 3435, prefetch ttl 3075
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: returning answer from cache.
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: iter_handle processing q with state FINISHED RESPONSE STATE
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: finishing processing for _ldap._tcp.pdc._msdcs.domain1.local. SRV IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: mesh_run: iterator module exit state is module_finished
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: validator operate: query _ldap._tcp.pdc._msdcs.domain1.local. SRV IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: validator: nextmodule returned
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: val handle processing q with state VAL_INIT_STATE
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: validator classification nameerror
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: signer is . TYPE0 CLASS0
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: trust anchor NXDOMAIN by signed parent
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: val handle processing q with state VAL_FINDKEY_STATE
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: validator: FindKey _ldap._tcp.pdc._msdcs.domain1.local. SRV IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: val handle processing q with state VAL_VALIDATE_STATE
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: verify rrset cached . SOA IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: verify rrset cached loans. NSEC IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: verify rrset cached . NSEC IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: Validating a nxdomain response
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: successfully validated NAME ERROR response.
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: validate(nxdomain): sec_status_secure
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: val handle processing q with state VAL_FINISHED_STATE
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: validation success _ldap._tcp.pdc._msdcs.domain1.local. SRV IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: negcache insert for zone . SOA IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: negcache rr loans. NSEC IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] info: negcache rr . NSEC IN
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: mesh_run: validator module exit state is module_finished
Feb 27 09:07:38 10.90.90.1 unbound: [40761:0] debug: query took 0.000000 sec

Log excerpt # 2

Here's a log snippet of a request attempting to resolve the print server. This request should have been forwarded according to our domain overrides, but instead it gets sent to the root DNS:

Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: udp request from ip4 10.90.90.111 port 64024 (len 16) 
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: mesh_run: start
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: validator operate: query D1-PRINTSERVER.domain1.local. A IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: validator: pass to next module
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: mesh_run: validator module exit state is module_wait_module
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: process_request: new external request event
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: iter_handle processing q with state INIT REQUEST STATE
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: resolving D1-PRINTSERVER.domain1.local. A IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: request has dependency depth of 0
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: msg from cache lookup ;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 0 ;; flags: qr ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 0  ;; QUESTION SECTION: D1-PRINTSERVER.domain1.local.#011IN#011A  ;; ANSWER SECTION:  ;; AUTHORITY SECTION: .#0113311#011IN#011SOA#011a.root-servers.net. nstld.verisign-grs.com. 2020022700 1800 900 604800 86400 .#01186111#011IN#011RRSIG#011SOA 8 0 86400 20200311050000 20200227040000 33853 . OIjAHBoPtbQNEIlBT6wi25DJE5WetrGNX0litLaS21LI8Q/Z91uXPLiZMyhybW/16UPY24PnqGGpWTvdb5mli3QLiw2CkbWMmOgvAQ058IpteZ0VewAapQg7BFA+BRopbcpPBgD5ksl7/ZmB3xOUJHvWjUV4P5/ofoMyKrjh7Wi/tskRAdKzSgNUNwUvlhDNAT6s9coVgAHj+9GNvNa+JAoMh40Q0P+hS+Vn+98QElM9XdSQGMbY6s/v5V7vDBiYTPcILUj+Fmr0NSSy3sq3Go+EaeG6ye2DXnp/xn1CVu2VAndRG4mn9Zcw9/dCZ7xz/QN1CG4lfkRO89PdjN5TOA== ;{id = 33853} loans.#01186111#011IN#011NSEC#011locker. NS DS RRSIG NSEC loans.#01186111#011IN#011RRSIG#011NSEC 8 1 86400 20200311050000 20200227040000 33853 . TieEfn+f2pKQ/uc8OMZiP4eWYxZPZPPPOmbdMKxBZCZPGXCkboJ33390QgkCbS/iKb6L8Y1wLeFKAk4fcj5M85d+I45A4RwAX7
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: msg ttl is 3311, prefetch ttl 2951
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: returning answer from cache.
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: iter_handle processing q with state FINISHED RESPONSE STATE
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: finishing processing for D1-PRINTSERVER.domain1.local. A IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: mesh_run: iterator module exit state is module_finished
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: validator operate: query D1-PRINTSERVER.domain1.local. A IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: validator: nextmodule returned
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_INIT_STATE
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: validator classification nameerror
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: signer is . TYPE0 CLASS0
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: trust anchor NXDOMAIN by signed parent
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_FINDKEY_STATE
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: validator: FindKey D1-PRINTSERVER.domain1.local. A IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_VALIDATE_STATE
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: verify rrset cached . SOA IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: verify rrset cached loans. NSEC IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: verify rrset cached . NSEC IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: Validating a nxdomain response
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: successfully validated NAME ERROR response.
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: validate(nxdomain): sec_status_secure
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_FINISHED_STATE
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: validation success D1-PRINTSERVER.domain1.local. A IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: negcache insert for zone . SOA IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: negcache rr loans. NSEC IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] info: negcache rr . NSEC IN
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: mesh_run: validator module exit state is module_finished
Feb 27 09:09:42 10.90.90.1 unbound: [40761:3] debug: query took 0.000000 sec

Log excerpt # 3

I also noticed (on subsequent attempts to resolve the print server) that the domain portion was being duplicated -- I suspect -- this is related to the domain search suffix; after all, if the server couldn't be resolved then I can envision a scenario where the suffix would be appended and resolution reattempted... but this is pure speculation:

Feb 27 09:41:14 10.90.90.1 unbound: [40761:3] debug: using localzone domain1.local. transparent
Feb 27 09:41:19 10.90.90.1 unbound: message repeated 2 times: [ [40761:3] debug: using localzone domain1.local. transparent]
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: udp request from ip4 127.0.0.1 port 30607 (len 16) 
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: mesh_run: start
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: validator operate: query D1-PRINTSERVER.domain1.local.domain1.local. A IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: validator: pass to next module
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: mesh_run: validator module exit state is module_wait_module
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: process_request: new external request event
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: iter_handle processing q with state INIT REQUEST STATE
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: resolving D1-PRINTSERVER.domain1.local.domain1.local. A IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: request has dependency depth of 0
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: msg from cache lookup ;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 0 ;; flags: qr ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 0  ;; QUESTION SECTION: D1-PRINTSERVER.domain1.local.domain1.local.#011IN#011A  ;; ANSWER SECTION:  ;; AUTHORITY SECTION: .#0111414#011IN#011SOA#011a.root-servers.net. nstld.verisign-grs.com. 2020022700 1800 900 604800 86400 .#01184214#011IN#011RRSIG#011SOA 8 0 86400 20200311050000 20200227040000 33853 . OIjAHBoPtbQNEIlBT6wi25DJE5WetrGNX0litLaS21LI8Q/Z91uXPLiZMyhybW/16UPY24PnqGGpWTvdb5mli3QLiw2CkbWMmOgvAQ058IpteZ0VewAapQg7BFA+BRopbcpPBgD5ksl7/ZmB3xOUJHvWjUV4P5/ofoMyKrjh7Wi/tskRAdKzSgNUNwUvlhDNAT6s9coVgAHj+9GNvNa+JAoMh40Q0P+hS+Vn+98QElM9XdSQGMbY6s/v5V7vDBiYTPcILUj+Fmr0NSSy3sq3Go+EaeG6ye2DXnp/xn1CVu2VAndRG4mn9Zcw9/dCZ7xz/QN1CG4lfkRO89PdjN5TOA== ;{id = 33853} loans.#01184214#011IN#011NSEC#011locker. NS DS RRSIG NSEC loans.#01184214#011IN#011RRSIG#011NSEC 8 1 86400 20200311050000 20200227040000 33853 . TieEfn+f2pKQ/uc8OMZiP4eWYxZPZPPPOmbdMKxBZCZPGXCkboJ33390QgkCbS/iKb6L8Y1wLeFKAk4fcj5M85d+
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: msg ttl is 1414, prefetch ttl 1054
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: returning answer from cache.
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: iter_handle processing q with state FINISHED RESPONSE STATE
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: finishing processing for D1-PRINTSERVER.domain1.local.domain1.local. A IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: mesh_run: iterator module exit state is module_finished
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: validator operate: query D1-PRINTSERVER.domain1.local.domain1.local. A IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: validator: nextmodule returned
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_INIT_STATE
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: validator classification nameerror
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: signer is . TYPE0 CLASS0
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: trust anchor NXDOMAIN by signed parent
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_FINDKEY_STATE
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: validator: FindKey D1-PRINTSERVER.domain1.local.domain1.local. A IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_VALIDATE_STATE
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: verify rrset cached . SOA IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: verify rrset cached loans. NSEC IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: verify rrset cached . NSEC IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: Validating a nxdomain response
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: successfully validated NAME ERROR response.
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: validate(nxdomain): sec_status_secure
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: val handle processing q with state VAL_FINISHED_STATE
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: validation success D1-PRINTSERVER.domain1.local.domain1.local. A IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: negcache insert for zone . SOA IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: negcache rr loans. NSEC IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] info: negcache rr . NSEC IN
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: mesh_run: validator module exit state is module_finished
Feb 27 09:41:19 10.90.90.1 unbound: [40761:3] debug: query took 0.000000 sec
domainoverrides.conf (224 Bytes) domainoverrides.conf sanitized domainoverrides.conf Charles Hamilton, 03/13/2020 09:54 AM
unbound.conf (2.15 KB) unbound.conf sanitized unbound.conf Charles Hamilton, 03/13/2020 09:54 AM
pfSense_unbound_deployment_diagram.png (40.8 KB) pfSense_unbound_deployment_diagram.png deployment diagram (png) Charles Hamilton, 03/13/2020 10:05 AM

History

#1 Updated by Jim Pingle 8 months ago

  • Category set to DNS Resolver

Also available in: Atom PDF