unbound exits on configuration error when link status flaps on LAN interface
I have pfSense installed at home on a small, old, core2duo-based machine. It does pretty typical home-router duty; the most obvious-to-me unusual parts of the configuration are that the internal IPv4 network is 18.104.22.168/24 instead of an RFC1918 network address, and I have an IPv6 tunnel to Hurricane Electric.
This week, the 11-year-old unmanaged GbE switch attached to the LAN port got flaky, and started to fail in some way that caused it to blink all lights on the front and stop passing traffic. Logs show link status flapping on the LAN interface. On power-cycling the switch, it would start working again. But DNS service was gone, though restartable at Status/Services/unbound. I found this in resolver.log:
Aug 13 20:28:22 router unbound: [27434:0] fatal error: Could not read config file: /unbound.conf. Maybe try unbound -dd, it stays on the commandline to see more errors, or unbound-checkconf
I wrote a little monitoring script that does 'pgrep unbound' and 'ifconfig em1' every 10 seconds. That seems to show link flapping between normal:
media: Ethernet autoselect (1000baseT <full-duplex>) status: active
and no link:
media: Ethernet autoselect status: no carrier
It also showed two copies of dhcpleases running after the link starts flapping.
Edited/excerpted logs and the monitoring script are attached, the switch starts flapping at Aug 13 20:27:57 in the logs, and I power-cycled the switch about 20:28:45. I restarted unbound at 20:30:36.
I tried reproducing the problem by manually plugging/unplugging the patch cable involved, and was not able to reproduce the problem. Alas, I destroyed the switch by plugging the wrong power supply in, so it's no longer helpful either. So I have no repro. I suspect connecting a FreeBSD box and running a little script that did things with 'ifconfig down' and 'ifconfig up' and 'ifconfig mediaopt <blah>' combined with some randomized short delays would eventually knock unbound over.
I haven't investigated the code at all, but it smells like some kind of race condition in the link-configuration scripts to me.
No data to display