Bug #6896

closed

unbound root.key file corruption possibly related to full file system

Added by George 77 over 7 years ago. Updated about 5 years ago.

Status: Not a Bug
Priority: Normal
Assignee:
Category: DNS Resolver
Target version: -
Start date: 11/04/2016
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.3.2
Affected Architecture: amd64

Description

My root.key becomes corrupt, and unbound crashes and will no longer start. This bug is likely related to #5334 and has been plaguing me for a couple of months. I run pfSense in ESXi, so thankfully I'm able to revert to an earlier state, and the bug goes away for a while. It always seems to come back, though, and it has happened to me about 6 times over the past few months. I'm on the latest version, 2.3.2p1. Recently I got frustrated enough to rebuild pfSense from scratch and restore my config, and again the bug was back after a couple of weeks.

Looking through the logs, unbound appears to be restarted on the hour occasionally, and when it dies it simply fails to come back up. When this happens, I've seen root.key contain the contents of other files; more recently it was just 0 bytes. I've also tried regenerating the key as described here: https://forum.pfsense.org/index.php?PHPSESSID=vnos0emsin6prv9n3g6n5iq9n0&topic=87357.msg524385#msg524385 and that appears to create the right file, but when I start unbound from the pfSense web GUI or reboot, I end up with a 0-byte file again. The VM is not losing power, either; uptime at this occurrence was 11 days.

This most recent time, I noticed that the primary file system is full, which is likely the culprit. I'm opening a ticket because I wonder whether there is a better way to handle this, if that is indeed the cause. My guess is that Snort or another package is filling up my file system, and then, when a cron job restarts unbound, it tries to rewrite its root.key but fails because there is no room left on the file system.
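
If that guess is right, a 0-byte root.key is what you get when a file is truncated and rewritten in place and the write then fails. The usual general-purpose defense is to write the new contents to a temporary file in the same directory, flush and fsync it, and only then rename it over the original. The Python sketch below illustrates that technique under those assumptions; it is not unbound's actual code, and the helper name `atomic_write` is hypothetical.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` so readers see either old or new data.

    If any step fails (for example with ENOSPC, "No space left on device"),
    the temporary file is removed and the original file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace() is a rename
    # within one file system, which POSIX guarantees to be atomic.
    # Note: mkstemp creates the file with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to commit it to disk
        os.replace(tmp_path, path)  # atomically swap the new file in
    except BaseException:
        # Remove the partial temp file; the original `path` is unchanged.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern a full disk still makes the update fail, but anything reading the file afterwards sees the previous contents rather than an empty file.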

Should pfSense alert when it runs out of space? Should it keep logs in a separate partition so that it doesn't fill up the root filesystem? Did I misconfigure it somewhere?
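
On the alerting question, a basic check is to compare each file system's usage against a threshold. The Python sketch below does that with `shutil.disk_usage`; the 90% threshold and the function names are assumptions for illustration, not existing pfSense settings or APIs.

```python
import shutil

# Hypothetical threshold: flag a file system once it is at least 90% full.
DEFAULT_THRESHOLD_PERCENT = 90.0


def usage_percent(path: str) -> float:
    """Return how full the file system holding `path` is, in percent.

    Computed as used / (used + free), where `free` is the space available to
    unprivileged users, so blocks reserved for root count as unavailable.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def check_mounts(paths, threshold: float = DEFAULT_THRESHOLD_PERCENT):
    """Return (path, percent) pairs for every path at or above `threshold`."""
    alerts = []
    for path in paths:
        percent = usage_percent(path)
        if percent >= threshold:
            alerts.append((path, percent))
    return alerts
```

For example, `check_mounts(["/", "/var"])` returns an entry for each of those mounts that is at least 90% full; a periodic job could log or e-mail those entries before services start failing.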

Here is an excerpt of the log that I pulled out.
Nov 4 14:00:05 unbound 51227:0 notice: Restart of unbound 1.5.9.
Nov 4 14:00:05 unbound 51227:0 info: start of service (unbound 1.5.9).
Nov 4 14:00:07 unbound 51227:1 error: could not fflush(/root.key): No space left on device
Nov 4 15:00:05 unbound 51227:0 notice: Restart of unbound 1.5.9.
Nov 4 15:00:05 unbound 51227:0 error: failed to read /root.key
Nov 4 15:00:05 unbound 51227:0 error: error reading auto-trust-anchor-file: /var/unbound/root.key
Nov 4 15:00:05 unbound 51227:0 error: validator: error in trustanchors config
Nov 4 15:00:05 unbound 51227:0 error: validator: could not apply configuration settings.
Nov 4 15:00:05 unbound 51227:0 error: module init for module validator failed
Nov 4 15:00:05 unbound 51227:0 fatal error: failed to setup modules

#1

Updated by George 77 over 7 years ago

Following up: I traced it to the Suricata package. Its DNS log is gigabytes in size. What is strange is that the log management settings in pfSense are set to 750k. I'm guessing that because the "Log Directory Size Limit" checkbox is unchecked (the default), the file size limit settings below it aren't applied.
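
A quick way to track down a runaway file like this is to list the largest files under a directory. The Python sketch below does that; the function name `largest_files` and the choice of starting directory (for example /var/log) are illustrative assumptions, and on the box itself the base-system `du` utility does a similar job.

```python
import heapq
import os


def largest_files(root: str, count: int = 10):
    """Return up to `count` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: do not follow symlinks, so a link reports its own size.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)
```

For example, `largest_files("/var/log", 5)` would list the five largest files under /var/log.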

#2

Updated by Kill Bill over 7 years ago

The logs cannot fill up anything: they are circular and of fixed size (see Status > System Logs > Settings). Simply make your virtual HDD bigger; it's clearly grossly inadequate for the purpose. (And no, no operating system works properly, let alone well, when it is out of disk space. It doesn't matter whether it's Windows, *BSD or Linux; they will all malfunction and eventually crash.)

#3

Updated by Jim Thompson over 7 years ago

  • Status changed from New to Feedback

#4

Updated by Jim Thompson over 7 years ago

  • Assignee set to George 77

#5

Updated by Thaddeus Covert over 7 years ago

I just had the same issue: /var was at 100%. After trying to recreate root.key and noticing that dhcpd.conf couldn't be saved, I checked file system usage with df -h. I then went to the log settings page and clicked "Reset Log Files". That brought /var down to 45%, and everything worked after that.

Services running:
avahi
dhcpd
dpinger
ntpd
openvpn (3 instances)
radvd
sshd
unbound

I don't believe I've made any changes to the log settings.

#6

Updated by Anonymous over 5 years ago

It looks like the OP traced the issue; can the report be marked resolved now?

#7

Updated by Jim Pingle about 5 years ago

  • Status changed from Feedback to Not a Bug