Bug #11449
closed
BIND fails during/after upgrade to 21.02/2.50
Description
After upgrading to 21.02, the named service wouldn’t start and the logs said it was segfaulting ("signal 11"). So I rebooted, which took a while to kick off, and then named was just gone from the Services list. It was still in my list of installed packages, so I figured I’d try reinstalling it, but after the `Executing custom_php_resync_config_command()` step, I got a bunch of `rndc: connect failed: 127.0.0.1#953: timed out` messages on the Package Reinstallation screen. I opened a different tab and turned on unbound, and either doing that or waiting long enough got the reinstallation process to complete.
Every time I try to start named from the GUI, it dies with signal 11. Starting named from SSH gives mixed results:
`named` works, but has trouble quitting
`named -u bind` works
`named -u bind -t /cf/named` works
`named -u bind -t /cf/named -c /etc/namedb/named.conf` signal 11
`named-checkconf -t /cf/named /etc/namedb/named.conf` does not return any errors.
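The flag-by-flag comparison above can be scripted. The following is a minimal sketch, not something from the report: `classify_exit` is a hypothetical helper that runs one command and prints whether it exited cleanly, exited with an error, or was killed by a signal. It relies on the common shell convention that a child killed by signal N is reported as exit status 128+N.

```shell
# classify_exit CMD [ARGS...]
# Hypothetical helper: run a command quietly and print how it ended.
# Assumes the shell reports "killed by signal N" as status 128+N.
classify_exit() {
    "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -gt 128 ]; then
        echo "signal $((rc - 128))"
    elif [ "$rc" -eq 0 ]; then
        echo "ok"
    else
        echo "exit $rc"
    fi
}
```

By default named forks into the background, so the shell only sees the exit status of the parent process. Running it in the foreground (named's `-g` option) should let a crash show up directly, for example `classify_exit named -g -u bind -t /cf/named -c /etc/namedb/named.conf`; a variant that works will keep running until it is interrupted.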
Permissions inside /cf/named seemed inconsistent; lots of files and folders owned by `root:wheel`, but `chown -R bind:wheel /cf/named` didn't resolve the signal 11 issue.
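To see exactly which entries have unexpected ownership, something like the sketch below may help. `not_owned_by` is a hypothetical helper name; it only wraps `find` with the `-user` test and lists everything under a directory that does not belong to the given user.

```shell
# not_owned_by DIR USER
# Hypothetical helper: list every file and directory under DIR
# whose owner is not USER (a user name or a numeric uid).
not_owned_by() {
    find "$1" ! -user "$2" -print
}
```

For the layout described here, the call would be `not_owned_by /cf/named bind`. An empty result means everything is owned by `bind`; as noted above, fixing ownership alone did not stop the crash.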
Changes in the GUI don't seem to be making it to `/cf/named/etc/namedb/named.conf`.
I've tried additional reboots and reinstallations, leaving
I'm on an SG-4860 running pfSense+, but someone else on the forums (https://forum.netgate.com/topic/160963/bind-upgrade-producing-errors-on-pfsense-2-5-upgrade/) says they're on 2.5.0 and are running into this error, or something similar. Didn't see their bug report here.
Updated by Tchello Mello almost 4 years ago
I'm also hitting the same problem on my SG-3100.
I'm seeing the same permissions problems. Here is what I'm seeing:
[21.02-RELEASE][admin@gw]/usr/local/sbin: /usr/local/sbin/named-checkconf -t /cf/named /etc/namedb/named.conf
[21.02-RELEASE][admin@gw]/usr/local/sbin: echo $?
0
[21.02-RELEASE][admin@gw]/usr/local/sbin: /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
[21.02-RELEASE][admin@gw]/usr/local/sbin: echo $?
1
From the logs, we can see:
Feb 18 23:15:54 gw named[21708]: starting BIND 9.16.11 (Stable Release) <id:9ff601b>
Feb 18 23:15:54 gw named[21708]: running on FreeBSD arm 12.2-STABLE FreeBSD 12.2-STABLE 38a4c12973d(plus-devel-12) pfSense-SG-3100
Feb 18 23:15:54 gw named[21708]: built with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr' '--with-readline=-L/usr/local/lib -ledit' '--with-dlz-filesystem=yes' '--enable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--without-libidn2' '--with-json-c' '--disable-largefile' '--without-lmdb' '--disable-native-pkcs11' '--without-python' '--disable-querytrace' '--enable-tcp-fastopen' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=armv7-portbld-freebsd12.2' 'build_alias=armv7-portbld-freebsd12.2' 'CC=/nxb-bin/usr/bin/cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-isystem /usr/local/include' 'CPP=/nxb-bin/usr/bin/cpp' 'PKG_CONFIG=pkgconf'
Feb 18 23:15:54 gw named[21708]: running as: named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
Feb 18 23:15:54 gw named[21708]: compiled by CLANG FreeBSD Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
Feb 18 23:15:54 gw named[21708]: compiled with OpenSSL version: OpenSSL 1.1.1i-freebsd 8 Dec 2020
Feb 18 23:15:54 gw named[21708]: linked to OpenSSL version: OpenSSL 1.1.1i-freebsd 8 Dec 2020
Feb 18 23:15:54 gw named[21708]: compiled with libxml2 version: 2.9.10
Feb 18 23:15:54 gw named[21708]: linked to libxml2 version: 20910
Feb 18 23:15:54 gw named[21708]: compiled with json-c version: 0.15
Feb 18 23:15:54 gw named[21708]: linked to json-c version: 0.15
Feb 18 23:15:54 gw named[21708]: compiled with zlib version: 1.2.11
Feb 18 23:15:54 gw named[21708]: linked to zlib version: 1.2.11
Feb 18 23:15:54 gw named[21708]: ----------------------------------------------------
Feb 18 23:15:54 gw named[21708]: BIND 9 is maintained by Internet Systems Consortium,
Feb 18 23:15:54 gw named[21708]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Feb 18 23:15:54 gw named[21708]: corporation. Support and training for BIND 9 are
Feb 18 23:15:54 gw named[21708]: available at https://www.isc.org/support
Feb 18 23:15:54 gw named[21708]: ----------------------------------------------------
Feb 18 23:15:54 gw named[21708]: found 2 CPUs, using 2 worker threads
Feb 18 23:15:54 gw named[21708]: using 2 UDP listeners per interface
Feb 18 23:15:54 gw named[21708]: using up to 21000 sockets
Feb 18 23:15:54 gw named[21708]: loading configuration from '/etc/namedb/named.conf'
Feb 18 23:15:54 gw named[21708]: unable to open '/usr/local/etc/namedb/bind.keys'; using built-in keys instead
Feb 18 23:15:54 gw named[21708]: using default UDP/IPv4 port range: [49152, 65535]
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lo0, 127.0.0.1#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0, 192.168.XXX.254#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0.222, 192.168.XXX.254#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0.33, 192.168.XXX.62#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0.69, 192.168.XXX.254#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface ovpns2, 172.16.XXX.1#53
Feb 18 23:15:54 gw named[21708]: generating session key for dynamic DNS
Feb 18 23:15:54 gw named[21708]: sizing zone task pool based on 31 zones
Then it produces:
[21.02-RELEASE][admin@gw]/usr/local/sbin: dmesg
pid 21708 (named), jid 0, uid 0: exited on signal 11
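Since named itself only returns 1 here and the signal shows up in `dmesg`, it can be handy to filter the kernel messages for crashed processes. This is a rough sketch that assumes the message format shown above (`pid N (name), ... exited on signal S`); `signal_exits` is a hypothetical helper name.

```shell
# signal_exits
# Hypothetical helper: read kernel messages on stdin and print
# "PID NAME SIGNAL" for every "exited on signal" line.
# Assumes the FreeBSD message layout quoted in this report.
signal_exits() {
    sed -n 's/^pid \([0-9][0-9]*\) (\([^)]*\)), .*exited on signal \([0-9][0-9]*\).*$/\1 \2 \3/p'
}
```

Usage would be `dmesg | signal_exits`, which for the line above yields `21708 named 11`.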
Updated by Wayne Graves almost 4 years ago
unbound not running when this occurred on my pfsense 2.5.
Updated by Chris R almost 4 years ago
Wayne Graves wrote:
unbound not running when this occurred on my pfsense 2.5.
Yea, ignore my comment (I deleted it now), I thought this issue was something completely different, sorry for any confusion.
Updated by Tchello Mello almost 4 years ago
I was going to check how I can install strace on this box to see if I can debug it further.
I used `truss` instead; however, it did not give much information :(
[21.02-RELEASE][admin@gw]: truss -faed -s 25555 -o truss.out /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
[21.02-RELEASE][admin@gw]: dmesg
pid 23517 (named), jid 0, uid 0: exited on signal 11
[21.02-RELEASE][admin@gw]: less truss.out
...SNIP...
23517: 0.300333411 clock_gettime(10,{ 1613749329.601388714 }) = 0 (0x0)
23517: 0.300440708 clock_gettime(10,{ 1613749329.601388714 }) = 0 (0x0)
23517: 0.300542128 open(".",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,013000056) = 51 (0x33)
23517: 0.300613175 fstatfs(51,{ fstypename=ufs,mntonname=/,mntfromname=/dev/ufsid/5cdd38aef2899872,fsid=ae38dd5c729889f2 }) = 0 (0x0)
23517: 0.300707385 getdirentries(51,"\M-f=\^B\0\0\0\0\0\f\0\0\0\0\0\0\0 \0\^D\0\^A\0\0\0.\0\0\0\0\0\0\0\M-e=\^B\0\0\0\0\0\^X\0\0\0\0\0\0\0 \0\^D\0\^B\0\0\0..\0\0\0\0\0\0\M-g=\^B\0\0\0\0\0(\0\0\0\0\0\0\0 \0\^D\0\^D\0\0\0keys\0\0\0\0\M-o=\^B\0\0\0\0\0<\0\0\0\0\0\0\0(\0\b\0\n\0\0\0named.conf\0\0\0\0\0\0\M-p=\^B\0\0\0\0\0P\0\0\0\0\0\0\0(\0\b\0\t\0\0\0rndc.conf\0\0\0\0\0\0\0\M-q=\^B\0\0\0\0\0d\0\0\0\0\0\0\0(\0\b\0\n\0\0\0named.root\0\0\0\0\0\0:\M-5\^B\0\0\0\0\0t\0\0\0\0\0\0\0 \0\^D\0\^F\0\0\0master\0\0\M^T\M-5\^B\0\0\0\0\0\M^D\0\0\0\0\0\0\0 \0\^D\0\^E\0\0\0slave\0\0\0\M^Z<\^B\0\0\0\0\0\M^\\0\0\0\0\0\0\0 \0\b\0\a\0\0\0key.txt\0\M-8<\^B\0\0\0\0\0\M-@\0\0\0\0\0\0\08\0\b\0\^Z\0\0\0002f1a0f2385e3ddce.mkeys.jnl\0\0\0\0\0\0\M-.<\^B\0\0\0\0\0\M-d\0\0\0\0\0\0\08\0\b\0\^Z\0\0\0fc7dcd45fb133c39.mkeys.jnl\0\0\0\0\0\0\M-%<\^B\0\0\0\0\0\b\^A\0\0\0\0\0\08\0\b\0\^Z\0\0\0973d3f5b6083abd5.mkeys.jnl\0\0\0\0\0\0\240<\^B\0\0\0\0\0,\^A\0\0\0\0\0\08\0\b\0\^Z\0\0\0c8418e66bf12798a.mkeys.jnl\0\0\0\0\0\0\M^^<\^B\0\0\0\0\0L\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0c8418e66bf12798a.mkeys\0\0\M-3<\^B\0\0\0\0\0l\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0973d3f5b6083abd5.mkeys\0\0\M-J<\^B\0\0\0\0\0\M^L\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0002f1a0f2385e3ddce.mkeys\0\0\M^[<\^B\0\0\0\0\0\M-,\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0fc7dcd45fb133c39.mkeys\0\0\M^R<\^B\0\0\0\0\0\M-P\^A\0\0\0\0\0\08\0\b\0\^Z\0\0\0b25d320c2fae5cb4.mkeys.jnl\0\0\0\0\0\0\M^Y<\^B\0\0\0\0\0\0\^B\0\0\0\0\0\0000\0\b\0\^Q\0\0\0resync-tocalab.sh\0\0\0\0\0\0\0\M-"<\^B\0\0\0\0\0 \^B\0\0\0\0\0\0000\0\b\0\^V\0\0\0b25d320c2fae5cb4.mkeys\0\0\M^A<\^B\0\0\0\0\08\^B\0\0\0\0\0\0(\0\b\0\^N\0\0\0restart-dns.sh\0\0\M^S<\^B\0\0\0\0\0\0\^D\0\0\0\0\0\0(\0\b\0\^N\0\0\0tmp-23BoswIKwy\0\0",4096,{ 0x0 }) = 960 (0x3c0)
23517: 0.301106662 getdirentries(51,0x21ba5000,4096,{ 0x400 }) = 0 (0x0)
23517: 0.301192296 close(51) = 0 (0x0)
23517: 0.301307875 clock_gettime(10,{ 1613749329.602388690 }) = 0 (0x0)
23517: 0.301419416 mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 566620160 (0x21c5f000)
23517: 0.301525536 mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 566640640 (0x21c64000)
23517: 0.301639257 SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=0 addr=0x0
23517: 0.301698233 sigwait({ SIGHUP|SIGINT|SIGTERM },0xbfbfeac8) ERESTART
23517: 0.301794489 <thread 100100 exited>
23517: 0.301831918 <thread 100415 exited>
23517: 0.301865509 <thread 100416 exited>
23517: 0.301938539 <thread 100419 exited>
23517: 0.301969366 <thread 100418 exited>
23517: 0.301996696 <thread 100420 exited>
23517: 0.302022572 <thread 100421 exited>
23183: 0.302092149 read(5,0xbfbfeafc,1) = 0 (0x0)
23183: 0.304078697 exit(0x1)
23183: 0.304142025 process exit, rval = 1
23517: 0.304736373 process killed, signal = 11
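In a long truss log the interesting part is usually the last call before the signal. The sketch below pulls that out automatically; `last_call_before_signal` is a hypothetical helper, and it assumes the field layout seen in this excerpt (PID, relative timestamp, then the call or the word `SIGNAL`). Because truss prefixes lines with the process ID rather than the thread ID, the "previous" line may belong to a different thread of the same process.

```shell
# last_call_before_signal
# Hypothetical helper: read truss output on stdin and, for each
# SIGNAL line, print the previous line logged for the same PID
# followed by the SIGNAL line itself.
# Assumes fields: "PID: TIMESTAMP CALL-or-SIGNAL ...".
last_call_before_signal() {
    awk '
        $3 == "SIGNAL" {
            if ($1 in prev) print prev[$1]
            print $0
            next
        }
        { prev[$1] = $0 }
    '
}
```

Run against the file written above it would be `last_call_before_signal < truss.out`. In this trace the fault is reported as `SEGV_MAPERR` with `addr=0x0`, which is consistent with a null pointer dereference inside named rather than a failed system call.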
Updated by Viktor Gurov almost 4 years ago
same issue with a clean BIND install?
pfSense Plus 21.02 or pfSense 2.5?
what kind of appliance? VM, Netgate appliance or something else?
Updated by Tchello Mello almost 4 years ago
I'll remove all the files tonight and then try it again with clean files.
It's running on a Netgate SG-3100:
*** Welcome to Netgate pfSense Plus 21.02-RELEASE (arm) on gw ***
Updated by Wayne Graves almost 4 years ago
I'm using a Supermicro SuperServer E200-8D (Mini-1U, Xeon D-1528 1.9 GHz, 32 GB ECC RAM, 500 GB NVMe SSD). A clean BIND install does the same thing every time.
Updated by Wayne Graves almost 4 years ago
That's running pfSense 2.5, upgraded from 2.4.5-p1.
Updated by Viktor Gurov almost 4 years ago
- Priority changed from Normal to High
Updated by Stefan Andersson almost 4 years ago
I also have this issue after upgrading to pfSense 2.5. I've noticed that after a reboot the named process doesn't seem to register a listening port for some reason. The service can't be killed or restarted from the web interface either. But if I run "killall -9 named" and then restart the service from the web interface, it seems to run properly until the next reboot. I'll investigate more tomorrow =)
Updated by Viktor Gurov almost 4 years ago
related to named ACL
see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=980786
Updated by Viktor Gurov almost 4 years ago
manual installation of the latest BIND version fixes the issue:
# pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/lmdb-0.9.28,1.txz
# pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/bind916-9.16.12.txz
Updated by Tchello Mello almost 4 years ago
Hello team,
Any idea when this will be ported to armv7 arch (Netgate SG-3100)?
https://pkg.freebsd.org/FreeBSD:12:armv7/latest/
grep bind9 digests
dns/bind9-devel:2$0$jk8t5q3sjof9ioq9dbxetbmat34o7pmkpqmbdp6r13317rb6peay:30020775:0:4541:7f99b3a86c3eb0749e02c6b0971c7e2e298a67fe287a2ca2d3d3c89bc739b13b
dns/bind911:2$0$zb63kdpotnteofpg1ikeiure1obro5ih11m8zg4ma6s1a757g6wb:29982738:0:4186:bd9c9f7526557ce88c68ba053e75c947ee89b65d5c80756eb8aebd7c12a85f49
dns/bind912:2$0$epkkxjoqs7fpreb8qacner58rynunj71znucu9a7nofjncq518hy:29942833:0:4167:5820e4e2c143f0b85b6470df7835750feaebf4b4ba77694d9c19b9139eba3df4
dns/bind913:2$0$bjfykrbwusaqmqmomjb9es7uf7eawq5qu7yjjjx69qxeoihpcnay:29889413:0:4462:df824d894af961289627c593578b4c0f78df93923051828bfb7548163dbe4786
Thanks
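The `grep` above matches anywhere in the line, including the hash fields. A slightly tighter check looks only at the port origin, which in this excerpt is the first colon-separated field. The sketch below assumes that layout; `list_origins` is a hypothetical helper name.

```shell
# list_origins PATTERN
# Hypothetical helper: read repository index lines on stdin and
# print the origin (first colon-separated field) of every entry
# whose origin contains PATTERN.
# Assumes the "origin:..." layout seen in the excerpt above.
list_origins() {
    awk -F: -v pat="$1" 'index($1, pat) { print $1 }'
}
```

With the file from the comment above this would be `list_origins bind9 < digests`. In the listing quoted there, `dns/bind916` does not appear among the armv7 entries.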
Updated by Andreas Grommek almost 4 years ago
Hello everybody,
I became aware of this bug report after finding this forum thread via googling: https://forum.netgate.com/topic/160963/bind-upgrade-producing-errors-on-pfsense-2-5-upgrade/35
BIND was not starting up after the upgrade from pfSense 2.4.5 to 2.5.0 (community edition) and the menu entry for GUI configuration was gone. I tried to re-install the package through the GUI, but the process would always hang (for minutes) with repeated messages like so:
Executing custom_php_resync_config_command()...rndc: connect failed: 127.0.0.1#53953: timed out
Somebody in the forum thread suggested to start named manually with
/usr/local/sbin/named -c /etc/namedb/named.conf -u bind -t /cf/named/
and this worked: The installer finished with success, the GUI menu entry for BIND was back. However, BIND will not start on boot (although enabled in the GUI), nor will it (re-)start when I disable and subsequently enable it in the GUI.
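Until the package starts on its own again, the manual start can be wrapped so that it only runs when named is not already up. This is a minimal sketch under assumptions: `pid_alive` and `ensure_running` are hypothetical helpers, and the PID file location inside the chroot is a guess that has to be checked on the actual system.

```shell
# pid_alive PIDFILE
# Hypothetical helper: succeed if PIDFILE names a process that can
# be signalled. kill -0 also fails for processes owned by another
# user, so this is meant to be run as root.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# ensure_running PIDFILE CMD [ARGS...]
# Hypothetical helper: run CMD only if PIDFILE does not point at a
# live process.
ensure_running() {
    pidfile=$1
    shift
    if pid_alive "$pidfile"; then
        echo "already running"
    else
        "$@"
    fi
}
```

A call could look like `ensure_running /cf/named/var/run/named/pid /usr/local/sbin/named -c /etc/namedb/named.conf -u bind -t /cf/named/`, where the PID file path is an assumption and the named command is the one quoted above.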
I have no ACLs configured and the latest BIND version installed (I think, the one with the bugfix concerning ACLs):
[2.5.0-RELEASE][admin@gateway.example.com]/root: /usr/local/sbin/named -V
BIND 9.16.12 (Stable Release) <id:aeb943d>
running on FreeBSD amd64 12.2-STABLE FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
built by make with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr' '--with-readline=-L/usr/local/lib -ledit' '--with-dlz-filesystem=yes' '--enable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--without-libidn2' '--with-json-c' '--disable-largefile' '--without-lmdb' '--disable-native-pkcs11' '--without-python' '--disable-querytrace' '--enable-tcp-fastopen' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd12.2' 'build_alias=amd64-portbld-freebsd12.2' 'CC=cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf'
compiled by CLANG FreeBSD Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
compiled with OpenSSL version: OpenSSL 1.1.1i-freebsd 8 Dec 2020
linked to OpenSSL version: OpenSSL 1.1.1i-freebsd 8 Dec 2020
compiled with libuv version: 1.40.0
linked to libuv version: 1.40.0
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled
default paths:
named configuration: /usr/local/etc/namedb/named.conf
rndc configuration: /usr/local/etc/namedb/rndc.conf
DNSSEC root key: /usr/local/etc/namedb/bind.keys
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/pid
named lock file: /var/run/named/named.lock
So I guess, there is some pfSense specific bug somewhere. For now I can live with the workaround (starting BIND manually whenever I reboot pfSense), but I would rather have it working correctly. :-)
Please tell me if/how I can help to debug this. This is on a production system; unfortunately I have no way to try this out on a fresh install and add my zone files and other configs one by one until something breaks.
Andi
Updated by Stefan Andersson almost 4 years ago
Is this bug fixed in the new version of the BIND package released for pfSense?
Updated by Renato Botelho almost 4 years ago
- Status changed from New to Resolved
- Assignee set to Renato Botelho