Bug #11449

BIND fails during/after upgrade to 21.02/2.5.0

Added by Anthony Pants about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Category: BIND
Target version: -
Start date: 02/17/2021
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.5.0
Affected Architecture:

Description

After upgrading to 21.02, the named service wouldn’t start and the logs said it was segfaulting ("signal 11"). So I rebooted, which took a while to kick off, and then named was just gone from the Services list. It was still in my list of installed packages, so I figured I’d try reinstalling it, but after the `Executing custom_php_resync_config_command()` step, I got a bunch of `rndc: connect failed: 127.0.0.1#953: timed out` messages on the Package Reinstallation screen. I opened a different tab and turned on unbound, and either doing that or waiting long enough got the reinstallation process to complete.

Every time I try to start named from the GUI, it throws a signal 11. Starting named over SSH gives mixed results:
`named` works, but has trouble quitting
`named -u bind` works
`named -u bind -t /cf/named` works
`named -u bind -t /cf/named -c /etc/namedb/named.conf` fails with signal 11

`named-checkconf -t /cf/named /etc/namedb/named.conf` does not return any errors.

Permissions inside /cf/named seemed inconsistent; lots of files and folders owned by `root:wheel`, but `chown -R bind:wheel /cf/named` didn't resolve the signal 11 issue.

Changes in the GUI don't seem to be making it to `/cf/named/etc/namedb/named.conf`.

I've tried additional reboots and reinstallations, leaving

I'm on an SG-4860 running pfSense+, but someone else on the forums (https://forum.netgate.com/topic/160963/bind-upgrade-producing-errors-on-pfsense-2-5-upgrade/) says they're on 2.5.0 and are running into this error, or something similar. I didn't see their bug report here.

History

#1 Updated by Tchello Mello about 2 months ago

I'm also hitting the same problem on my SG-3100.

I'm seeing the same permissions problems. Here is what I get:

[21.02-RELEASE][admin@gw]/usr/local/sbin: /usr/local/sbin/named-checkconf -t /cf/named /etc/namedb/named.conf
[21.02-RELEASE][admin@gw]/usr/local/sbin: echo $?
0

[21.02-RELEASE][admin@gw]/usr/local/sbin: /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/ 
[21.02-RELEASE][admin@gw]/usr/local/sbin: echo $?
1

From the logs, we can see:

Feb 18 23:15:54 gw named[21708]: starting BIND 9.16.11 (Stable Release) <id:9ff601b>
Feb 18 23:15:54 gw named[21708]: running on FreeBSD arm 12.2-STABLE FreeBSD 12.2-STABLE 38a4c12973d(plus-devel-12) pfSense-SG-3100
Feb 18 23:15:54 gw named[21708]: built with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr' '--with-readline=-L/usr/local/lib -ledit' '--with-dlz-filesystem=yes' '--enable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--without-libidn2' '--with-json-c' '--disable-largefile' '--without-lmdb' '--disable-native-pkcs11' '--without-python' '--disable-querytrace' '--enable-tcp-fastopen' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=armv7-portbld-freebsd12.2' 'build_alias=armv7-portbld-freebsd12.2' 'CC=/nxb-bin/usr/bin/cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-isystem /usr/local/include' 'CPP=/nxb-bin/usr/bin/cpp' 'PKG_CONFIG=pkgconf'
Feb 18 23:15:54 gw named[21708]: running as: named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
Feb 18 23:15:54 gw named[21708]: compiled by CLANG FreeBSD Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
Feb 18 23:15:54 gw named[21708]: compiled with OpenSSL version: OpenSSL 1.1.1i-freebsd  8 Dec 2020
Feb 18 23:15:54 gw named[21708]: linked to OpenSSL version: OpenSSL 1.1.1i-freebsd  8 Dec 2020
Feb 18 23:15:54 gw named[21708]: compiled with libxml2 version: 2.9.10
Feb 18 23:15:54 gw named[21708]: linked to libxml2 version: 20910
Feb 18 23:15:54 gw named[21708]: compiled with json-c version: 0.15
Feb 18 23:15:54 gw named[21708]: linked to json-c version: 0.15
Feb 18 23:15:54 gw named[21708]: compiled with zlib version: 1.2.11
Feb 18 23:15:54 gw named[21708]: linked to zlib version: 1.2.11
Feb 18 23:15:54 gw named[21708]: ----------------------------------------------------
Feb 18 23:15:54 gw named[21708]: BIND 9 is maintained by Internet Systems Consortium,
Feb 18 23:15:54 gw named[21708]: Inc. (ISC), a non-profit 501(c)(3) public-benefit 
Feb 18 23:15:54 gw named[21708]: corporation.  Support and training for BIND 9 are 
Feb 18 23:15:54 gw named[21708]: available at https://www.isc.org/support
Feb 18 23:15:54 gw named[21708]: ----------------------------------------------------
Feb 18 23:15:54 gw named[21708]: found 2 CPUs, using 2 worker threads
Feb 18 23:15:54 gw named[21708]: using 2 UDP listeners per interface
Feb 18 23:15:54 gw named[21708]: using up to 21000 sockets
Feb 18 23:15:54 gw named[21708]: loading configuration from '/etc/namedb/named.conf'
Feb 18 23:15:54 gw named[21708]: unable to open '/usr/local/etc/namedb/bind.keys'; using built-in keys instead
Feb 18 23:15:54 gw named[21708]: using default UDP/IPv4 port range: [49152, 65535]
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lo0, 127.0.0.1#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0, 192.168.XXX.254#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0.222, 192.168.XXX.254#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0.33, 192.168.XXX.62#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface lagg0.69, 192.168.XXX.254#53
Feb 18 23:15:54 gw named[21708]: listening on IPv4 interface ovpns2, 172.16.XXX.1#53
Feb 18 23:15:54 gw named[21708]: generating session key for dynamic DNS
Feb 18 23:15:54 gw named[21708]: sizing zone task pool based on 31 zones

Then it produces:

[21.02-RELEASE][admin@gw]/usr/local/sbin: dmesg
pid 21708 (named), jid 0, uid 0: exited on signal 11
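
For anyone collecting the same evidence, here is a minimal sketch that pulls the affected PIDs out of `dmesg`-style text. The function name `named_segv_pids` is made up for illustration, and it assumes the kernel message has exactly the shape shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense or BIND): read dmesg-style text on
# stdin and print the PID of every named process reported as exiting on
# signal 11. Assumes the message format "pid N (named), ...: exited on signal 11".
named_segv_pids() {
    sed -n 's/^pid \([0-9][0-9]*\) (named),.* exited on signal 11$/\1/p'
}
```

On the firewall this could be run as `dmesg | named_segv_pids`, which makes it easier to match a crash against the PID in the named log lines.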

#2 Updated by Viktor Gurov about 2 months ago

This may be related to #7271.

#3 Updated by Wayne Graves about 2 months ago

Unbound was not running when this occurred on my pfSense 2.5.

#4 Updated by Chris R about 2 months ago

Wayne Graves wrote:

Unbound was not running when this occurred on my pfSense 2.5.

Yeah, ignore my comment (I've deleted it now). I thought this issue was something completely different; sorry for any confusion.

#5 Updated by Tchello Mello about 2 months ago

I'm going to check how I can install strace on this box to see if I can debug it further.

I used `truss` instead; however, it didn't give much information :(

[21.02-RELEASE][admin@gw]: truss -faed -s 25555 -o truss.out /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/

[21.02-RELEASE][admin@gw]: dmesg
pid 23517 (named), jid 0, uid 0: exited on signal 11

[21.02-RELEASE][admin@gw]: less truss.out
...SNIP...
23517: 0.300333411 clock_gettime(10,{ 1613749329.601388714 }) = 0 (0x0)
23517: 0.300440708 clock_gettime(10,{ 1613749329.601388714 }) = 0 (0x0)
23517: 0.300542128 open(".",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,013000056) = 51 (0x33)
23517: 0.300613175 fstatfs(51,{ fstypename=ufs,mntonname=/,mntfromname=/dev/ufsid/5cdd38aef2899872,fsid=ae38dd5c729889f2 }) = 0 (0x0)
23517: 0.300707385 getdirentries(51,"\M-f=\^B\0\0\0\0\0\f\0\0\0\0\0\0\0 \0\^D\0\^A\0\0\0.\0\0\0\0\0\0\0\M-e=\^B\0\0\0\0\0\^X\0\0\0\0\0\0\0 \0\^D\0\^B\0\0\0..\0\0\0\0\0\0\M-g=\^B\0\0\0\0\0(\0\0\0\0\0\0\0 \0\^D\0\^D\0\0\0keys\0\0\0\0\M-o=\^B\0\0\0\0\0<\0\0\0\0\0\0\0(\0\b\0\n\0\0\0named.conf\0\0\0\0\0\0\M-p=\^B\0\0\0\0\0P\0\0\0\0\0\0\0(\0\b\0\t\0\0\0rndc.conf\0\0\0\0\0\0\0\M-q=\^B\0\0\0\0\0d\0\0\0\0\0\0\0(\0\b\0\n\0\0\0named.root\0\0\0\0\0\0:\M-5\^B\0\0\0\0\0t\0\0\0\0\0\0\0 \0\^D\0\^F\0\0\0master\0\0\M^T\M-5\^B\0\0\0\0\0\M^D\0\0\0\0\0\0\0 \0\^D\0\^E\0\0\0slave\0\0\0\M^Z<\^B\0\0\0\0\0\M^\\0\0\0\0\0\0\0 \0\b\0\a\0\0\0key.txt\0\M-8<\^B\0\0\0\0\0\M-@\0\0\0\0\0\0\08\0\b\0\^Z\0\0\0002f1a0f2385e3ddce.mkeys.jnl\0\0\0\0\0\0\M-.<\^B\0\0\0\0\0\M-d\0\0\0\0\0\0\08\0\b\0\^Z\0\0\0fc7dcd45fb133c39.mkeys.jnl\0\0\0\0\0\0\M-%<\^B\0\0\0\0\0\b\^A\0\0\0\0\0\08\0\b\0\^Z\0\0\0973d3f5b6083abd5.mkeys.jnl\0\0\0\0\0\0\240<\^B\0\0\0\0\0,\^A\0\0\0\0\0\08\0\b\0\^Z\0\0\0c8418e66bf12798a.mkeys.jnl\0\0\0\0\0\0\M^^<\^B\0\0\0\0\0L\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0c8418e66bf12798a.mkeys\0\0\M-3<\^B\0\0\0\0\0l\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0973d3f5b6083abd5.mkeys\0\0\M-J<\^B\0\0\0\0\0\M^L\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0002f1a0f2385e3ddce.mkeys\0\0\M^[<\^B\0\0\0\0\0\M-,\^A\0\0\0\0\0\0000\0\b\0\^V\0\0\0fc7dcd45fb133c39.mkeys\0\0\M^R<\^B\0\0\0\0\0\M-P\^A\0\0\0\0\0\08\0\b\0\^Z\0\0\0b25d320c2fae5cb4.mkeys.jnl\0\0\0\0\0\0\M^Y<\^B\0\0\0\0\0\0\^B\0\0\0\0\0\0000\0\b\0\^Q\0\0\0resync-tocalab.sh\0\0\0\0\0\0\0\M-"<\^B\0\0\0\0\0 \^B\0\0\0\0\0\0000\0\b\0\^V\0\0\0b25d320c2fae5cb4.mkeys\0\0\M^A<\^B\0\0\0\0\08\^B\0\0\0\0\0\0(\0\b\0\^N\0\0\0restart-dns.sh\0\0\M^S<\^B\0\0\0\0\0\0\^D\0\0\0\0\0\0(\0\b\0\^N\0\0\0tmp-23BoswIKwy\0\0",4096,{ 0x0 }) = 960 (0x3c0)
23517: 0.301106662 getdirentries(51,0x21ba5000,4096,{ 0x400 }) = 0 (0x0)
23517: 0.301192296 close(51)                     = 0 (0x0)
23517: 0.301307875 clock_gettime(10,{ 1613749329.602388690 }) = 0 (0x0)
23517: 0.301419416 mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 566620160 (0x21c5f000)
23517: 0.301525536 mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 566640640 (0x21c64000)
23517: 0.301639257 SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=0 addr=0x0
23517: 0.301698233 sigwait({ SIGHUP|SIGINT|SIGTERM },0xbfbfeac8) ERESTART
23517: 0.301794489 <thread 100100 exited>
23517: 0.301831918 <thread 100415 exited>
23517: 0.301865509 <thread 100416 exited>
23517: 0.301938539 <thread 100419 exited>
23517: 0.301969366 <thread 100418 exited>
23517: 0.301996696 <thread 100420 exited>
23517: 0.302022572 <thread 100421 exited>
23183: 0.302092149 read(5,0xbfbfeafc,1)          = 0 (0x0)
23183: 0.304078697 exit(0x1)                    
23183: 0.304142025 process exit, rval = 1
23517: 0.304736373 process killed, signal = 11
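
The interesting line in a trace this long is the `SIGNAL` record. A minimal sketch for extracting it follows; `truss_signals` is a hypothetical name, and the pattern assumes the `SIGNAL n (NAME) code=... addr=...` layout seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read truss output on stdin and print
# "pid signal-name code fault-address" for each SIGNAL record.
# Assumes records shaped like:
#   23517: 0.301639257 SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=0 addr=0x0
truss_signals() {
    sed -n 's/^\([0-9][0-9]*\): [0-9.]* SIGNAL \([0-9][0-9]*\) (\([A-Z]*\)) code=\([A-Z_]*\) .*addr=\(0x[0-9a-f]*\).*$/\1 \3 \4 \5/p'
}
```

Run as `truss_signals < truss.out`. For the trace above, `SEGV_MAPERR` with `addr=0x0` means the fault was an access to unmapped address zero, which is consistent with a null pointer dereference inside named, though the trace alone does not show where.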

#6 Updated by Viktor Gurov about 2 months ago

Do you see the same issue with a clean BIND install?
Is this pfSense Plus 21.02 or pfSense 2.5?
What kind of appliance is it: a VM, a Netgate appliance, or something else?

#7 Updated by Tchello Mello about 2 months ago

I'll remove all the files tonight and then try it again with clean files.

It's running on a Netgate SG-3100:

*** Welcome to Netgate pfSense Plus 21.02-RELEASE (arm) on gw ***

#8 Updated by Wayne Graves about 2 months ago

I'm using a Supermicro SuperServer E200-8D (Mini-1U, Xeon D-1528 1.9 GHz, 32 GB ECC, 500 GB NVMe SSD). A clean BIND install does the same thing every time.

#9 Updated by Wayne Graves about 2 months ago

That's running pfSense 2.5, upgraded from 2.4.5-p1.

#10 Updated by Viktor Gurov about 2 months ago

  • Priority changed from Normal to High

#11 Updated by Stefan Andersson about 2 months ago

I also have this issue after upgrading to pfSense 2.5. I've noticed that after a reboot, the named process doesn't seem to register a listening port for some reason. The service can't be stopped or restarted from the web interface either. But if I run "killall -9 named" and then restart the service from the web interface, it seems to run properly until the next reboot. I'll investigate more tomorrow =)

#13 Updated by Viktor Gurov about 1 month ago

Manual installation of the latest BIND version fixes the issue:

# pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/lmdb-0.9.28,1.txz
# pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/bind916-9.16.12.txz

#14 Updated by Tchello Mello about 1 month ago

Hello team,

Any idea when this will be ported to armv7 arch (Netgate SG-3100)?

https://pkg.freebsd.org/FreeBSD:12:armv7/latest/

grep bind9 digests 
dns/bind9-devel:2$0$jk8t5q3sjof9ioq9dbxetbmat34o7pmkpqmbdp6r13317rb6peay:30020775:0:4541:7f99b3a86c3eb0749e02c6b0971c7e2e298a67fe287a2ca2d3d3c89bc739b13b
dns/bind911:2$0$zb63kdpotnteofpg1ikeiure1obro5ih11m8zg4ma6s1a757g6wb:29982738:0:4186:bd9c9f7526557ce88c68ba053e75c947ee89b65d5c80756eb8aebd7c12a85f49
dns/bind912:2$0$epkkxjoqs7fpreb8qacner58rynunj71znucu9a7nofjncq518hy:29942833:0:4167:5820e4e2c143f0b85b6470df7835750feaebf4b4ba77694d9c19b9139eba3df4
dns/bind913:2$0$bjfykrbwusaqmqmomjb9es7uf7eawq5qu7yjjjx69qxeoihpcnay:29889413:0:4462:df824d894af961289627c593578b4c0f78df93923051828bfb7548163dbe4786
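
The same check can be written as a small test, sketched below. `has_port` is a hypothetical name, and it assumes each line of the `digests` listing starts with the port origin followed by a colon, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the digests listing on stdin contains an
# entry whose origin is exactly the given port (for example dns/bind916).
# Assumes lines of the form "origin:rest-of-record".
has_port() {
    grep -q "^$1:"
}
```

For example, `has_port dns/bind916 < digests || echo "bind916 not built for this arch yet"`.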

Thanks

#15 Updated by Andreas Grommek about 1 month ago

Hello everybody,

I became aware of this bug report after finding this forum thread via googling: https://forum.netgate.com/topic/160963/bind-upgrade-producing-errors-on-pfsense-2-5-upgrade/35

BIND was not starting up after the upgrade from pfSense 2.4.5 to 2.5.0 (community edition), and the menu entry for GUI configuration was gone. I tried to reinstall the package through the GUI, but the process would always hang (for minutes) with repeated messages like this:

Executing custom_php_resync_config_command()...rndc: connect failed: 127.0.0.1#53953: timed out

Somebody in the forum thread suggested to start named manually with

/usr/local/sbin/named -c /etc/namedb/named.conf -u bind -t /cf/named/

and this worked: The installer finished with success, the GUI menu entry for BIND was back. However, BIND will not start on boot (although enabled in the GUI), nor will it (re-)start when I disable and subsequently enable it in the GUI.

I have no ACLs configured and the latest BIND version installed (I think, the one with the bugfix concerning ACLs):

[2.5.0-RELEASE][admin@gateway.example.com]/root: /usr/local/sbin/named -V
BIND 9.16.12 (Stable Release) <id:aeb943d>
running on FreeBSD amd64 12.2-STABLE FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
built by make with '--disable-linux-caps' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/namedb' '--with-dlopen=yes' '--with-libxml2' '--with-openssl=/usr' '--with-readline=-L/usr/local/lib -ledit' '--with-dlz-filesystem=yes' '--enable-dnstap' '--disable-fixed-rrset' '--disable-geoip' '--without-maxminddb' '--without-gssapi' '--without-libidn2' '--with-json-c' '--disable-largefile' '--without-lmdb' '--disable-native-pkcs11' '--without-python' '--disable-querytrace' '--enable-tcp-fastopen' '--disable-symtable' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd12.2' 'build_alias=amd64-portbld-freebsd12.2' 'CC=cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -ljson-c -fstack-protector-strong ' 'LIBS=-L/usr/local/lib' 'CPPFLAGS=-isystem /usr/local/include' 'CPP=cpp' 'PKG_CONFIG=pkgconf'
compiled by CLANG FreeBSD Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
compiled with OpenSSL version: OpenSSL 1.1.1i-freebsd  8 Dec 2020
linked to OpenSSL version: OpenSSL 1.1.1i-freebsd  8 Dec 2020
compiled with libuv version: 1.40.0
linked to libuv version: 1.40.0
compiled with libxml2 version: 2.9.10
linked to libxml2 version: 20910
compiled with json-c version: 0.15
linked to json-c version: 0.15
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
compiled with protobuf-c version: 1.3.2
linked to protobuf-c version: 1.3.2
threads support is enabled

default paths:
  named configuration:  /usr/local/etc/namedb/named.conf
  rndc configuration:   /usr/local/etc/namedb/rndc.conf
  DNSSEC root key:      /usr/local/etc/namedb/bind.keys
  nsupdate session key: /var/run/named/session.key
  named PID file:       /var/run/named/pid
  named lock file:      /var/run/named/named.lock

So I guess there is some pfSense-specific bug somewhere. For now I can live with the workaround (starting BIND manually whenever I reboot pfSense), but I would rather have it working correctly. :-)

Please tell me if/how I can help to debug this. This is on a production system; unfortunately, I have no way to try this on a fresh install and add my zone files and other configs one by one until something breaks.

Andi

#16 Updated by Stefan Andersson about 1 month ago

Is this bug fixed with the new version of the BIND package released for pfSense?

#17 Updated by Tchello Mello about 1 month ago

It did fix the issue for me.

#18 Updated by Renato Botelho about 1 month ago

  • Status changed from New to Resolved
  • Assignee set to Renato Botelho
