Bug #8449

FRR 4.0 zebra daemon crashes

Added by Jim Pingle about 1 year ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee:
Category: FRR
Target version:
Start date: 04/09/2018
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.4.4
Affected Architecture: All

Description

The zebra daemon in FRR 4.0 won't stay running with a BGP configuration. It crashes on startup. OSPF alone seems to be OK.

Crash backtrace from an SG-3100 that hits it:

: cat /var/tmp/quagga.zebra.crashlog
2017/12/21 14:13:18 ZEBRA: Assertion `node->lock > 0' failed in file table.c, line 201, function route_unlock_node
2017/12/21 14:13:18 ZEBRA: Cannot get backtrace, returned invalid # of frames -1 (valid range is between 1 and 20)
2017/12/21 14:13:18 ZEBRA: Current thread not known/applicable

Crash backtrace from an amd64 system:

2017/08/09 13:44:43 ZEBRA: Assertion `node->lock > 0' failed in file table.c, line 201, function route_unlock_node
2017/08/09 13:44:43 ZEBRA: Backtrace for 11 stack frames:
2017/08/09 13:44:43 ZEBRA: [bt 0] 0x8008cb958 <zlog_backtrace+0x28> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 1] 0x8008cbed0 <_zlog_assert_failed+0xa0> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 2] 0x8008bdd8e <route_unlock_node+0xfe> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 3] 0x8008bcc78 <if_terminate+0x48> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 4] 0x8008df063 <vrf_delete+0xa3> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 5] 0x8008df795 <vrf_terminate+0x35> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 6] 0x417f71 <zebra_zserv_socket_init+0x2731> at /usr/local/sbin/zebra
2017/08/09 13:44:43 ZEBRA: [bt 7] 0x8008d9007 <quagga_sigevent_process+0x47> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 8] 0x8008ba06f <thread_fetch+0x7af> at /usr/local/lib/libfrr.so.0
2017/08/09 13:44:43 ZEBRA: [bt 9] 0x4183f7 <main+0x3e7> at /usr/local/sbin/zebra
2017/08/09 13:44:43 ZEBRA: [bt 10] 0x4135cf <_start+0x17f> at /usr/local/sbin/zebra
2017/08/09 13:44:43 ZEBRA: Current thread not known/applicable

The system log message is the same either way: signal 6.

Apr  9 11:50:16 river kernel: pid 40583 (zebra), uid 168: exited on signal 6

A forum user reports seeing a signal 11; waiting to hear what hardware/setup they use:
https://forum.pfsense.org/index.php?topic=146410.0

cat /var/tmp/quagga.zebra.crashlog
ZEBRA: Received signal 11 at 1523178483 (si_addr 0x20); aborting...
Backtrace for 5 stack frames:
0x8008c9650 <zlog_backtrace_sigsafe+0x40> at /usr/local/lib/libfrr.so.0
0x8008c8e98 <zlog_signal+0x558> at /usr/local/lib/libfrr.so.0
0x8008dd344 <signal_init+0x244> at /usr/local/lib/libfrr.so.0
0x801596904 <pthread_sigmask+0x544> at /lib/libthr.so.3
0x801595e9f <pthread_getspecific+0xe2f> at /lib/libthr.so.3
no thread information available

It might be better to stay on FRR 3.0.x for the moment (maybe create a net/frr3 port for the GUI package to use, and keep net/frr at 4.0 so we can still work with it).

I didn't see any open issues on FRR's github that matched.

History

#1 Updated by Jim Pingle about 1 year ago

  • Subject changed from FRR 4.0 zebra daemon crashes with BGP to FRR 4.0 zebra daemon crashes

Looks like this isn't just specific to BGP. In the forum thread linked above, it is happening on multiple amd64 VMs that only use OSPF.

#2 Updated by Jim Pingle 12 months ago

Looks like others have noticed the problem as well:

https://lists.freebsd.org/pipermail/freebsd-ports/2018-June/113538.html
https://github.com/FRRouting/frr/issues/1907
https://github.com/FRRouting/frr/issues/2338

And the FreeBSD port maintainer moved net/frr to net/frr4 and created net/frr3 for a stable version that doesn't crash.
https://github.com/pfsense/FreeBSD-ports/commit/00b50894e2fad18970b0646da64136a2c5330460
https://github.com/pfsense/FreeBSD-ports/commit/90ff253bc1a1e013ad3ee5f3479e29b771270b17

I tried cherry-picking those from master to devel, but there were some conflicts, so I didn't want to clobber anything important. If we can get those merged in, then net/pfSense-pkg-frr can be changed to depend on net/frr3 and we can all be back in business.

#3 Updated by xavier Lemaire 11 months ago

Maybe the next release will work cleanly for us?
https://github.com/FRRouting/frr/releases/tag/frr-5.0

#4 Updated by Jim Pingle 11 months ago

xavier Lemaire wrote:

Maybe the next release will work cleanly for us?
https://github.com/FRRouting/frr/releases/tag/frr-5.0

Given that both FRR crash issues I linked above are still marked as open, I do not have any hope of it being fixed upstream yet in any release.

#5 Updated by Jim Pingle 10 months ago

  • Status changed from Confirmed to Feedback

The package has been moved to FRR 5.0.1 for testing; allegedly the crashes are fixed. Needs testing.

#6 Updated by Jim Pingle 10 months ago

  • Assignee changed from Renato Botelho to Jim Pingle

#7 Updated by xavier Lemaire 10 months ago

Let's go to my sandbox...

#8 Updated by xavier Lemaire 10 months ago

For the BGP part that interests me, IPv4 is OK in the lab. I'm letting it run, and I will test the IPv6 bugs I hit in previous versions (nexthop enforce, which doesn't work in 3.x).

#9 Updated by Jim Pingle 10 months ago

  • Status changed from Feedback to Resolved

This looks good with FRR 5.0.1. zebra is still running with no crashes, and I'm getting routes from both BGP and OSPF.