
Bug #15973

open

Kea DHCP server crashes on SG-3100 (32-bit ARM) every 10 days or so after the 24.11 upgrade

Added by Loh Phat 4 months ago. Updated 20 days ago.

Status: New
Priority: Low
Category: DHCP (IPv4)
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version:
Affected Architecture: SG-3100

Description

The DHCP server dies with the following log entry:

Jan 4 11:16:23     kernel         pid 90595 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)

Then, about 5 seconds later, kea2unbound unregisters all the DHCP clients from DNS:


Jan 4 11:16:28     kea2unbound     50780     Remove record: "[REDACTED].localdomain. 2400 IN A 192.168.107.100" 
Jan 4 11:16:28     kea2unbound     50780     Remove record: "100.107.168.192.in-addr.arpa. 2400 IN PTR [REDACTED].localdomain." 

Are there other logs I can examine, or can I raise the Kea logging level to get more detail?

I've switched back to ISC DHCP to rule out other causes. I'll switch back to Kea if more telemetry is needed for this bug.

The pfSense GUI shows "ISC DHCP has reached end-of-life and will be removed from a future version of Netgate pfSense Plus. Kea DHCP is the newer, modern DHCP distribution from ISC that includes the most-requested features." on the Advanced / Networking page, but there is no notice that Kea DHCP on 32-bit ARM platforms may be unstable, as mentioned in the following forum post: https://forum.netgate.com/topic/195842/after-upgrading-to-24-11-dhcp-fails-every-10-14-days

Actions #1

Updated by Björn Bylander 4 months ago

Loh Phat wrote:

[...]

I've got the same error on my 3100. It has happened twice since I upgraded on 2025-01-04, once with signal 6 and once with signal 11:

2025-01-06 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)
2025-01-12 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)

(edit 2025-01-17):

2025-01-17 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)

So far, the cadence seems to be about every 6 days.
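For reference, signal 6 is SIGABRT (the process called abort()) and signal 11 is SIGSEGV (an invalid memory access), so the reports above show two different failure modes. To keep track of how the crashes split between them over time, a small shell helper can tally the kernel's "exited on signal" messages. This is a minimal sketch that assumes the log lines are piped in on stdin in the format quoted in this issue; the function name kea_crash_signals is hypothetical and no particular pfSense log file path is assumed.

```shell
#!/bin/sh
# Hypothetical helper: count kea-dhcp4 crashes per signal number.
# Input: kernel log lines on stdin, in the "exited on signal N" format quoted in this issue.
# Output: one line per signal, "<count> <signal>", sorted by signal number.
kea_crash_signals() {
    # Keep only kea-dhcp4 exit messages and print just the signal number.
    sed -nE 's/.*\(kea-dhcp4\).*exited on signal ([0-9]+).*/\1/p' | sort -n | uniq -c
}

# Example using the three crashes reported above.
printf '%s\n' \
    '2025-01-06 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)' \
    '2025-01-12 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)' \
    '2025-01-17 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)' |
    kea_crash_signals
```

For these three lines it reports two crashes with signal 6 and one with signal 11; the exact column padding depends on the platform's uniq.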

Actions #2

Updated by Blaik Harvey 4 months ago

Björn Bylander wrote in #note-1:

[...]

I am experiencing the same issue on my 3100. I updated on 12-22 and hit the first failure on 1/2; it has happened twice since then. Most recent log:

Jan 15 08:31:27 kernel pid 73207 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)

Actions #3

Updated by Christian McDonald 4 months ago

  • Assignee set to Christian McDonald
  • Priority changed from Normal to High

Actions #4

Updated by Sander Peterse 4 months ago

Same issue here, same hardware. I have a core dump available that I can share with Netgate, but I don't want to post it publicly since it might contain sensitive data.

Below is a cron job I added as a workaround. It restarts the Kea DHCP service within a minute of a crash.

Minute: *
Hour: *
Day of the Month: *
Day of the Week: *
User: root
Command: pgrep -q kea-dhcp4 || /usr/local/sbin/pfSsh.php playback svc start kea-dhcp4
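The Cron fields above describe a job that runs every minute as root; in system-crontab form that is `* * * * * root <command>`. One drawback of the one-liner is that restarts leave no trace, which makes the crash cadence hard to track. The sketch below is one way to extend it, under stated assumptions: the function name ensure_running and the syslog tag kea-watchdog are hypothetical, and it uses `pgrep -x` (exact process-name match) with output discarded instead of `pgrep -q`. To use it from cron, the function and a call like the one in the last comment would go in a script file that the Command field runs.

```shell
#!/bin/sh
# Hypothetical watchdog helper generalizing the cron command above:
# if no process named exactly "$1" is running, log that and run the
# remaining arguments as the start command.
ensure_running() {
    proc="$1"
    shift
    # pgrep exits 0 when at least one process matches; its output is discarded.
    if pgrep -x "$proc" >/dev/null 2>&1; then
        return 0
    fi
    # Record the restart in syslog (tag is an assumption) so restarts can be counted later.
    logger -t kea-watchdog "$proc not running; starting it" 2>/dev/null || true
    "$@"
}

# Example call, using the same start command as the workaround above:
# ensure_running kea-dhcp4 /usr/local/sbin/pfSsh.php playback svc start kea-dhcp4
```

Logging each restart means the same "how often does it crash" question can be answered from syslog even while the watchdog keeps DHCP service available.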
Actions #5

Updated by Jon Q 20 days ago

Same here, on an SG-3100.
Nice share, Sander, on the cron job "for when it fails".

Also, in the Patches package I found the following, but I haven't tried it yet (I need a few days off, since things are slower around here for Easter). It seems to be related to this issue, though (https://redmine.pfsense.org/issues/15332).

commit 3bfd3a0efe5abf9ee47e6fdd1625fe5b8f9e21c3
Author: R. Christian McDonald <>
Date: Thu Dec 5 12:24:07 2024 -0500

kea: ignore default and max lease time within pool context. Fixes #15332

diff --git a/src/etc/inc/services.inc b/src/etc/inc/services.inc
index f7b7333469..ce79612034 100644
--- a/src/etc/inc/services.inc
+++ b/src/etc/inc/services.inc
@@ -1578,16 +1578,6 @@ function services_kea4_configure() {
 		];
 	}
 
-	// default-lease-time
-	if ($poolconf['defaultleasetime'] && ($poolconf['defaultleasetime'] != $dhcpifconf['defaultleasetime'])) {
-		$keapool['valid-lifetime'] = $poolconf['defaultleasetime'];
-	}
-
-	// max-lease-time
-	if ($poolconf['maxleasetime'] && ($poolconf['maxleasetime'] != $dhcpifconf['maxleasetime'])) {
-		$keapool['max-valid-lifetime'] = $poolconf['maxleasetime'];
-	}
-
 	// ignore-client-uids
 	if (isset($poolconf['ignoreclientuids'])) {
 		$keasubnet['match-client-id'] = false;

Switching to ISC for now...

If anyone has info about the patch, and if it works or not, please share!

Actions #6

Updated by Christian McDonald 20 days ago

That patch resolves a problem that would prevent Kea from starting outright, not one that would cause it to crash after a few days of uptime.

The issue impacting the 3100 here is a byproduct of dwindling upstream support for 32-bit ARM platforms, a problem that runs much deeper than Kea.

Actions #7

Updated by Christian McDonald 20 days ago

  • Priority changed from High to Low
  • Affected Architecture SG-3100 added