Bug #15973
Open
Kea DHCP server crashes on 3100 (32-bit ARM) every 10 days or so post 24.11 upgrade
0%
Description
The DHCP server dies with the following log entry:
Jan 4 11:16:23 kernel pid 90595 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)
And then 5 minutes or so later Kea DNS unregisters all the DHCP clients.
Jan 4 11:16:28 kea2unbound 50780 Remove record: "[REDACTED].localdomain. 2400 IN A 192.168.107.100"
Jan 4 11:16:28 kea2unbound 50780 Remove record: "100.107.168.192.in-addr.arpa. 2400 IN PTR [REDACTED].localdomain."
Are there other logs I can examine, or can I change the Kea logging level to get more detail?
I've switched back to ISC DHCP to make sure it's not something else. I'll switch back to Kea if more telemetry is needed for the bug.
The pfSense GUI shows "ISC DHCP has reached end-of-life and will be removed from a future version of Netgate pfSense Plus. Kea DHCP is the newer, modern DHCP distribution from ISC that includes the most-requested features." on the Advanced / Networking page, but there is no notice that Kea DHCP on 32-bit ARM platforms may be unstable, as mentioned in the following forum post: https://forum.netgate.com/topic/195842/after-upgrading-to-24-11-dhcp-fails-every-10-14-days
Updated by Björn Bylander 4 months ago
Loh Phat wrote:
The DHCP server dies with the following log entry:
[...]
I've got the same error on my 3100. It's happened twice since upgrading on 2025-01-04. Once with signal 6 and once with signal 11:
2025-01-06 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)
2025-01-12 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)
(edit 2025-01-17):
2025-01-17 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)
So far the cadence seems to be every 6 days or so...
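As a rough check on that cadence, the gaps between the three crash dates listed above can be computed directly. This is only a minimal sketch: the helper name `crash_gaps` is hypothetical, and the dates are copied from the log excerpts in this comment.

```python
from datetime import date


def crash_gaps(dates):
    """Return the number of days between consecutive crash dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Crash dates taken from the kea-dhcp4 log excerpts above.
crashes = [date(2025, 1, 6), date(2025, 1, 12), date(2025, 1, 17)]
gaps = crash_gaps(crashes)
print(gaps)                   # [6, 5]
print(sum(gaps) / len(gaps))  # 5.5
```

With only three data points this says little beyond "roughly every 5 to 6 days" on this particular unit.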
Updated by Blaik Harvey 4 months ago
Björn Bylander wrote in #note-1:
I've got the same error on my 3100. It's happened twice since upgrading on 2025-01-04. Once with signal 6 and once with signal 11:
2025-01-06 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)
2025-01-12 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)
I am experiencing the same issue on my 3100. I updated on 12-22 and hit the first failure on 1/2; it has happened twice since then. Most recent log:
Jan 15 08:31:27 kernel pid 73207 (kea-dhcp4), jid 0, uid 0: exited on signal 6 (core dumped)
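For anyone collecting these entries across several units, the kernel exit lines quoted in this thread can be parsed mechanically. The sketch below is an illustration only: the function name `parse_exit_line` and the regular expression are assumptions based on the log lines shown here, not on any pfSense tooling. Signal 6 is SIGABRT (usually a deliberate `abort()`) and signal 11 is SIGSEGV (an invalid memory access).

```python
import re

# Signal numbers 6 and 11 are SIGABRT and SIGSEGV on FreeBSD.
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV"}

# Pattern inferred from the kernel log lines quoted in this thread;
# the "pid N" prefix is optional because some excerpts omit it.
EXIT_RE = re.compile(
    r"(?:pid (?P<pid>\d+) )?\((?P<proc>[^)]+)\), jid \d+, uid \d+: "
    r"exited on signal (?P<sig>\d+)(?P<core> \(core dumped\))?"
)


def parse_exit_line(line):
    """Extract pid, process, and signal from a kernel exit log line, or None."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    sig = int(m.group("sig"))
    return {
        "pid": int(m.group("pid")) if m.group("pid") else None,
        "process": m.group("proc"),
        "signal": sig,
        "signal_name": SIGNAL_NAMES.get(sig, "unknown"),
        "core_dumped": m.group("core") is not None,
    }


line = ("Jan 15 08:31:27 kernel pid 73207 (kea-dhcp4), jid 0, uid 0: "
        "exited on signal 6 (core dumped)")
print(parse_exit_line(line))
```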
Updated by Christian McDonald 4 months ago
- Assignee set to Christian McDonald
- Priority changed from Normal to High
Updated by Sander Peterse 4 months ago
Same issue here, same hardware. I have a core dump available which I can share with Netgate. I don't want to share it here in public (it might contain sensitive data).
Below is a cron job I have added as a workaround. It restarts the Kea DHCP service less than 1 minute after a crash.
Minute: *
Hour: *
Day of the Month: *
Day of the Week: *
User: root
Command: pgrep -q kea-dhcp4 || /usr/local/sbin/pfSsh.php playback svc start kea-dhcp4
Updated by Jon Q 20 days ago
Same here, on SG-3100.
Nice share, Sander, about the crontab "for when it fails"
Also, in the patches pkg, I found this, but haven't tried it yet (I need a few days off since things are slower around here for Easter). It seems to be related to this, though (https://redmine.pfsense.org/issues/15332).
commit 3bfd3a0efe5abf9ee47e6fdd1625fe5b8f9e21c3
Author: R. Christian McDonald <cmcdonald@netgate.com>
Date: Thu Dec 5 12:24:07 2024 -0500
kea: ignore default and max lease time within pool context. Fixes #15332
diff --git a/src/etc/inc/services.inc b/src/etc/inc/services.inc
index f7b7333469..ce79612034 100644
--- a/src/etc/inc/services.inc
+++ b/src/etc/inc/services.inc
@@ -1578,16 +1578,6 @@ function services_kea4_configure() {
];
}
- // default-lease-time
- if ($poolconf['defaultleasetime'] && ($poolconf['defaultleasetime'] != $dhcpifconf['defaultleasetime'])) {
- $keapool['valid-lifetime'] = $poolconf['defaultleasetime'];
- }
- // max-lease-time
- if ($poolconf['maxleasetime'] && ($poolconf['maxleasetime'] != $dhcpifconf['maxleasetime'])) {
- $keapool['max-valid-lifetime'] = $poolconf['maxleasetime'];
- }
-
// ignore-client-uids
if (isset($poolconf['ignoreclientuids'])) {
$keasubnet['match-client-id'] = false;
Switching to ISC for now...
If anyone has info about the patch, and if it works or not, please share!
Updated by Christian McDonald 20 days ago
That patch resolves a problem that would prevent Kea from starting outright, not one that would cause it to crash after a few days of uptime.
The issue here impacting the 3100 is a byproduct of dwindling upstream support for 32-bit ARM platforms. That problem runs much deeper than Kea.
Updated by Christian McDonald 20 days ago
- Priority changed from High to Low
- Affected Architecture SG-3100 added