Bug #16019

open

Kea can unintentionally attempt to spawn multiple processes and fail

Added by Kris Phillips about 2 months ago. Updated 8 days ago.

Status: Feedback
Priority: Normal
Category: DHCP (IPv4)
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 25.03
Release Notes: Default
Affected Version:
Affected Architecture:

Description

When doing things like restarting the Kea service or switching between ISC and Kea, there is a possibility of Kea attempting to launch multiple copies of itself.

I haven't been able to recreate the exact situation where this happens, but we've had 4-5 different customers experience the same issue. Once the problem occurs, the following log entry appears repeatedly:

kea-dhcp4[58922]: WARN [kea-dhcp4.dhcpsrv.0xf488ce12000] DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: Failed to open socket on interface [Interface Here], reason: failed to bind fallback socket to address [Interface Address Here], port 67, reason: Address already in use - is another DHCP server running?

Once this happens, you have to manually kill the duplicate processes and then restart the Kea service for the problem to go away.
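One way to confirm the symptom from the logs is to count how often the warning above appears per interface. The following is a minimal Python sketch, not part of pfSense or Kea. The function name `count_bind_failures` is hypothetical, and the pattern assumes log lines shaped like the one quoted above. This report does not say where the log is stored, so the function accepts any iterable of lines.

```python
import re
from collections import Counter

# Matches the DHCPSRV_OPEN_SOCKET_FAIL warning quoted in this report.
# Assumption: the interface name is the token right after "on interface".
FAIL_RE = re.compile(
    r"DHCPSRV_OPEN_SOCKET_FAIL .*?on interface (?P<iface>\S+), "
    r".*Address already in use"
)


def count_bind_failures(lines):
    """Return a Counter mapping interface name -> number of bind failures."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts
```

A count that keeps growing for one interface while Kea is nominally running suggests a second process still holds port 67 on that address.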

Actions #1

Updated by Sevi A about 1 month ago

Thanks for the info! I just ran into the same issue and I think it's also mentioned in the forums here: https://forum.netgate.com/topic/195583/kea-dhcp-reservations-not-being-honored

I'm running 24.11 on a Netgate 4100 and noticed the issue when some devices wouldn't get a new IP through DHCP after I updated static mappings. I think the issue appeared after I switched from ISC to Kea (a while back, so I can't recall whether I restarted the device in the meantime, but most likely not) and recently added two new VLANs with corresponding interfaces and DHCP servers.

At least that's what the output of `sockstat -l | grep :67` suggests:

root     kea-dhcp4  15901 16  udp4   192.168.104.1:67      *:*
root     kea-dhcp4  15901 18  udp4   192.168.60.1:67       *:*
root     kea-dhcp4  33016 33  udp4   192.168.66.1:67       *:*
< removed redundant lines for 10 more networks with PID 33016 >

The networks with the server listening on 192.168.60.1/24 and 192.168.104.1/24 are the newly created interfaces and are served by a different process. The issue appeared with a host on one of the "old" networks (192.168.66.1/24): it got an IP from the DHCP pool, but its address never updated after I assigned a static mapping.
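The same check can be scripted. The sketch below groups `sockstat -l` lines by PID for sockets bound to port 67, so that more than one `kea-dhcp4` PID stands out. It is only an illustration: the function name `kea_listeners_by_pid` is hypothetical, and it assumes the column layout shown in the output above (user, command, PID, file descriptor, protocol, local address, foreign address).

```python
from collections import defaultdict


def kea_listeners_by_pid(sockstat_output, command="kea-dhcp4", port=67):
    """Map each PID of `command` to the local addresses it has bound on `port`."""
    by_pid = defaultdict(list)
    for line in sockstat_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and sockets owned by other commands.
        if len(fields) < 6 or fields[1] != command or not fields[2].isdigit():
            continue
        addr, _, local_port = fields[5].rpartition(":")
        if local_port == str(port):
            by_pid[int(fields[2])].append(addr)
    return dict(by_pid)


# The three lines quoted in this comment.
SAMPLE = """\
root     kea-dhcp4  15901 16  udp4   192.168.104.1:67      *:*
root     kea-dhcp4  15901 18  udp4   192.168.60.1:67       *:*
root     kea-dhcp4  33016 33  udp4   192.168.66.1:67       *:*
"""

listeners = kea_listeners_by_pid(SAMPLE)
if len(listeners) > 1:
    print("multiple kea-dhcp4 processes are bound to port 67:", sorted(listeners))
```

On a live system the input could come from `subprocess.run(["sockstat", "-l"], capture_output=True, text=True).stdout` instead of the sample string.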

Similar to the forum post above, stopping the DHCP server through the web GUI only stopped the new process 15901, but the old process 33016 kept running and had to be killed manually to solve the issue.

I would love to help, but I have no idea how to reproduce the situation or how to diagnose this further.

Actions #2

Updated by Christian McDonald about 1 month ago

  • Status changed from New to Feedback
  • Assignee set to Christian McDonald
  • Target version set to 2.8.0
  • Plus Target Version changed from 24.11 to 25.07

A mitigation has been added for this.

Actions #3

Updated by Christian McDonald about 1 month ago

  • Plus Target Version changed from 25.07 to 25.03

Actions #4

Updated by Jim Pingle 17 days ago

  • Subject changed from 'Kea-dhcp4 can sometimes get "stuck" with multiple processes running and cause binding issues' to 'Kea can unintentionally attempt to spawn multiple processes and fail'

Actions #5

Updated by Jason Montleon 8 days ago

I ran into this on a 6100. I recently assigned an interface to add a new network and enabled the DHCP server on that interface. Today I was trying to update a client's settings and they were not taking effect.

Looking in the log I saw the 'DHCPSRV_OPEN_SOCKET_FAIL' error message going back to around the time I configured the new interface, though at this point it is hard for me to say if it coincides exactly.

I stopped the service in the web UI and found there were still two processes running. After killing them and restarting the service it seems to be working correctly again. I wasn't smart enough to look at sockstat output before killing them, but if I had to guess one was listening on each interface similar to what Sevi A saw.

It seems like it could be a similar situation to his, "...recently added two new VLANs with corresponding interfaces and DHCP servers..."
