Bug #15328

open

Kea DHCP corrupts existing leases when a new DHCP pool is added

Added by Tom Lane about 2 months ago. Updated 11 days ago.

Status: New
Priority: Normal
Category: DHCP (IPv4)
Target version: -
% Done: 0%
Release Notes: Default

Description

I set up a couple of DHCP pools for VLANs on a new Netgate 4200 (running pfSense Plus 23.09.1), which is replacing an EdgeRouter-X that had been serving DHCP to the same clients. That went fine, and I watched several of the existing VLAN clients re-acquire their existing addresses from the new server. Then I added another DHCP pool attached directly to the PORT2LAN interface. That completely confused matters for existing leases: the server actively rejected attempts to renew those leases and handed out addresses of its own choosing. Now I am seeing two different entries on the DHCP Leases status page for the same MAC address, which surely should not happen.

Digging into the DHCP log, it looks like when the server was restarted because of the pool addition, all the lease reloads failed with complaints like

Mar 10 16:09:18 kea-dhcp4 39285 WARN [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_LEASE_SANITY_FAIL The lease 10.0.20.41 with subnet-id 2 failed subnet-id checks (the lease should have subnet-id 3).

10.0.20.41 is still shown (though as "down") in the Leases page, but there's also an entry for that client with its forcibly-assigned new IP address.

This isn't a fatal problem, assuming that the server manages to keep re-issuing these newly chosen addresses, but it's mildly annoying. I'm not sure if there will be any outright conflicts as the remaining clients try to renew their leases.
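The warning says that the subnet-id stored with the lease no longer matches the ID of the subnet that now contains the lease's address. A minimal model of that comparison follows. It is an illustration in Python, not Kea's implementation, and every prefix except the 10.0.20.x network from the log line is a placeholder (even its /24 mask is an assumption).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subnet:
    subnet_id: int
    network: ipaddress.IPv4Network


@dataclass
class Lease:
    address: ipaddress.IPv4Address
    subnet_id: int  # subnet-id recorded in the lease file when the lease was written


def check_lease(lease: Lease, subnets: list[Subnet]) -> Optional[str]:
    """Return None if the lease's stored subnet-id matches the configured subnet
    containing its address; otherwise return a warning modelled on the
    DHCPSRV_LEASE_SANITY_FAIL text quoted above."""
    for subnet in subnets:
        if lease.address in subnet.network:
            if subnet.subnet_id == lease.subnet_id:
                return None
            return (f"The lease {lease.address} with subnet-id {lease.subnet_id} "
                    f"failed subnet-id checks (the lease should have subnet-id {subnet.subnet_id}).")
    # Illustrative wording only; this is not a real Kea message.
    return f"no configured subnet contains {lease.address}"


# Hypothetical configuration after the new pool was added (placeholder prefixes).
subnets = [
    Subnet(1, ipaddress.ip_network("192.0.2.0/24")),     # new PORT2LAN pool
    Subnet(2, ipaddress.ip_network("198.51.100.0/24")),  # another VLAN
    Subnet(3, ipaddress.ip_network("10.0.20.0/24")),     # VLAN holding 10.0.20.41
]
stale = Lease(ipaddress.ip_address("10.0.20.41"), subnet_id=2)  # written before renumbering
print(check_lease(stale, subnets))
# The lease 10.0.20.41 with subnet-id 2 failed subnet-id checks (the lease should have subnet-id 3).
```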

#1

Updated by Tom Lane about 2 months ago

Tom Lane wrote:

> I'm not sure if there will be any outright conflicts as the remaining clients try to renew their leases.

It occurred to me that it might be best to flush the bogus leases using the "Clear all DHCP leases" button, so I did that. Looking in dhcpd.log, I found these entries in response:

Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.commands.0x401b3c12000] COMMAND_RECEIVED Received command 'lease4-wipe'
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4 removing all IPv4 leases from subnet 1
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 1 finished, removed 1 leases
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4 removing all IPv4 leases from subnet 2
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 2 finished, removed 4 leases
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4 removing all IPv4 leases from subnet 3
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 3 finished, removed 3 leases
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.lease-cmds-hooks.0x401b3c12000] LEASE_CMDS_WIPE4 lease4-wipe command successful (parameters: <no args>)
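Reading the per-subnet counts out of a longer log is easier with a throwaway parser. The sketch below is an illustration, not pfSense or Kea tooling: it pulls the subnet-id and removed-lease count from each DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED line, matching the message text exactly as it appears above (other Kea versions may word it differently). The sample log lines are shortened copies of the ones above.

```python
import re

# Matches the completion message quoted above; wording copied from this log.
FINISHED = re.compile(
    r"DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases "
    r"from subnet (\d+) finished, removed (\d+) leases"
)


def wiped_per_subnet(log_text: str) -> dict[int, int]:
    """Map each subnet-id to the number of leases lease4-wipe removed from it."""
    return {int(m.group(1)): int(m.group(2)) for m in FINISHED.finditer(log_text)}


log = """\
kea-dhcp4[39285]: INFO DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 1 finished, removed 1 leases
kea-dhcp4[39285]: INFO DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 2 finished, removed 4 leases
kea-dhcp4[39285]: INFO DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 3 finished, removed 3 leases
"""
print(wiped_per_subnet(log))  # {1: 1, 2: 4, 3: 3}
```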

Subnet 1 seems to be my PORT2LAN server, which had indeed given out one lease at that point. The other two are for my "guest" and "IoT" VLANs. There should never have been more than one lease in my "guest" VLAN's pool, so this looks like positive proof that there was confusion between the subnets. I venture that more effort needs to go into ensuring that subnet IDs remain stable across additions (and removals?) of DHCP servers on different interfaces.
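One mechanism that would produce exactly this symptom is numbering subnets by their position in the generated configuration, so that a newly added interface shifts the IDs of the subnets after it. That is an assumption about pfSense, not something confirmed in this thread. The interface names `lan`, `opt3` and `opt4` are inferred from the `pool_lan_0`/`pool_opt3_0`/`pool_opt4_0` expressions in the log; which VLAN sits on which is likewise assumed. `stable_ids` is a hypothetical alternative that remembers the IDs already issued.

```python
def positional_ids(interfaces: list[str]) -> dict[str, int]:
    """Suspected behaviour: subnets numbered 1..N in configuration order."""
    return {ifname: n for n, ifname in enumerate(interfaces, start=1)}


def stable_ids(interfaces: list[str], previous: dict[str, int]) -> dict[str, int]:
    """Hypothetical fix: keep every ID recorded in `previous` (which should also
    retain removed interfaces) and number new interfaces above the highest
    recorded ID, so no recorded ID is ever reassigned."""
    ids = {ifname: previous[ifname] for ifname in interfaces if ifname in previous}
    next_id = max(previous.values(), default=0) + 1
    for ifname in interfaces:
        if ifname not in ids:
            ids[ifname] = next_id
            next_id += 1
    return ids


before = positional_ids(["opt3", "opt4"])           # the two VLAN pools
after = positional_ids(["lan", "opt3", "opt4"])     # PORT2LAN pool sorts first
print(before["opt4"], after["opt4"])                # 2 3: the VLAN's ID shifted
print(stable_ids(["lan", "opt3", "opt4"], before))  # {'opt3': 1, 'opt4': 2, 'lan': 3}
```

Under positional numbering, a lease written with subnet-id 2 for `opt4` fails the check once `opt4` becomes subnet 3, which matches the 10.0.20.41 warning if that address is on `opt4`.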

Sadly, this did not stop Kea from doing whatever the heck it felt like. The first lease request after that was

Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt3_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt4_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.leases.0x401b3c16600] DHCP4_INIT_REBOOT [hwtype=1 12:3c:e3:11:ed:0b], cid=[01:12:3c:e3:11:ed:0b], tid=0xcf813db2: client is in INIT-REBOOT state and requests address 192.168.168.166
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt3_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt4_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.leases.0x401b3c16600] DHCP4_LEASE_ADVERT [hwtype=1 12:3c:e3:11:ed:0b], cid=[01:12:3c:e3:11:ed:0b], tid=0xcf813db3: lease 192.168.168.129 will be advertised
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt3_0 evaluated to 1
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt4_0 evaluated to 1
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO  [kea-dhcp4.leases.0x401b3c16600] DHCP4_LEASE_ALLOC [hwtype=1 12:3c:e3:11:ed:0b], cid=[01:12:3c:e3:11:ed:0b], tid=0xcf813db3: lease 192.168.168.129 has been allocated for 86400 seconds

which AFAICS is a flat-out violation of the DHCP RFC (RFC 2131): the server had no reason not to give that client the address it asked for. When the dust settled, only about half of my clients had the addresses they'd had under the old server. It seems Kea still has a few bugs, although I'm guessing the subnet ID assignment is more pfSense's fault.
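For reference, RFC 2131 section 4.3.1 gives a preferred order (a SHOULD, not a MUST) for choosing the address to offer: the address in the client's current binding; else its previous, now expired or released, address if that is in the pool and not already allocated; else the address from the 'Requested IP Address' option if it is valid and not already allocated; else a new address from the pool. The Python sketch below encodes only that order. Its inputs are simplified and hypothetical, the pool range is a placeholder, and it is not a model of Kea's allocator (it ignores reservations, client classes and INIT-REBOOT handling).

```python
from typing import Optional


def choose_offer_address(
    current_binding: Optional[str],
    previous_binding: Optional[str],
    requested: Optional[str],
    pool: list[str],
    allocated: set[str],
) -> Optional[str]:
    """Pick an address following the preference order of RFC 2131 section 4.3.1.

    `pool` is the ordered list of addresses the server may hand out and
    `allocated` holds addresses already bound to other clients."""
    available = [a for a in pool if a not in allocated]
    # 1. The client's current binding, if it has one.
    if current_binding is not None:
        return current_binding
    # 2. Its previous (expired or released) address, if still in the pool and free.
    if previous_binding in available:
        return previous_binding
    # 3. The requested address, if valid and free ("valid" simplified to "in this pool").
    if requested in available:
        return requested
    # 4. Otherwise any free address from the pool.
    return available[0] if available else None


pool = [f"192.168.168.{n}" for n in range(129, 200)]  # placeholder pool range
# After the lease wipe the server holds no binding for this client at all.
print(choose_offer_address(None, None, "192.168.168.166", pool, set()))  # 192.168.168.166
```

Under these assumptions the requested address wins whenever it lies inside the pool and is free; the sketch cannot say whether 192.168.168.166 actually met both conditions on this router.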

#2

Updated by Christian McDonald about 2 months ago

  • Assignee set to Christian McDonald

#3

Updated by Chris Lawrence 11 days ago

I can confirm this is happening to me as well. I added a new VLAN interface and a new DHCP range, and now about half of the lease requests end up failing with DHCPSRV_LEASE_SANITY_FAIL. Clearing the DHCP leases seems to stop the error, but the resulting leases are not properly registered as DNS A records. Additionally, statically assigned IP addresses aren't being recognized, and Kea is assigning timed leases to those statically assigned addresses. Half of my endpoints are not reachable by hostname because of this. This is a serious problem; I'm hoping it is getting looked at. Thank you, Tom Lane, for figuring this out.
