Bug #15328
closed
Changes in Kea DHCP interface pools may invalidate lease database content
100%
Description
I set up a couple of DHCP pools for VLANs on a new Netgate 4200 (running pfSense Plus 23.09.1), which is replacing an EdgeRouter-X that had been serving DHCP to the same clients. That went fine, and I watched several of the existing VLAN clients re-acquire their existing addresses from the new server. Then I added another DHCP pool attached directly to the PORT2LAN interface. That completely confused matters for existing leases: the server actively rejected attempts to renew those leases and gave out addresses of its own choosing. Now I am seeing two different entries in the DHCP Leases status page for the same MAC address, which surely should not happen. Digging in the DHCP log entries, it looks like when the server was restarted because of the pool addition, all the lease reloads failed with complaints like:
Mar 10 16:09:18 kea-dhcp4 39285 WARN [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_LEASE_SANITY_FAIL The lease 10.0.20.41 with subnet-id 2 failed subnet-id checks (the lease should have subnet-id 3).
10.0.20.41 is still shown (though as "down") in the Leases page, but there's also an entry for that client with its forcibly-assigned new IP address.
This isn't a fatal problem, assuming that the server manages to keep re-issuing these newly-chosen addresses, but it's mildly annoying. I'm not sure if there will be any outright conflicts as the remaining clients try to renew their leases.
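To check how many leases are affected, the warning quoted above can be tallied from the log. The following is a minimal sketch that assumes the message text matches the quoted line exactly; the function and variable names are hypothetical and not part of Kea or pfSense.

```python
import re

# Pattern built from the warning text quoted in this report (an assumption:
# other Kea versions may word the message differently).
SANITY_FAIL = re.compile(
    r"DHCPSRV_LEASE_SANITY_FAIL The lease (\S+) with subnet-id (\d+) "
    r"failed subnet-id checks \(the lease should have subnet-id (\d+)\)"
)


def sanity_failures(lines):
    """Return (address, stored_id, expected_id) for each sanity-check warning."""
    found = []
    for line in lines:
        match = SANITY_FAIL.search(line)
        if match:
            found.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return found


sample = (
    "Mar 10 16:09:18 kea-dhcp4 39285 WARN [kea-dhcp4.dhcpsrv.0x401b3c12000] "
    "DHCPSRV_LEASE_SANITY_FAIL The lease 10.0.20.41 with subnet-id 2 failed "
    "subnet-id checks (the lease should have subnet-id 3)."
)
failures = sanity_failures([sample])
print(failures)
```

If every affected lease shows the stored ID off by the same amount from the expected one, that points at renumbered subnets rather than damaged lease records.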
Updated by Tom Lane 9 months ago
Tom Lane wrote:
I'm not sure if there will be any outright conflicts as the remaining clients try to renew their leases.
It occurred to me that it might be best to flush the bogus leases using the "Clear all DHCP leases" button, so I did that. Looking in dhcpd.log finds these entries in response:
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.commands.0x401b3c12000] COMMAND_RECEIVED Received command 'lease4-wipe'
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4 removing all IPv4 leases from subnet 1
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 1 finished, removed 1 leases
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4 removing all IPv4 leases from subnet 2
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 2 finished, removed 4 leases
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4 removing all IPv4 leases from subnet 3
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c12000] DHCPSRV_MEMFILE_WIPE_LEASES4_FINISHED removing all IPv4 leases from subnet 3 finished, removed 3 leases
Mar 10 21:33:24 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.lease-cmds-hooks.0x401b3c12000] LEASE_CMDS_WIPE4 lease4-wipe command successful (parameters: <no args>)
Subnet 1 seems to be my PORT2LAN server, which had indeed given out one lease at that point. The other two are for my "guest" and "IoT" VLANs. There should never have been more than one lease in my "guest" VLAN's pool, so this seems like positive proof that there was confusion between the subnets. I venture that some more effort needs to be spent on ensuring that the subnet IDs remain stable across additions (and removals??) of DHCP servers for different interfaces.
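The suspected renumbering can be illustrated with a small sketch. It assumes subnet IDs are handed out by position in the list of DHCP-enabled interfaces, with the directly attached LAN sorted first; that ordering is a guess, and the interface names are illustrative.

```python
def number_by_position(interfaces):
    """Assign subnet IDs 1..N in list order (assumed auto-increment behavior)."""
    return {name: idx for idx, name in enumerate(interfaces, start=1)}


# Before: only the two VLAN interfaces serve DHCP.
before = number_by_position(["opt3", "opt4"])
# After: the LAN pool is added and (by assumption) sorts ahead of the VLANs.
after = number_by_position(["lan", "opt3", "opt4"])

# Interfaces whose ID changed even though their own settings did not.
shifted = {
    name: (before[name], after[name])
    for name in before
    if before[name] != after[name]
}
print(shifted)
```

Under that assumption both VLAN subnets move up by one, which would be consistent with a stored lease carrying subnet-id 2 when the running configuration expects subnet-id 3.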
Sadly, this did not stop Kea from doing whatever the heck it felt like. The first lease request after that was
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt3_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt4_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.leases.0x401b3c16600] DHCP4_INIT_REBOOT [hwtype=1 12:3c:e3:11:ed:0b], cid=[01:12:3c:e3:11:ed:0b], tid=0xcf813db2: client is in INIT-REBOOT state and requests address 192.168.168.166
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt3_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt4_0 evaluated to 1
Mar 10 21:49:57 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.leases.0x401b3c16600] DHCP4_LEASE_ADVERT [hwtype=1 12:3c:e3:11:ed:0b], cid=[01:12:3c:e3:11:ed:0b], tid=0xcf813db3: lease 192.168.168.129 will be advertised
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt3_0 evaluated to 1
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.dhcpsrv.0x401b3c16600] EVAL_RESULT Expression pool_opt4_0 evaluated to 1
Mar 10 21:49:58 router1 kea-dhcp4[39285]: INFO [kea-dhcp4.leases.0x401b3c16600] DHCP4_LEASE_ALLOC [hwtype=1 12:3c:e3:11:ed:0b], cid=[01:12:3c:e3:11:ed:0b], tid=0xcf813db3: lease 192.168.168.129 has been allocated for 86400 seconds
which AFAICS is a flat out violation of the DHCP RFC --- the server had no reason not to give that client the address it asked for. When the dust settled, only about half of my clients had the addresses they'd had under the old server. Seems like Kea still has a few bugs, although I'm guessing the subnet ID assignment is more pfSense's fault.
Updated by Chris Lawrence 8 months ago
I can confirm this is happening to me as well. I added a new VLAN interface and a new DHCP range, and now half of what is being requested ends up getting DHCPSRV_LEASE_SANITY_FAIL. Clearing DHCP leases seems to stop the error, but it does not properly register DNS A records for the affected hosts. Additionally, statically assigned IP addresses aren't being properly recognized, and Kea is assigning timed leases to the statically assigned IP addresses. Half of my endpoints are not reachable by hostname due to this issue. This is a serious problem, and I'm hoping it is getting looked at. Thank you, Tom Lane, for figuring this out.
Updated by Jim Pingle 7 months ago
- Subject changed from Kea DHCP corrupts existing leases when a new DHCP pool is added to Changes in Kea DHCP interface pools may invalidate lease database content
- Status changed from New to Confirmed
- Target version set to 2.8.0
- Plus Target Version set to 24.07
This appears to be a known issue in Kea; their documentation even warns about it:
https://kea.readthedocs.io/en/kea-2.4.1/arm/dhcp4-srv.html#ipv4-subnet-identifier
Their suggested workaround is to assign a number manually that is always associated with a particular subnet rather than auto-incrementing.
If the ID could be any string and not just an int, we could use the interface names in some manner, but it appears that the ID must be a number. This can make it tricky to ensure over time that things are unique in this way, since interfaces could not only be added/removed in Kea but also added/removed from the firewall as a whole. So maybe we follow a similar tactic to IPsec ikeid, where each interface/pool is assigned a particular ID the first time it is added, that ID is stored in the config for the interface/pool, and then we allocate a new unique ID the next time an interface or pool gets added.
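The allocation idea above can be sketched as follows. This is only an illustration of the approach, not the code that was eventually committed; the config keys `subnet_ids` and `next_subnet_id` and the function name are hypothetical.

```python
def ensure_subnet_ids(config, interfaces):
    """Give each interface a persistent subnet ID, never renumbering or reusing.

    `config` stands in for the saved firewall configuration (hypothetical keys).
    """
    ids = config.setdefault("subnet_ids", {})
    next_id = config.get("next_subnet_id", 1)
    for name in interfaces:
        if name not in ids:
            # First time this interface is seen: allocate and remember its ID.
            ids[name] = next_id
            next_id += 1
    config["next_subnet_id"] = next_id
    return {name: ids[name] for name in interfaces}


config = {}
first = ensure_subnet_ids(config, ["opt3", "opt4"])
second = ensure_subnet_ids(config, ["lan", "opt3", "opt4"])
print(first, second)
```

Because the counter only moves forward and stored IDs are never rewritten, adding `lan` later gives it a fresh ID while `opt3` and `opt4` keep theirs. This sketch keeps the stored ID of a removed interface so that re-adding it restores the same ID; whether that is desirable when an interface is deleted from the firewall entirely is a separate design question.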
Updated by Jim Pingle 6 months ago
- Plus Target Version changed from 24.07 to 24.08
Updated by Christian McDonald 5 months ago
- Status changed from Confirmed to In Progress
The fix for this will be included in the next significant update to Kea integration, which is still planned for 24.08.
Updated by Christian McDonald 5 months ago
- Status changed from In Progress to Feedback
Updated by Christian McDonald 5 months ago
I believe I've got this sorted out now.
Fix will be included in the next build.
Updated by Christian McDonald 5 months ago
- % Done changed from 0 to 100
Applied in changeset f774120b7dbf9811f574c056193d6b45246fa986.
Updated by Azamat Khakimyanov 5 months ago
- Status changed from Feedback to Resolved
Tested on 23.09.1 and on 24.08-DEVELOPMENT (built on Fri Jul 5 6:00:00 UTC 2024)
I was able to reproduce this issue on 23.09.1: every time I changed DHCP Server settings on the LAN interface, all hosts behind VLAN interfaces created on LAN got new IP addresses from their DHCP pools.
On 24.08-DEV, changes to the DHCP Server on LAN had no effect on DHCP leases on the VLANs. Every time I changed DHCP Server settings on LAN, all hosts behind the VLANs got the same IP addresses they had before the changes.
I marked this Bug as resolved.
Updated by Jim Pingle 2 months ago
- Has duplicate Bug #15735: Kea fails to give out leases after changing DHCP scope added
Updated by Jim Pingle about 2 months ago
- Plus Target Version changed from 24.08 to 24.11