Bug #10663
closed
dhcpd issues duplicate addresses in certain situations on 2.4.5-p1 in HA mode.
Description
As of at least 2.4.5-p1, dhcpd is issuing the same address to multiple hosts with different MAC addresses and hostnames. (It may also have occurred on the 2.4.5 release; we are not sure, because the jobs that trigger it may not have run while we were on that release.) Log excerpt:
Jun 14 03:53:30 dhcpd DHCPREQUEST for 10.100.6.121 (10.100.7.253) from 52:54:00:4d:7b:a5 (etcd-b08b51ba-c305-5eca-b2ce-fbc5703f9fa5) via lagg0.20
Jun 14 03:53:30 dhcpd DHCPACK on 10.100.6.121 to 52:54:00:4d:7b:a5 (etcd-e37583ea-10f9-56c8-83f6-318975991110) via lagg0.20
Jun 14 03:53:32 dhcpd DHCPDISCOVER from 52:54:00:ea:fc:f4 (etcd-e37583ea-10f9-56c8-83f6-318975991110) via lagg0.20: load balance to peer dhcp_opt1
Jun 14 03:53:32 dhcpd DHCPREQUEST for 10.100.6.121 (10.100.7.253) from 52:54:00:ea:fc:f4 (etcd-e37583ea-10f9-56c8-83f6-318975991110) via lagg0.20
Jun 14 03:53:32 dhcpd DHCPACK on 10.100.6.121 to 52:54:00:ea:fc:f4 (etcd-0ccfd615-c2e3-5fa0-b171-96a89fa024a5) via lagg0.20
Jun 14 03:53:36 dhcpd DHCPDISCOVER from 52:54:00:d8:66:ce (etcd-0ccfd615-c2e3-5fa0-b171-96a89fa024a5) via lagg0.20: load balance to peer dhcp_opt1
Jun 14 03:53:36 dhcpd DHCPREQUEST for 10.100.6.121 (10.100.7.253) from 52:54:00:d8:66:ce (etcd-0ccfd615-c2e3-5fa0-b171-96a89fa024a5) via lagg0.20
Jun 14 03:53:36 dhcpd DHCPACK on 10.100.6.121 to 52:54:00:d8:66:ce (etcd-fe7be7ee-d5ab-589d-8a72-537b5e0bf64f) via lagg0.20
Additional details are in the Reddit thread, and this is happening in multiple different locations. Other symptoms (the same address being issued to like-named hosts, even though they have different hostnames and MAC addresses) lead me to believe there is a parsing problem somewhere. Hosts whose names begin with cache-(some uuid) always get the same IP address no matter how many times you destroy and re-create that particular service endpoint, even though they have completely different MAC addresses and hostnames.
I turned off dhcpd and tested with a simple, separately hosted dnsmasq instance, and the problem went away. I don't think it's a config problem on our side, as we have been using the same build process for years without issues.
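For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that scans dhcpd log lines and reports every IP address that was ACKed to more than one MAC address. It assumes the log format shown in the excerpt above ("DHCPACK on <ip> to <mac> ..."); the function name `find_duplicate_acks` is made up for this example and is not part of dhcpd or pfSense. It does not take lease expiry into account, so a hit over a long log window may be a legitimate reassignment and only hits within a short time span point to this bug.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the excerpt in this report:
#   "... dhcpd DHCPACK on 10.100.6.121 to 52:54:00:4d:7b:a5 (name) via lagg0.20"
ACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_duplicate_acks(lines):
    """Return {ip: [mac, ...]} for every IP that was ACKed to more than one MAC.

    MACs are listed in the order they first appear in the log.
    """
    macs_by_ip = defaultdict(list)
    for line in lines:
        match = ACK_RE.search(line)
        if match is None:
            continue  # not a DHCPACK line
        ip = match.group("ip")
        mac = match.group("mac").lower()
        if mac not in macs_by_ip[ip]:
            macs_by_ip[ip].append(mac)
    # Keep only addresses that were handed to two or more distinct MACs.
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_dup_acks.py < dhcpd.log
    for ip, macs in find_duplicate_acks(sys.stdin).items():
        print(f"{ip} was ACKed to {len(macs)} MACs: {', '.join(macs)}")
```

Run against the three DHCPACK lines in the excerpt above, this reports 10.100.6.121 with all three MAC addresses.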