Bug #10663
closed
dhcpd issues duplicate addresses in certain situations on 2.4.5-p1 in HA mode.
Description
As of at least 2.4.5-p1, dhcpd is issuing the same address to multiple hosts with different MAC addresses and hostnames. This may also have occurred in the 2.4.5 release; we are not sure, because the jobs that trigger it may not have run while we were on that release. For example:
Jun 14 03:53:30 dhcpd DHCPREQUEST for 10.100.6.121 (10.100.7.253) from 52:54:00:4d:7b:a5 (etcd-b08b51ba-c305-5eca-b2ce-fbc5703f9fa5) via lagg0.20
Jun 14 03:53:30 dhcpd DHCPACK on 10.100.6.121 to 52:54:00:4d:7b:a5 (etcd-e37583ea-10f9-56c8-83f6-318975991110) via lagg0.20
Jun 14 03:53:32 dhcpd DHCPDISCOVER from 52:54:00:ea:fc:f4 (etcd-e37583ea-10f9-56c8-83f6-318975991110) via lagg0.20: load balance to peer dhcp_opt1
Jun 14 03:53:32 dhcpd DHCPREQUEST for 10.100.6.121 (10.100.7.253) from 52:54:00:ea:fc:f4 (etcd-e37583ea-10f9-56c8-83f6-318975991110) via lagg0.20
Jun 14 03:53:32 dhcpd DHCPACK on 10.100.6.121 to 52:54:00:ea:fc:f4 (etcd-0ccfd615-c2e3-5fa0-b171-96a89fa024a5) via lagg0.20
Jun 14 03:53:36 dhcpd DHCPDISCOVER from 52:54:00:d8:66:ce (etcd-0ccfd615-c2e3-5fa0-b171-96a89fa024a5) via lagg0.20: load balance to peer dhcp_opt1
Jun 14 03:53:36 dhcpd DHCPREQUEST for 10.100.6.121 (10.100.7.253) from 52:54:00:d8:66:ce (etcd-0ccfd615-c2e3-5fa0-b171-96a89fa024a5) via lagg0.20
Jun 14 03:53:36 dhcpd DHCPACK on 10.100.6.121 to 52:54:00:d8:66:ce (etcd-fe7be7ee-d5ab-589d-8a72-537b5e0bf64f) via lagg0.20
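To look for the same pattern in a longer log, something like the following Python sketch could be used. It assumes the DHCPACK line layout shown above; the function name `find_shared_addresses` is made up for this example, and the check ignores lease expiry, so a legitimately reassigned address would also be reported.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "... dhcpd DHCPACK on 10.100.6.121 to 52:54:00:4d:7b:a5 (name) via lagg0.20"
ACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d+\.\d+\.\d+\.\d+) "
    r"to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_shared_addresses(lines):
    """Return {ip: sorted MACs} for every IP acknowledged to more than one MAC."""
    macs_by_ip = defaultdict(set)
    for line in lines:
        match = ACK_RE.search(line)
        if match:
            macs_by_ip[match.group("ip")].add(match.group("mac").lower())
    # Keep only addresses that were handed to at least two distinct MACs.
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

Passing an open log file (or any iterable of lines) to `find_shared_addresses` returns a dict mapping each affected IP to the MAC addresses it was acknowledged to.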
Additional details are in the Reddit thread; this is happening in multiple different locations. Other symptoms (the same address being issued to like-named hosts, even though they have different hostnames and MAC addresses) lead me to believe there is a parsing problem somewhere. Hosts whose names begin with cache-(some uuid) always get the same IP address no matter how many times you destroy and re-create that particular service endpoint, even though they have completely different MAC addresses and hostnames.
I turned off dhcpd and tested with a simple separately-hosted dnsmasq instance and the problem went away. I don't think it's a config problem on our side, as we have been using the same build process for years without issues.
Updated by Chris Apsey over 5 years ago
After further investigation, here is what occurred:
1. We previously used a combination of ifupdown and NetworkManager to control the network configuration of all of our endpoints, and recently changed to systemd-networkd across the board for the sake of simplicity.
2. ifupdown and NetworkManager generate DHCP DUIDs randomly when they first request an address.
3. systemd-networkd uses /etc/machine-id to determine the DHCP DUID.
4. The CentOS project distributes its cloud images without sysprepping /etc/machine-id (images made with virt-builder have the same problem).
5. After switching to systemd-networkd, the fact that all hosts had the same /etc/machine-id immediately mattered, because every host derived the same DUID from it and sent that DUID along with its DHCPREQUEST.
6. After receiving a DHCPREQUEST from a DUID, dhcpd in pfSense would always return that lease to everyone with the same DUID, even if they had a different hostname and MAC address.
7. Dnsmasq used something other than the DUID as its primary key for tracking leases, so it worked 'correctly' with our setup.
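The effect of steps 5 to 7 can be illustrated with a toy model. This is only a sketch and is not how dhcpd or dnsmasq is implemented: `ToyLeaseTable` and `CLONED_ID` are hypothetical names, and keying the second table by MAC address is an assumption made for contrast, since the exact key dnsmasq uses was not investigated here.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass
class ToyLeaseTable:
    """Hands out addresses in order, remembering one lease per lookup key."""
    next_ip: IPv4Address
    leases: dict = field(default_factory=dict)

    def request(self, key: str) -> IPv4Address:
        # A key that is already known gets its existing lease back.
        if key not in self.leases:
            self.leases[key] = self.next_ip
            self.next_ip += 1
        return self.leases[key]


# Hypothetical placeholder for the DUID every clone derives from the
# same /etc/machine-id.
CLONED_ID = "duid-derived-from-cloned-machine-id"
clients = [
    ("52:54:00:4d:7b:a5", CLONED_ID),
    ("52:54:00:ea:fc:f4", CLONED_ID),
    ("52:54:00:d8:66:ce", CLONED_ID),
]

by_client_id = ToyLeaseTable(IPv4Address("10.100.6.121"))
by_mac = ToyLeaseTable(IPv4Address("10.100.6.121"))

# Keyed by the shared identifier: every host receives the same address.
ips_by_client_id = [by_client_id.request(cid) for _, cid in clients]
# Keyed by MAC (assumed alternative): every host receives its own address.
ips_by_mac = [by_mac.request(mac) for mac, _ in clients]
```

With the shared identifier as the key, all three clients end up on one address; with a per-host key, each gets a distinct one.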
So, this was not so much a bug in pfSense as a bad configuration on our side that had been running for years and that we only noticed after we standardized on systemd-networkd.
A good "feature enhancement" would be for pfSense dhcpd to alert the user in the log when it gets a DHCP request from a MAC address and hostname pair for a DUID that is already in the database with different information.
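A rough idea of what such a check could look like, as a Python sketch only. `DuidConflictDetector` and its `observe` method are hypothetical names, not anything that exists in dhcpd or pfSense, and a real implementation would also need to handle legitimate changes such as a renamed host or a replaced NIC.

```python
import logging

log = logging.getLogger("dhcp-duid-check")


class DuidConflictDetector:
    """Remembers the first (MAC, hostname) seen for each DUID."""

    def __init__(self):
        self._seen = {}  # duid -> (mac, hostname) first observed

    def observe(self, duid, mac, hostname):
        """Record a request; log a warning and return True on a conflict."""
        identity = (mac.lower(), hostname)
        # setdefault stores the identity on first sight, else returns the old one.
        known = self._seen.setdefault(duid, identity)
        if known != identity:
            log.warning(
                "DUID %s first seen from %s (%s), now requested by %s (%s)",
                duid, known[0], known[1], identity[0], identity[1],
            )
            return True
        return False
```

A repeated request from the same host is silent; a second host presenting the same DUID triggers the warning.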
Otherwise, this "bug" can be closed. Sorry for the mixup!