Bug #10416
closed
dhcrelay command line options not properly configured for some DHCP failover scenarios
100%
Description
Scenario: ISC DHCP failover, with one of the 2 servers in the failover association residing in a subnet that also serves clients. Currently, this subnet will be set by services.inc as an upstream subnet. This breaks DHCP failover, as the 2nd (out-of-network) server of the failover association will not be relayed DHCPDISCOVER packets from clients in said subnet, which is required for proper operation of ISC DHCP failover. If a subnet is both a client subnet and the subnet where a DHCP server resides, "-i" should be used instead of "-iu" or "-id". The attached patch fixes this issue, although a more PHP-savvy person should definitely look it over. (I'm a Perl guy, not PHP.)
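The flag selection described above can be sketched roughly as follows. This is only an illustration in Python, not the services.inc code or the attached patch; the function name `build_interface_args` and its inputs (`client_ifaces`, and `server_egress` mapping each DHCP server to the interface used to reach it) are hypothetical, and the interface and address values are made up.

```python
def build_interface_args(client_ifaces, server_egress):
    """Pick -i, -iu or -id for each interface handed to dhcrelay.

    client_ifaces: interface names that serve DHCP clients (hypothetical input).
    server_egress: dict of server IP -> interface used to reach that server
                   (hypothetical input, e.g. the result of a route lookup).
    """
    downstream = set(client_ifaces)
    upstream = set(server_egress.values())
    args = []
    for name in sorted(upstream | downstream):
        if name in upstream and name in downstream:
            flag = "-i"   # both roles: clients and a server live here
        elif name in upstream:
            flag = "-iu"  # upstream only: leads to a server
        else:
            flag = "-id"  # downstream only: serves clients
        args += [flag, name]
    return args


# One failover peer sits on the client subnet (lan), the other is reached via wan.
example = build_interface_args(
    ["lan"],
    {"192.0.2.10": "lan", "198.51.100.10": "wan"},
)
print(example)
```

With these inputs, lan is treated as both upstream and downstream and gets "-i", while wan stays "-iu".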
Please note, this type of configuration is fairly common in small site scenarios, with a larger hub site as a DHCP failover peer for multiple sites.
Updated by Jim Pingle over 4 years ago
Why would you need to relay to a server in the same subnet as the clients it serves and the firewall? They can get a reply from it directly.
Using -i could re-introduce problems like #9466 and some others which were also solved by that change.
Alternately, we could add separate boxes for interfaces selection to pick upstream and downstream interfaces, defaulting to the current automatic behavior.
Either way, changes must be submitted as pull requests on Github: https://docs.netgate.com/pfsense/en/latest/development/submitting-a-pull-request-via-github.html
Updated by John Steele over 4 years ago
Jim Pingle wrote:
Why would you need to relay to a server in the same subnet as the clients it serves and the firewall? They can get a reply from it directly.
Yes, from the in-subnet server you can, of course. However, the FO peer (in another subnet) will never see the DHCPDISCOVER packet. For ISC DHCP failover (see https://tools.ietf.org/html/draft-ietf-dhc-failover-12), both servers need to see the discover packet, then they both hash the client-ID to compute which one responds. The “losing” peer won’t respond. So, the current behavior, since it doesn’t relay in that scenario, breaks DHCP FO.
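To illustrate why both peers have to see the DHCPDISCOVER, here is a small Python sketch of the "both hash the client-ID, one responds" idea. It is not ISC's implementation: `stand_in_hash` is a simple substitute for the real load-balancing hash, and the split value of 128 and the choice of which peer owns the lower half are assumptions made for the example.

```python
def stand_in_hash(client_id: bytes) -> int:
    """Map a client ID to 0..255. A stand-in, NOT the hash ISC DHCP uses."""
    h = 0
    for b in client_id:
        h = (h * 31 + b) % 256
    return h


def responder(client_id: bytes, split: int = 128) -> str:
    """Return which peer is responsible for this client (assumed split rule)."""
    return "primary" if stand_in_hash(client_id) < split else "secondary"


def should_respond(role: str, client_id: bytes, split: int = 128) -> bool:
    """Each peer runs the same computation and answers only if it 'wins'."""
    return responder(client_id, split) == role


# Both peers reach the same verdict independently, so exactly one answers.
for cid in (b"\x01", b"\xff"):
    print(cid, responder(cid))
```

If the responsible peer for a given client is the off-network one and the DISCOVER is never relayed to it, the in-subnet peer stays silent as the "losing" side and the client gets no offer at all.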
Using -i could re-introduce problems like #9466 and some others which were also solved by that change.
Agreed, which is why the change wasn’t just reinstating the “-i” behavior across the board, but only for interfaces that had overlapping behavior, hopefully to help mitigate.
Alternately, we could add separate boxes for interfaces selection to pick upstream and downstream interfaces, defaulting to the current automatic behavior.
That still wouldn’t fix the behavior, unfortunately, as a “global” setup. The really “right” answer, which is beyond my coding ability, is the option for separate instances of dhcrelay for each interface, such that, for this situation, you would only specify the one off-network server for relay, but specify different server lists for other networks. This would actually be a feature enhancement, of course, but a welcome one. That could also be problematic, though, as you’d most likely need multiple dhcrelay instances binding to the same interface/port.
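As a rough picture of the per-interface idea, the Python sketch below builds one dhcrelay argument list per client interface, each with its own server list. The function name `per_interface_commands` and both inputs are hypothetical, every listed server is assumed to be off-network for its interface, and the sketch only builds argument lists; it does nothing about the interface/port binding concern mentioned above.

```python
def per_interface_commands(per_iface_servers, server_egress):
    """Build one dhcrelay argument list per client (downstream) interface.

    per_iface_servers: dict of downstream interface -> list of server IPs
                       to relay to from that interface (hypothetical input).
    server_egress:     dict of server IP -> interface used to reach it
                       (hypothetical input).
    """
    commands = []
    for iface in sorted(per_iface_servers):
        servers = list(per_iface_servers[iface])
        upstream = sorted({server_egress[s] for s in servers})
        if iface in upstream:
            # Assumption of this sketch: only off-network servers are listed.
            raise ValueError(f"server on {iface} listed for its own subnet")
        args = ["dhcrelay", "-id", iface]
        for up in upstream:
            args += ["-iu", up]
        commands.append(args + servers)
    return commands


cmds = per_interface_commands(
    {"lan": ["198.51.100.10"], "opt1": ["198.51.100.10", "203.0.113.5"]},
    {"198.51.100.10": "wan", "203.0.113.5": "wan"},
)
for cmd in cmds:
    print(" ".join(cmd))
```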
Either way, changes must be submitted as pull requests on Github: https://docs.netgate.com/pfsense/en/latest/development/submitting-a-pull-request-via-github.html
The patch was just me trying to be overly helpful, as I had to get this to work anyway. Apologies for breaking protocol.
Updated by Jim Pingle over 4 years ago
John Steele wrote:
Jim Pingle wrote:
Why would you need to relay to a server in the same subnet as the clients it serves and the firewall? They can get a reply from it directly.
Yes, from the in-subnet server you can, of course. However, the FO peer (in another subnet) will never see the DHCPDISCOVER packet. For ISC DHCP failover (see https://tools.ietf.org/html/draft-ietf-dhc-failover-12), both servers need to see the discover packet, then they both hash the client-ID to compute which one responds. The “losing” peer won’t respond. So, the current behavior, since it doesn’t relay in that scenario, breaks DHCP FO.
I thought of that afterward, but since I don't use DHCP Relay much, I wasn't sure if that was really viable or allowed. It seems like a strange design choice, but if that's how it works, so be it.
Using -i could re-introduce problems like #9466 and some others which were also solved by that change.
Agreed, which is why the change wasn’t just reinstating the “-i” behavior across the board, but only for interfaces that had overlapping behavior, hopefully to help mitigate.
I get that, but there are multiple issues we've had in the past reported with -i on its own, so I think no matter how we try to guess automatically, it's going to be problematic for the user.
Alternately, we could add separate boxes for interfaces selection to pick upstream and downstream interfaces, defaulting to the current automatic behavior.
That still wouldn’t fix the behavior, unfortunately, as a “global” setup.
Having separate choices for upstream and downstream would give the user manual control over which one is used for each role, and if an interface is chosen for both roles, then -i could be used instead of the more specific option. It lets the user determine what happens rather than making the backend code guess, which is what it does now (and what your patch also does, but with -i).
The really “right” answer, which is beyond my coding ability, is the option for separate instances of dhcrelay for each interface, such that, for this situation, you would only specify the one off-network server for relay, but specify different server lists for other networks. This would actually be a feature enhancement, of course, but a welcome one. That could also be problematic, though, as you’d most likely need multiple dhcrelay instances binding to the same interface/port.
IMO, the best possible answer is to run IP Helper/DHCP Relay on the L2 -- switches, AP, etc, not the firewall/router. But not everyone has that choice.
Updated by John Steele over 4 years ago
Jim Pingle wrote:
Having separate choices for upstream and downstream would give the user manual control over which one is used for each role, and if an interface is chosen for both roles, then -i could be used instead of the more specific option. It lets the user determine what happens rather than making the backend code guess, which is what it does now (and what your patch also does, but with -i).
That would actually be pretty cool. Like you said, my patch does that, just keeping in line w/ the "guessing" concept, computing the "right" option. Whether it's manual (as you defined), or automatic (ref my patch), it's the behavior that's important. No ego on the patch (again, was just my fix to get my network back up), just need the -i behavior potential.
IMO, the best possible answer is to run IP Helper/DHCP Relay on the L2 -- switches, AP, etc, not the firewall/router. But not everyone has that choice.
I completely agree, when it's doable, but like you said, it isn't always, especially on smaller sites, which is what burned me, actually.
Updated by Jim Pingle over 4 years ago
- Status changed from New to In Progress
- Assignee set to Jim Pingle
- Target version set to 2.4.5-p1
I couldn't get the patch to work as-is, the downstream list always ended up empty, but I found a variation which appears to do the right thing automatically.
Updated by Jim Pingle over 4 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 100
Applied in changeset a76e61149b79fe2892f6083454a563b860a035ab.
Updated by Jim Pingle over 4 years ago
- Status changed from Feedback to Resolved
dhcrelay is running with the expected options now, using -i when an interface is detected as both upstream and downstream.