Bug #16806
DHCP client sends packets to the wrong interface with Multi-WAN
Description
I have an ix0 parent interface with 3 VLANs, one for each ISP: 10 (WAN), 20 (WAN2) and 30 (WAN3).
All ISPs use DHCP to assign IP addresses.
The issue is that pfSense often sends DHCP requests through the wrong interface.
For example, it sends packets from the WAN2 IP address to WAN2's gateway, but out of the WAN interface (confirmed by packet captures both on ix0.10 and on ix0 with VLAN 10).
Naturally, this causes DHCP issues on top of other unresolved DHCP client bugs I have been struggling with over the years.
Updated by Nazar Mokrynskyi about 2 months ago
I narrowed it down: broadcast works correctly, while unicast does not.
This is caused by dhclient not installing a route for the `dhcp-server-identifier` returned by the DHCP server when the DHCP server and the gateway are on different subnets.
As a result, unicast packets go through the default gateway instead of the correct interface.
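The routing behavior described above can be illustrated with a small longest-prefix-match sketch. This is only a simulation, not dhclient or kernel code: the addresses come from documentation ranges and the mapping of the default route to ix0.10 is an assumption made for illustration. It shows that a DHCP server outside the WAN2 subnet is reached via the default route unless a host route for it exists.

```python
import ipaddress


def lookup(routes, dst):
    """Return the interface selected by longest-prefix match, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


# Hypothetical routing table; none of these addresses are from the report.
routes = [
    ("0.0.0.0/0", "ix0.10"),        # default route, assumed to point out WAN
    ("192.0.2.0/24", "ix0.10"),     # WAN connected subnet
    ("198.51.100.0/24", "ix0.20"),  # WAN2 connected subnet
]

# Assumed dhcp-server-identifier for WAN2, outside WAN2's own subnet.
dhcp_server_wan2 = "203.0.113.5"

# Without a host route, only the default route matches.
without_host_route = lookup(routes, dhcp_server_wan2)

# With an explicit /32 route bound to WAN2, the more specific route wins.
with_host_route = lookup(
    routes + [(dhcp_server_wan2 + "/32", "ix0.20")], dhcp_server_wan2
)

print(without_host_route, with_host_route)
```

In this toy table the unicast destination resolves to ix0.10 without the host route and to ix0.20 once the /32 entry is present, which matches the symptom seen in the packet captures.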
I bet this is also fixed upstream like https://redmine.pfsense.org/issues/14604, but the version included in pfSense/FreeBSD is ancient and not spec-compliant.
Updated by Nazar Mokrynskyi about 2 months ago
Updated by Marcos M 29 days ago
- Status changed from New to Incomplete
We haven't seen this issue in our testing yet. Additional troubleshooting is warranted to better understand what exactly is happening. If possible please detail steps to reproduce the issue between a pfSense server and pfSense client so we can look into it.
Updated by Nazar Mokrynskyi 29 days ago
I already explained what the issue is in the previous comment and even provided a pull request that fixes the issue.
What do you mean by "Status: Incomplete"???
All 3 ISPs I have exhibit this exact behavior: their gateways and DHCP servers are different machines on different subnets.
I imagine this is a very common thing in general.
Updated by Jim Pingle 29 days ago
We cannot reproduce this in lab or production conditions on any system. We cannot accept a change that could be potentially harmful to other configurations when we can't even reproduce the original problem.
When we packet capture DHCP requests and replies on Multi-WAN systems with more than one DHCP WAN, the expected addresses are used in all cases.
We need to know exactly how to reproduce this with a minimal configuration in lab conditions.
Updated by Nazar Mokrynskyi 29 days ago
Jim Pingle wrote in #note-5:
We cannot reproduce this in lab or production conditions on any system. We cannot accept a change that could be potentially harmful to other configurations when we can't even reproduce the original problem.
When we packet capture DHCP requests and replies on Multi-WAN systems with more than one DHCP WAN, the expected addresses are used in all cases.
We need to know exactly how to reproduce this with a minimal configuration in lab conditions.
Are your DHCP server and gateway on different subnets?
Please check the PR; the changes there are fairly straightforward: it creates a new route for the (previously ignored) `dhcp-server-identifier`.
It is obvious that without this explicit route the packet will not go to the correct interface (it will always go to the default gateway).
Updated by Flole Systems 27 days ago
This fix is incomplete. The underlying issue is significantly bigger, as I illustrated at https://redmine.pfsense.org/issues/13502. While my approach back then was viable, the correct solution would be to put the dhclient process into a FIB and just accept the routes over DHCP for that FIB; then everything (including your case) would magically work.
Updated by Nazar Mokrynskyi 27 days ago
In the context of https://redmine.pfsense.org/issues/13502 the fix is indeed incomplete, but it still covers a large number of real-world situations and is certainly an improvement over the status quo.
It is very frustrating that the issue is simply closed with "Incomplete" despite describing a clear bug and even having a fix available and tested.
Your case seems fairly exotic, and my knowledge of this stuff is certainly insufficient to address it.
Updated by Flole Systems 24 days ago
I still think that the proper solution would be to use a FIB for each dhclient to make them truly independent from each other and respect the routing information provided by the DHCP server. The change will be larger than yours: it will include determining the number of FIBs required and even dynamically changing it when an interface is added or removed (that's the "hard part", to be honest). Other than that, you simply call "setfib X dhclient ..." with X being the FIB for this interface (each one needs its own in this scenario) and let dhclient assign the default route, as that only affects the specific FIB.
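A rough sketch of the per-interface FIB idea follows. It only builds the command lines and does not run them. The numbering scheme (FIB 0 left for the main table, one FIB per DHCP WAN starting at 1) and both helper names are assumptions for illustration, and the "hard part" of making the kernel provide enough FIBs is not addressed.

```python
def assign_fibs(interfaces, first_fib=1):
    """Map each DHCP WAN interface to its own FIB number.

    Hypothetical scheme: FIB 0 stays the main table, WANs start at first_fib.
    """
    return {iface: first_fib + i for i, iface in enumerate(interfaces)}


def dhclient_command(iface, fib):
    """Build the argv for running dhclient inside the given FIB via setfib."""
    return ["setfib", str(fib), "dhclient", iface]


wans = ["ix0.10", "ix0.20", "ix0.30"]
fibs = assign_fibs(wans)
commands = [dhclient_command(iface, fib) for iface, fib in fibs.items()]

for argv in commands:
    print(" ".join(argv))
```

Because each dhclient would then see only its own routing table, a default route installed from one lease could not redirect traffic belonging to another WAN.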
Updated by Nazar Mokrynskyi 24 days ago
Not directly related to this issue, but it looks like FIBs would also allow reusing the same trusted monitoring IP across multiple WAN interfaces. Default gateways apparently give ICMP packets lower priority, which sometimes (incorrectly) results in reported packet loss when the interface is perfectly healthy (https://forum.netgate.com/post/1241904), as confirmed by my ISP.
My worry is that these larger changes will take forever to actually implement in pfSense, judging by the age of https://redmine.pfsense.org/issues/13502 and even the initial response to this issue.
Updated by Flole Systems 17 days ago
Yes, FIBs would allow that.
My original bug was marked as "needs to be fixed upstream", so it will never be fixed unless it's fixed upstream (which will probably be never, as my idea is a hack instead of a proper fix). So it is not a good example of something that "takes forever" in pfSense, as the delay wasn't in pfSense but upstream. But you can of course find cases where a fix has been implemented but hasn't been merged for 2 years, like https://github.com/pfsense/pfsense/pull/4674.
The beauty of having an open source system, however, is that you don't need to wait for someone to implement or merge your changes; you can develop and enjoy them yourself immediately (like I'm doing for the above-mentioned PR) :)
Updated by Nazar Mokrynskyi 17 days ago
In cases like this, true.
But since the build system is proprietary, I am still waiting for https://github.com/pfsense/FreeBSD-src/pull/57 to land in a stable release 10 months later.
But that is a digression. I'm hoping maintainers can reopen this obvious issue and review the PR one day.