Regression #11545
closed
Primary interface address is not always used when VIPs are present
Added by Kris Phillips over 3 years ago. Updated over 1 year ago.
100%
Description
If you have IP Aliases on a WAN interface that a Site to Site IPSec tunnel is riding over and upgrade from 2.4.5p1 to pfSense Plus, you have to go into the WAN interface and hit "Save" and "Apply Configuration" then restart the IPsec service to bring tunnels up post-upgrade. Otherwise IPSec will never connect no matter how many times you cycle the service.
Step by Step:
1. Create IPSec on WAN interface with several IP Aliases
2. Upgrade to 21.02/21.02p1
3. IPSec is broken, so you go into the WAN interface, hit save with no changes, and Apply Changes.
4. Restart IPSec service
Tunnels now work.
Updated by Steve Wheeler over 3 years ago
- Category changed from IPsec to Interfaces
- Target version set to Plus-Next
This appears to be a more general issue that can affect IPSec.
In some situations the interface can start to use a VIP IP as the primary address. That causes things running on the interface to fail as they use the wrong address.
I have seen that with an OpenVPN server.
You can see this by checking Status > Interfaces, which shows the VIP as the interface address.
Resaving the interface corrects the IP allowing services to start.
Updated by Viktor Gurov over 3 years ago
Could be the same issue as #5999 (service takes the first IP address on the interface, instead of a non-VIP address)
Updated by Jim Pingle over 3 years ago
- Tracker changed from Bug to Regression
- Project changed from pfSense Plus to pfSense
- Subject changed from Upgrading from 2.4.5p1 to 21.02/21.02p1 with IP Aliases on a WAN interface causes IPSec issues to Primary interface address is not always used when VIPs are present
- Category changed from Interfaces to Interfaces
- Target version changed from Plus-Next to CE-Next
- Affected Plus Version deleted (21.02)
- Affected Version set to 2.5.0
Sounds more like a new variation or regression of #3997
Doubtful that this is specific to Plus, so moving to pfSense.
Updated by Jim Pingle over 3 years ago
- Related to Bug #3997: get_interface_ip() returns first IP on interface, not necessarily primary IP added
Updated by Jim Pingle over 3 years ago
- Target version changed from CE-Next to 2.5.1
Should at least take a stab at this to see if we can come up with a workaround for now.
Updated by Renato Botelho over 3 years ago
- Target version changed from 2.5.1 to CE-Next
Not enough time for 2.5.1
Updated by Kris Phillips over 3 years ago
Ran into this again today on a pfSense Plus 21.02.2 upgrade. Had to do the following to fix it:
1. Save the VIP being used by a VTI tunnel without making changes and apply
2. Save the IPSec tunnel using the VTI tunnel
3. Stop and Start the IPSec service
Otherwise the IPSec service would simply log "acquiring job X" and then "no trap found" for VTI tunnels, because strongswan chokes on non-tunnel mode tunnels. A pcap showed no traffic being generated by the interface at all; the IPSec service failed before it even started generating traffic.
Easy workaround, but annoying and we should track down what is causing the VIPs to not properly re-tie to the IPSec service.
This affects both OpenVPN and IPSec. May also affect Wireguard depending.
Updated by Steve Wheeler over 3 years ago
This only seems to affect VPN tunnels where I assume the interface IP is read directly from the interface causing the problem.
It does not affect outbound NAT for example, where the translation address remains the configured WAN IP and not the incorrect primary interface address.
Updated by Viktor Gurov over 3 years ago
Updated by M Felden over 3 years ago
I believe I am seeing this now after upgrading 2.4.5-p1 -> 2.5.1-CE with FRR BGP where FRR is told to use the WAN IPv6 address to establish BGP sessions but it chooses an IPv6 VIP for outbound communication instead. This is rejected by the peer and the session fails. Consequently it brings down all IPv6 because the Virtual IP in question is part of the prefix announced by that BGP session. Removing the IP alias and rebooting will bring us back to normal.
Updated by Kris Phillips over 3 years ago
M Felden wrote:
I believe I am seeing this now after upgrading 2.4.5-p1 -> 2.5.1-CE with FRR BGP where FRR is told to use the WAN IPv6 address to establish BGP sessions but it chooses an IPv6 VIP for outbound communication instead. This is rejected by the peer and the session fails. Consequently it brings down all IPv6 because the Virtual IP in question is part of the prefix announced by that BGP session. Removing the IP alias and rebooting will bring us back to normal.
Hello M Felden,
Per my previous redmine reply, you only need to resave the VIP and interface. There is no need to remove it, although that is a valid solution as well.
Updated by M Felden over 3 years ago
Kris Phillips wrote:
"Per my previous redmine reply, you only need to resave the VIP and interface. There is no need to remove it, although that is a valid solution as well."
Hi Kris,
This workaround works, but it does not persist across reboots, and certain service restarts seem to break it, too. I can get the box up like this, but it is very fragile.
Updated by Denny Page about 3 years ago
Still seeing this in 21.05.2... any possibility this will be addressed soon?
Updated by Steve Wheeler about 3 years ago
We have been unable to replicate this issue in any sort of repeatable way which makes it almost impossible to dig into.
If anyone has steps to replicate it please add them here.
Updated by Denny Page about 3 years ago
I can share info from my install if you like. Unless I disable DHCP6 on the WAN interface, I am currently hitting the issue on every reboot.
Updated by Kris Phillips about 3 years ago
Denny Page wrote in #note-15:
I can share info from my install if you like. Unless I disable DHCP6 on the WAN interface, I am currently hitting the issue on every reboot.
Denny,
What version of pfSense are you running right now? Are you saying that VIPs disassociate from their interfaces on every reboot unless you turn off IPv6 on your WAN interface?
Updated by Denny Page about 3 years ago
Kris Phillips wrote in #note-16:
"What version of pfSense are you running right now?"
As noted above, 21.05.2.
"Are you saying that VIPs disassociate from their interfaces on every reboot unless you turn off IPv6 on your WAN interface?"
No. We're saying that IPsec chooses the wrong address on the WAN interface.
Updated by Jim Pingle about 3 years ago
- Has duplicate Bug #12532: Virtual IP problem with OpenVPN added
Updated by Dan Edwards almost 3 years ago
Also have the same issue on 21.05.1, on every install, in 2 different scenarios.
Scenario 1: The WAN interface has a /29, with the remaining IPs used as IP Alias VIPs on the WAN interface for NAT etc. For example, 1.1.1.0/29 with 1.1.1.1 as the gateway, 1.1.1.2 as WAN, and 1.1.1.3-6 as IP Alias VIPs. Using the wizard for initial configuration, everything looks fine and works as expected. After a reboot, the Interfaces widget on the Dashboard shows the WAN as having the first IP Alias as its address. IPsec is then also bound to the IP Alias instead of the actual WAN IP address. The workaround is to edit the WAN interface and set the IPv6 Configuration Type to None. This resolves the issue: the WAN interface displays the correct IP address in the Interfaces widget and IPsec is bound to the correct address as well.
Scenario 2: The WAN interface is a /31 with a /29 routed to it, for instance 1.1.1.2/31 with 2.2.2.0/29 routed to 1.1.1.2, and WAN IP Aliases configured for the 2.2.2.x addresses. Again, after the initial setup everything works, but after the first reboot WAN displays as 2.2.2.x in the Interfaces widget and IPsec binds to that as well. Going into the WAN interface and setting IPv6 to None resolves the issue.
It could be that any edit on the WAN interface works, to be fair, but I have not tried that yet since setting IPv6 to None works. I will try next time it happens, as we have other installs coming up. With IPv6 set to None it seems to survive a reboot...
Updated by Marcos M almost 3 years ago
To clarify, are these new installs, or upgrades? What platform (e.g. AWS)? And yes, try reproducing it and just click Save without changing anything on the interface and let us know if that works and if it survives a reboot.
Updated by Denny Page almost 3 years ago
I was just bit by this again this morning. Every reboot. Very frustrating. Steve, if you need any information on the configuration, please let me know.
Updated by Dan Edwards almost 3 years ago
Sorry, new installs on SG2100's and XG7100's, 1 or 2 have been upgraded from 21.05 to 21.05.1 but same issue on all.
Updated by Marcos M almost 3 years ago
For anyone that can reproduce this issue, it would be useful to know if this is still occurring in 22.01.
Updated by M Felden almost 3 years ago
Marcos Mendoza wrote in #note-23:
For anyone that can reproduce this issue, it would be useful to know if this is still occurring in 22.01.
I did not see release notes promising to fix it but will test on 2.6.0-CE-RC this week.
Updated by Kris Phillips almost 3 years ago
I haven't seen this occur at all in 22.01/2.6.
Updated by Denny Page almost 3 years ago
I also have not seen this post install of 22.01.
Updated by Jeff Quasarano over 2 years ago
I have this exact issue on 22.01. It manifests on reboot, with the OpenVPN server binding to the wrong IP at start. Note that after a reboot, the pfSense WebGUI dashboard shows the wrong WAN IP. The fix is to disable and re-enable the WAN interface, then stop and start the OpenVPN server, and it will properly rebind to 'WAN Address'. We have to do this every time we reboot the firewall.
Also note that all other services and routing operate normally even though the WebGUI dashboard shows an incorrect WAN IP (static Wan address is set properly in Interfaces.)
(The incorrect IP it is binding to is an Alias IP for other services, which appears to be this bug.)
Updated by Kris Phillips over 2 years ago
Jeff Quasarano wrote in #note-27:
I have this exact issue on 22.01. It manifests on reboot, with the OpenVPN server binding to the wrong IP at start. Note that after a reboot, the pfSense WebGUI dashboard shows the wrong WAN IP. The fix is to disable and re-enable the WAN interface, then stop and start the OpenVPN server, and it will properly rebind to 'WAN Address'. We have to do this every time we reboot the firewall.
Also note that all other services and routing operate normally even though the WebGUI dashboard shows an incorrect WAN IP (static Wan address is set properly in Interfaces.)
(The incorrect IP it is binding to is an Alias IP for other services, which appears to be this bug.)
Jeff,
What kind of interfaces are you using for WAN on your firewall?
Updated by Reid Linnemann over 2 years ago
When dynamic interface addresses change, say via DHCP, the common mechanism for handling the address transition is not to reconfigure the interface from the ground up but to add an alias to the new address and remove the original address. This is most readily apparent in the dhclient script behavior. When this happens, the most recently aliased address is last in the list, which would lead to this inversion of priority of the VIPs and dynamic address. Services that bind to the WAN interface need to be informed of the proper address to bind to rather than just taking the first available address. This would likely mean pfSense would need to determine the address by excluding VIPs from the config to narrow down the correct address in a dynamically addressed interface, or simply pulling the IP from the config for a statically addressed interface.
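A minimal sketch of the ordering problem described in this comment, written in Python purely for illustration (pfSense itself is PHP and C). The helpers `renew_dhcp_address`, `first_address`, and `pick_primary`, and the sample addresses, are all hypothetical. The only behavior taken from the comment is that a renewed dynamic address is added as an alias and the original removed, so the new address ends up last in the list.

```python
def renew_dhcp_address(addresses, old, new):
    """Mimic the add-alias-then-remove-original transition: the new
    dynamic address is appended, then the old one is dropped."""
    updated = list(addresses)
    updated.append(new)
    updated.remove(old)
    return updated


def first_address(addresses):
    """Naive lookup: take whatever the interface lists first."""
    return addresses[0]


def pick_primary(addresses, configured_vips, static_address=None):
    """Prefer the configured static address; otherwise return the first
    address that is not a configured VIP (or None if there is none)."""
    if static_address is not None:
        return static_address
    vips = set(configured_vips)
    for addr in addresses:
        if addr not in vips:
            return addr
    return None


# Hypothetical WAN: a DHCP address followed by two IP Alias VIPs.
vips = ["198.51.100.11", "198.51.100.12"]
before = ["198.51.100.10", "198.51.100.11", "198.51.100.12"]
after = renew_dhcp_address(before, old="198.51.100.10", new="198.51.100.20")

naive_choice = first_address(after)          # a VIP is now listed first
filtered_choice = pick_primary(after, vips)  # VIPs excluded via the config
```

After the simulated renewal, the naive lookup returns a VIP, while the config-aware lookup still returns the dynamic address.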
Updated by Viktor Gurov over 2 years ago
Should be fixed in #11629
Please re-test on the latest 22.05/2.7 snapshots.
Updated by Viktor Gurov over 2 years ago
- Related to Bug #11629: PPPoE WAN IP address different than expected when set static by ISP added
Updated by Viktor Gurov over 2 years ago
- Status changed from New to Feedback
- Assignee set to Viktor Gurov
- Target version changed from CE-Next to 2.7.0
- Plus Target Version set to 22.05
Updated by Jim Pingle over 2 years ago
- Status changed from Feedback to New
- Assignee changed from Viktor Gurov to Reid Linnemann
- Plus Target Version changed from 22.05 to 22.09
That other issue could solve it for PPP type interfaces but it's happening on systems without PPP interfaces and those changes wouldn't make a difference for systems without PPP involved.
Updated by Kris Phillips over 2 years ago
This bug definitely doesn't just happen with PPPoE interfaces. It is also not consistent and seems to be an "ordering" problem of when things get applied.
Additional hints to the potential cause of this:
I had a customer yesterday run into this bug with CARP VIPs. On his Status --> Dashboard under the interfaces widget, some of his interfaces were showing the CARP VIP as their interface IP and not the actual IP assigned to the interface. The CARP VIP was also not working properly until going through the same steps as above to work around it (save without changes, apply, restart service). After re-saving the VIP, the interface IP showed up correctly on the dashboard widget. This happened after an upgrade from 21.05.2 to 22.01.
Updated by Reid Linnemann over 2 years ago
- Status changed from New to Feedback
- % Done changed from 0 to 100
Applied in changeset 3222c70aaf783336901f7b1225727b5973ba865a.
Updated by Reid Linnemann over 2 years ago
I believe I have a fix for this issue. I created a variation on pfSense_get_interface_addresses() named pfSense_get_ifaddrs(), which yields all interface addresses in addition to the interface attributes instead of just the first ipv4 and ipv6 address. This is wrapped by a php function get_interface_addresses() that excludes VIPs from the addresses returned by pfSense_get_ifaddrs(), and massages the return value to the array form that pfSense_get_interface_addresses() callers expect. All calls to pfSense_get_interface_addresses() have been changed to get_interface_addresses(). This should appear in the next snapshot builds. If the addresses returned by pfSense_get_interface_addresses() are the only culprit, this should resolve the issue.
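The shape of that change can be sketched roughly as follows. This is Python for illustration only, not the actual pfSense PHP or C code. `get_ifaddrs_stub`, the dictionary keys, and the sample data are hypothetical stand-ins, not the real return values of pfSense_get_ifaddrs() or pfSense_get_interface_addresses().

```python
def get_ifaddrs_stub(ifname):
    """Hypothetical stand-in for a call that returns every address on an
    interface plus its attributes (sample data, VIP listed first)."""
    return {
        "mtu": 1500,
        "addrs": [
            {"addr": "203.0.113.5", "subnetbits": 29},
            {"addr": "203.0.113.2", "subnetbits": 29},
        ],
        "addrs6": [
            {"addr": "2001:db8::5", "subnetbits": 64},
            {"addr": "2001:db8::2", "subnetbits": 64},
        ],
    }


def get_interface_addresses(ifname, configured_vips, get_ifaddrs=get_ifaddrs_stub):
    """Drop configured VIPs, then flatten to a legacy-style result that
    carries only the first remaining IPv4 and IPv6 address."""
    raw = get_ifaddrs(ifname)
    vips = set(configured_vips)
    # Copy the interface attributes, leaving the address lists behind.
    result = {k: v for k, v in raw.items() if k not in ("addrs", "addrs6")}

    for family, addr_key, bits_key in (
        ("addrs", "ipaddr", "subnetbits"),
        ("addrs6", "ipaddr6", "subnetbits6"),
    ):
        candidates = [a for a in raw.get(family, []) if a["addr"] not in vips]
        if candidates:
            result[addr_key] = candidates[0]["addr"]
            result[bits_key] = candidates[0]["subnetbits"]
    return result


info = get_interface_addresses("wan", ["203.0.113.5", "2001:db8::5"])
```

The sketch compares addresses as plain strings for brevity. IPv6 addresses have more than one valid text form, so real code would need to normalize them before comparing.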
Updated by Jim Pingle over 2 years ago
- Plus Target Version changed from 22.09 to 22.11
Updated by Jim Pingle over 2 years ago
- Status changed from Feedback to In Progress
Since this went in my GIF interface doesn't seem to be working properly, and it might affect others. It was working previously. The GUI says the address on the interface is "n/a" and gateway monitoring is stuck on "pending" but the interface itself has a valid/working address:
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
	description: WANv6
	options=80000<LINKSTATE>
	tunnel inet m.m.m.m --> h.h.h.h
	inet6 2001:x:x:x::2 --> 2001:x:x:x::1 prefixlen 128
	inet6 fe80::xxxx%gif0 prefixlen 64 scopeid 0x8
	groups: gif
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
The interface is up and working, default route is in the table, I can ping out, etc. but the GUI claims the interface has no address on the dashboard widget, status_interfaces.php, console menu banner, and elsewhere.
Other similar types of tunneled interfaces like IPsec VTI and WireGuard appear to be OK. I don't have anything with a GRE handy but given its history of behaving nearly identical to GIF that should also be checked.
Updated by Reid Linnemann over 2 years ago
Found it, it looks like I had some confusion in my array keys migrating the v6 address from the output of pfSense_get_ifaddrs() to the expected output format of pfSense_get_interface_addresses(). Merge request is pending review. Note that this only affected static and DHCP assigned GUAs. For a reason I don't yet understand, get_interface_ipv6() uses a separate method to determine track6 addresses that calls pfSense_getall_interface_addresses() and filters out LL addresses and VIPs. At some point I'll be refactoring code to migrate further toward using a single module func to get all interface addresses in an easy to digest form.
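For illustration, filtering link-local addresses and VIPs out of a full IPv6 address list might look like the sketch below. It is Python rather than pfSense's PHP. `usable_ipv6_addresses` and the sample addresses are hypothetical, and this is not the code behind get_interface_ipv6().

```python
import ipaddress


def usable_ipv6_addresses(all_addresses, configured_vips):
    """Return IPv6 addresses that are neither link-local nor configured
    VIPs, preserving the original order."""
    # Parsing normalizes the text form, so "::" compression does not matter.
    vips = {ipaddress.ip_address(v) for v in configured_vips}
    usable = []
    for text in all_addresses:
        # Strip a zone suffix such as "%gif0" before parsing.
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        if addr.version != 6 or addr.is_link_local or addr in vips:
            continue
        usable.append(str(addr))
    return usable


candidates = usable_ipv6_addresses(
    ["fe80::1%igb0", "2001:db8:0:1::10", "2001:db8:0:1::1", "192.0.2.1"],
    configured_vips=["2001:db8:0:1:0:0:0:10"],
)
```

Parsing both sides with `ipaddress.ip_address` means a VIP written in expanded form in the configuration still matches the compressed form reported for the interface.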
Should be fixed by b0d417e2fc.
Updated by Jim Pingle over 2 years ago
- Status changed from In Progress to Feedback
Updated by Jim Pingle about 2 years ago
- Plus Target Version changed from 22.11 to 23.01
Updated by Jim Pingle about 2 years ago
- Status changed from Feedback to In Progress
The IPv6 GIF interfaces still have an issue here. The interface address is reported properly by the GUI now, but the gateway code is confused. It knows there should be a dynamic gateway for these interfaces but it isn't seeing the gateway address.
System > Routing shows "dynamic" for the address(es) in the table where it should normally show the GIF tunnel remote address. Gateway monitoring is stuck showing "Pending" and there are no corresponding dpinger processes. There are also no /tmp/<if>_routerv6 files for the GIF interfaces as there are for other dynamic gateway interfaces.
Updated by Reid Linnemann about 2 years ago
- Status changed from In Progress to Feedback
Applied in changeset 2b66dafae80f4a17c4cfc4a5f548f336b97513de.
Updated by Jim Pingle about 2 years ago
- Start date deleted (02/25/2021)
All the issues I could reproduce here are fixed now. If we could get some more feedback from users who encountered the original issue with VIPs being treated as the interface address then we can close this out entirely.
Updated by Jim Pingle almost 2 years ago
- Status changed from Feedback to Resolved
No feedback (positive or negative) and it's been in snapshots for quite some time now. Closing this now, but if anyone else manages to reproduce it with the current codebase, we can open it back up.
Updated by M Felden over 1 year ago
Updated a patched 2.6.0 to 2.7.0.r.20230622.0600 and the issue https://redmine.pfsense.org/issues/11545#note-10 has returned.