Bug #12947
closed
DHCP6 client does not take any action if the interface IPv6 address changes during renewal
0%
Description
I recently started using T-Mobile 5G Home Internet. The gateway device you're required to use is almost completely unconfigurable. You can't even change DHCP or DNS settings, forget about having a bridge mode or any sort of DMZ or IP bypass. So for IPv6 I need to use NAT, which actually works pretty well.
The problem is that occasionally the IPv6 address will change without the interface going down. First the Router Advertisements change and NAT and dpinger break. Eventually dhcp6c notices the change but does nothing about it:
Mar 11 03:44:26 router dhcp6c[32738]: Sending Renew
Mar 11 03:44:27 router dhcp6c[32738]: dhcp6c Received INFO
Mar 11 03:44:27 router dhcp6c[32738]: remove an address 2607:fb90:5128:15d0:24dc:4e2:d0b6:df11/128 on igb2
Mar 11 03:44:27 router dhcp6c[32738]: add an address 2607:fb90:5120:e987:24dc:4e2:d0b6:df11/128 on igb2
Mar 11 03:44:27 router dhcp6c[32738]: T1(1125) and/or T2(1800) is locally determined
Manually restarting dpinger and reloading the filters gets things working again, but what probably really needs to happen is to run /etc/rc.newwanipv6.
I'm thinking of patching (the generation of) /var/etc/dhcp6c_opt1_script.sh to run /etc/rc.newwanipv6 if the current interface IPv6 address does not match /var/db/opt1_ipv6 after a DHCPv6 INFO or RENEW. Is this a reasonable short-term fix?
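The comparison proposed above can be sketched as follows. This is only an illustration in Python, not the generated dhcp6c script itself. It assumes the /var/db/opt1_ipv6 file holds a single textual IPv6 address, and the helper name address_changed is hypothetical.

```python
import ipaddress


def _parse(text):
    """Parse an IPv6 address string; return None if it is empty or invalid."""
    try:
        return ipaddress.IPv6Address(text.strip())
    except ValueError:
        return None


def address_changed(saved_text, current_text):
    """Return True when the interface address differs from the saved one.

    saved_text   -- assumed contents of /var/db/opt1_ipv6
    current_text -- address currently configured on the interface
    """
    saved = _parse(saved_text)
    current = _parse(current_text)
    if current is None:
        # Nothing usable on the interface yet, so there is nothing to react to.
        return False
    # Comparing parsed addresses ignores differences in textual notation.
    return saved != current
```

A caller would run /etc/rc.newwanipv6 only when address_changed returns True after an INFO or RENEW event, so an unchanged renewal does not trigger a full reconfiguration.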
Files
Related issues
Updated by Jim Pingle almost 3 years ago
- Assignee set to Jim Pingle
- Target version set to 2.7.0
- Plus Target Version set to 22.05
For that to trigger, the client would have to fire the script during an event when the change occurs. It may not, but it's hard to say for sure based on the logs you have. For starters, go to System > Advanced on the Networking tab and check "DHCP6 Debug" and see what it logs at the time.
It's possible that would get triggered by the script at the RENEW case, but it's not certain. The next problem is that, at least according to the documentation, it doesn't look like the script will get any environment variables populated with data it could leverage to compare old/new IP addresses, similar to what is done for the IPv4 client. It might be viable to read the address out of the /var/db/<if>_ipv6 file and compare, like you suggest, but it would require testing to confirm.
Updated by Jim Pingle almost 3 years ago
- Subject changed from IPv6 address change not noticed to DHCP6 client does not take any action if the interface IPv6 address changes during renewal
Updated by Jim Pingle almost 3 years ago
- File dhcp6c-renew-12947.diff added
- Assignee deleted (Jim Pingle)
- Target version deleted (2.7.0)
- Plus Target Version deleted (22.05)
I tried altering the script so it would fire during a renew, with mixed success. I also found another odd behavior. If I change the IP address assigned to the device in upstream DHCP, it picks up the new address on the next renew. However, the new address gets added to the interface, so both the old and new are present. The old address isn't removed until its lease expires, even though it's supposed to be using the new address instead. It's marked deprecated in the meantime:
vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: WAN
        options=800b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE>
        ether ca:1d:62:6c:c6:9c
        inet6 fe80::c81d:62ff:fe6c:c69c%vtnet0 prefixlen 64 scopeid 0x1
        inet6 2001:db8::103:1 prefixlen 128 deprecated
        inet6 2001:db8::103:2 prefixlen 128
        inet 198.51.100.103 netmask 0xffffff00 broadcast 198.51.100.255
        media: Ethernet 10Gbase-T <full-duplex>
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
The functions that fetch and check the interface address only get the old address which remains on the interface, not the additional address that came through.
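To illustrate the distinction, here is a rough Python sketch that parses ifconfig output and skips addresses flagged as deprecated. It is not how the pfSense functions are implemented. The names parse_inet6, usable_addresses and UNUSABLE_FLAGS are hypothetical, and the sample text is abbreviated from the output above.

```python
import ipaddress

# ifconfig flags that mark an address as unsuitable for new connections.
UNUSABLE_FLAGS = {"deprecated", "detached", "tentative", "duplicated"}


def parse_inet6(ifconfig_text):
    """Return a list of (address, prefixlen, flags) tuples, one per inet6 line."""
    entries = []
    for line in ifconfig_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "inet6" or fields[2] != "prefixlen":
            continue
        # Drop the "%ifname" zone suffix that link-local addresses carry.
        address = ipaddress.IPv6Address(fields[1].split("%", 1)[0])
        entries.append((address, int(fields[3]), set(fields[4:])))
    return entries


def usable_addresses(entries):
    """Keep non-link-local addresses that carry none of the unusable flags."""
    return [
        addr
        for addr, _prefixlen, flags in entries
        if not addr.is_link_local and not (flags & UNUSABLE_FLAGS)
    ]


SAMPLE = """\
vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        inet6 fe80::c81d:62ff:fe6c:c69c%vtnet0 prefixlen 64 scopeid 0x1
        inet6 2001:db8::103:1 prefixlen 128 deprecated
        inet6 2001:db8::103:2 prefixlen 128
        inet 198.51.100.103 netmask 0xffffff00 broadcast 198.51.100.255
"""

print([str(a) for a in usable_addresses(parse_inet6(SAMPLE))])
# prints ['2001:db8::103:2']
```

Filtering on the flags rather than taking the first address listed means the newly assigned address is selected even while the old one lingers until its lease expires.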
When picking up the new address dhcp6c also seems to get stuck in a bit of a loop of REQUEST and RENEW messages (which I've seen be anywhere from ~15 seconds apart to ~60 seconds apart), but it may be related to the changes made here and not something else. It's been an issue in the past, though; see #11100 and #9634.
Attached is a patch to try. It's not very well optimized, but it should at least show a difference in behavior.
Clearing the targets and assignment since there is a bit more happening here than I thought.
Updated by David Myers almost 3 years ago
- File dhcpd.log added
- File system.log added
The patch didn't work.
I applied the patch to my 2.5.2 system then enabled DHCP6 client debug mode and saved the interface in order to force the dhcp6c client script to get regenerated. I did not reboot.
The IPv6 address changed at approximately 13:07 in the attached logs, which are edited from about 13:00 until rc.newwanipv6 ran a couple times, and with DHCPv4 messages removed.
One thing I don't understand is why there are no messages in the logs from the calls to logger from the dhcp6c client script. The logger command works fine from the command line.
While IPv6 was not working ifconfig showed this:
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1420
        description: LTE
        options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:e0:67:22:fc:52
        inet6 fe80::2e0:67ff:fe22:fc52%igb2 prefixlen 64 scopeid 0x3
        inet6 2607:fb90:5120:e987:2e0:67ff:fe22:fc52 prefixlen 64 detached deprecated autoconf
        inet6 2607:fb90:1b7d:9992:2e0:67ff:fe22:fc52 prefixlen 64 autoconf
        inet6 2607:fb90:1b7d:9992:24dc:4e2:d0b6:df11 prefixlen 128
        inet 192.168.12.245 netmask 0xffffff00 broadcast 192.168.12.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
In order to recover IPv6 I needed to release and renew the interface, at which point it looked like this:
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1420
        description: LTE
        options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:e0:67:22:fc:52
        inet6 fe80::2e0:67ff:fe22:fc52%igb2 prefixlen 64 scopeid 0x3
        inet6 2607:fb90:1b7d:9992:2e0:67ff:fe22:fc52 prefixlen 64 autoconf
        inet6 2607:fb90:1b7d:9992:24dc:4e2:d0b6:df11 prefixlen 128
        inet 192.168.12.245 netmask 0xffffff00 broadcast 192.168.12.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
I hadn't looked at ifconfig before and was surprised to see two addresses. pfSense uses the first one.
I've backed out the patch.
Updated by David Myers almost 3 years ago
I neglected to mention that I was using "Disable Gateway Monitoring Action" on my gateways when the above issues occurred. Perhaps that isn't relevant. In any event, I've re-enabled actions because I'm now using a Gateway Group.
Updated by luckman212 over 2 years ago
David Myers, I believe I'm facing this exact issue. Take a look at https://forum.netgate.com/topic/172849/rtsold-not-running-ipv6-wan-dhcp-keeps-losing-connectivity/
I'm highly motivated to try to get this fixed!
Updated by luckman212 over 2 years ago
It appears we are out of luck on having devd fire events for IP address changes. There is a commit: https://reviews.freebsd.org/rGa75819461ec7c7d8468498362f9104637ff7c9e9 but that seems like it might not even be in FreeBSD 13 (much less 12.3...). Maybe we can get this backported to 12.x somehow? Seems like it would be immensely useful in these cases.
My poor man's solution is hashing the output of ifconfig on all DHCP6 interfaces with cron and firing an event if the hash changes. But this places unnecessary load on the system and also has a potential for up to 1 minute of downtime before the change is picked up. Still, it's the best I can come up with for now until devd can trigger on address changes. I have a PR for this that I will submit later today.
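The hash-and-compare idea can be sketched roughly like this. It is a Python illustration only and does not reproduce the code in the PR. It assumes that hashing just the inet6 lines is enough to detect the change, and the function names inet6_fingerprint and has_changed are hypothetical.

```python
import hashlib


def inet6_fingerprint(ifconfig_text):
    """Hash only the inet6 lines so unrelated output does not cause false alarms."""
    lines = sorted(
        line.strip()
        for line in ifconfig_text.splitlines()
        if line.strip().startswith("inet6 ")
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, ifconfig_text):
    """Return (changed, current_fingerprint) for one polling pass."""
    current = inet6_fingerprint(ifconfig_text)
    return current != previous_fingerprint, current
```

A cron job would store the fingerprint between runs and fire the event only when has_changed reports True. Because flags such as deprecated are part of the hashed lines, an address that is merely marked deprecated also counts as a change.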
Updated by luckman212 over 2 years ago
Just updated PR #4595 with the new mitigation changes. Testers & feedback wanted.
Updated by luckman212 over 2 years ago
I posted on the PR that since Reid Linnemann has just deprecated pfSense_getall_interface_addresses(), this should probably be updated to use the new pfSense_get_ifaddrs() function, where possible.
However, I do want to note for the record that my patch has been running for over a week now and has 100% fixed my issue.
Updated by Jim Pingle over 2 years ago
- Status changed from New to Pull Request Review
- Target version set to 2.7.0
- Plus Target Version set to 22.09
Updated by Jim Pingle over 2 years ago
- Plus Target Version changed from 22.09 to 22.11
Updated by Jim Pingle about 2 years ago
- Plus Target Version changed from 22.11 to 23.01
Updated by Jim Pingle about 2 years ago
- Status changed from Pull Request Review to Feedback
This needs to be retested now that snapshots are on FreeBSD 14-CURRENT (main) and the change noted above is in the tree. I checked, and the relevant commit is present in the branch(es) used to build dev snapshots.
If it still requires changes to the pfSense source, we'll need an updated PR and will need to move this ahead to 23.05, since the current PR does not apply and additional changes will need more time than we have for the 23.01 release.
Updated by Jim Pingle about 2 years ago
- Plus Target Version changed from 23.01 to 23.05
Updated by Jim Pingle over 1 year ago
- Plus Target Version changed from 23.05 to 23.09
Still waiting on feedback from someone who can reproduce this to test against a 2.7.0 snap, 23.01 release, or a 23.05 snap.
Updated by David Myers over 1 year ago
Jim Pingle wrote in #note-15:
Still waiting on feedback from someone who can reproduce this to test against a 2.7.0 snap, 23.01 release, or a 23.05 snap.
I'm pretty sure this is still happening on 23.01. With T-Mobile Home Internet I can go weeks without the IPv6 address changing, or it can change several times in one day, so I can't reproduce it on demand.
I've written a simple-minded script to detect and correct the problem and run it from cron every 5 minutes, and there's evidence in the logs of it taking action a few times.
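#!/usr/bin/env perl
#
# Delete deprecated IPv6 addresses.
#
use v5.30;    # also enables strict
use warnings;

my $if = 'igb2';
my $deleted = 0;

# List the IPv6 addresses currently configured on the interface.
open(my $ifconfig, "-|", "/sbin/ifconfig", $if, "inet6") || die "Can't run /sbin/ifconfig: $!";
while (<$ifconfig>) {
    # Match only addresses that are flagged as deprecated.
    if (/inet6 ([0-9a-fA-F.:]+) prefixlen (\d+) .*deprecated/) {
        # List-form system() avoids passing the arguments through a shell.
        system("/sbin/ifconfig", $if, "inet6", "$1/$2", "delete");
        $deleted++;
    }
}
close($ifconfig);

# If anything was removed, have pfSense reconfigure for the new address.
if ($deleted) {
    system("/etc/rc.newwanipv6", $if);
}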
In fact I think this was the script firing off early this morning. I've edited out messages from NUT:
Apr 17 01:21:49 router rc.gateway_alarm[72711]: >>> Gateway alarm: TMHI_DHCP6 (Addr:2620:fe::10 Alarm:1 RTT:53.447ms RTTsd:9.178ms Loss:41%)
Apr 17 01:21:49 router check_reload_status[401]: updating dyndns TMHI_DHCP6
Apr 17 01:21:49 router check_reload_status[401]: Restarting IPsec tunnels
Apr 17 01:21:49 router check_reload_status[401]: Restarting OpenVPN tunnels/interfaces
Apr 17 01:21:49 router check_reload_status[401]: Reloading filter
Apr 17 01:21:51 router php-fpm[65938]: /rc.filter_configure_sync: MONITOR: TMHI_DHCP6 has packet loss, omitting from routing group TMHI_FAILOVER6
Apr 17 01:21:51 router php-fpm[65938]: 2620:fe::10|2607:fb90:7522:c4a9:a484:c093:675:c336|TMHI_DHCP6|52.851ms|8.805ms|44%|down|highloss
Apr 17 01:25:00 router php-cgi[83652]: rc.newwanipv6: rc.newwanipv6: Info: starting on igb2.
Apr 17 01:25:00 router php-cgi[83652]: rc.newwanipv6: rc.newwanipv6: on (IP address: 2607:fb90:759b:d695:2e0:67ff:fe26:47ea) (interface: opt1) (real interface: igb2).
Apr 17 01:25:01 router php-cgi[83652]: rc.newwanipv6: Removing static route for monitor 2620:fe::fe:10 and adding a new route through fdfd:10:167:100::1
Apr 17 01:25:01 router php-cgi[83652]: rc.newwanipv6: Removing static route for monitor 9.9.9.10 and adding a new route through 192.168.12.1
Apr 17 01:25:01 router php-cgi[83652]: rc.newwanipv6: Removing static route for monitor 2620:fe::10 and adding a new route through fe80::c6e5:32ff:fed7:634e%igb2
Apr 17 01:25:01 router php-cgi[83652]: rc.newwanipv6: Removing static route for monitor 149.112.112.10 and adding a new route through 10.155.98.1
Apr 17 01:25:02 router check_reload_status[401]: Reloading filter
Apr 17 01:25:02 router php-cgi[83652]: rc.newwanipv6: The command '/sbin/ifconfig igb2 inet6 2607:fb90:7522:c4a9:2e0:67ff:fe26:47ea delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
Apr 17 01:25:02 router php-cgi[83652]: rc.newwanipv6: Resyncing OpenVPN instances for interface TMHI.
Apr 17 01:25:02 router php-cgi[83652]: rc.newwanipv6: Creating rrd update script
Apr 17 01:25:02 router php-cgi[83652]: rc.newwanipv6: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 2607:fb90:7522:c4a9:2e0:67ff:fe26:47ea -> 2607:fb90:759b:d695:2e0:67ff:fe26:47ea - Restarting packages.
Apr 17 01:25:02 router check_reload_status[401]: Starting packages
Apr 17 01:25:02 router check_reload_status[401]: Reloading filter
Apr 17 01:25:03 router php-fpm[79152]: /rc.start_packages: Restarting/Starting all packages.
Apr 17 01:25:03 router php-fpm[79152]: /rc.start_packages: Stopping service nut
Apr 17 01:25:03 router php-fpm[79152]: /rc.start_packages: Starting service nut
Apr 17 01:25:03 router php-fpm[7842]: /rc.filter_configure_sync: MONITOR: TMHI_DHCP6 is available now, adding to routing group TMHI_FAILOVER6
Apr 17 01:25:03 router php-fpm[7842]: 2620:fe::10|2607:fb90:759b:d695:2e0:67ff:fe26:47ea|TMHI_DHCP6|43.574ms|1.428ms|0.0%|online|none
Updated by Jim Pingle over 1 year ago
- Status changed from Feedback to New
- Target version changed from 2.7.0 to CE-Next
Updated by Jim Pingle over 1 year ago
- Plus Target Version changed from 23.09 to 24.01
Updated by Marcos M over 1 year ago
- Status changed from New to Feedback
I tested this in 23.09 dev snapshots and am not able to reproduce the issue.
The following are logs from a lease change and a manual release/renew - of note is an additional IPv6 autoconf address that gets added:
After stopping the DHCP6 server (including RAs), the lease expires and the interface address is marked as deprecated. After starting DHCP6/RAs again, pfSense picks up the change.
Given the above, it looks like the original issue as described has been resolved - additional feedback is welcomed.
Updated by Jim Pingle over 1 year ago
- Plus Target Version changed from 24.01 to 24.03
Updated by Marcos M about 1 year ago
- Related to Regression #11570: Gateway monitoring services is not always restarted on interface events, which may prevent a WAN from recovering back to an online state added
Updated by Jim Pingle 3 months ago
- Target version changed from CE-Next to 2.8.0