Bug #11454

closed

Gateway value for DHCP6 interfaces missing after RA events triggered script without gateway information

Added by Mike McV almost 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Interfaces
Target version:
Start date: 02/18/2021
Due date:
% Done: 100%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version: 2.5.0
Affected Architecture: amd64

Description

After updating to 2.5.0, dpinger is not functioning for IPv6 gateway monitoring.

WAN interface set to DHCPv6, WAN interface client set to use IPv6 for DHCP. Learn address and delegation. Delegation request of /60 with hint, and do not wait for RA.

IPv6 routing table entry for default:

default fe80::2ca:e5ff:fec9:f022%lagg0.666

The link-local gateway is reachable:

PING6 fe80::a236:9fff:fe21:a5a4%lagg0.666 --> fe80::2ca:e5ff:fec9:f022
16 bytes from fe80::2ca:e5ff:fec9:f022%lagg0.666, icmp_seq=0 hlim=64 time=12.498 ms
16 bytes from fe80::2ca:e5ff:fec9:f022%lagg0.666, icmp_seq=1 hlim=64 time=12.321 ms
16 bytes from fe80::2ca:e5ff:fec9:f022%lagg0.666, icmp_seq=2 hlim=64 time=12.981 ms

--- fe80::2ca:e5ff:fec9:f022 ping6 statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 12.321/12.600/12.981/0.279 ms

This config works fine on 2.4.5-p1 (tested with a reinstall of 2.4.5-p1 and a config restore; it fails again after upgrading to 2.5.0).


Files

rtsol.c.patch (536 Bytes), Tim Dunn, 02/28/2021 02:34 PM
rtsol.c.patch (536 Bytes), Tim Dunn, 02/28/2021 02:52 PM
interfaces.inc.diff (883 Bytes), Greg Shaffer, 02/28/2021 11:23 PM
Screen Shot 2021-03-16 at 10.07.29 AM.png (217 KB), WAN DHCP6 Config, Greg Shaffer, 03/16/2021 12:28 PM

Actions #1

Updated by Anonymous almost 4 years ago

"Me too"... After upgrading to 2.5.0, IPv6 did not work until I manually added an address for monitoring. After doing that, the gateway shows as "dynamic" on the Gateways page, and the dashboard widget doesn't show an IPv6 address for that gateway, but routing is working. Without specifying a monitoring address, the status on the dashboard widget showed as "Pending".

Forum thread: https://forum.netgate.com/topic/160952/ipv6-no-gateway-after-2-5-upgrade

Actions #2

Updated by Hayden Hill almost 4 years ago

I am having this issue as well, starting with 2.5. Without manually overriding gateway monitoring for the IPv6 gateway, pfSense fails to connect to IPv6 DNS servers.

Actions #3

Updated by Car F almost 4 years ago

Same here after update from 2.4.5_1 to 2.5.0. IPv6 is working but Gateway only shows "~" and there is no IPv6 Gateway address.

Actions #4

Updated by Viktor Gurov almost 4 years ago

Something is wrong with /var/etc/rtsold_{realif}_script.sh: it saves empty /tmp/{realif}_routerv6 and /tmp/{realif}_defaultgwv6 files.
The route is empty in the logs:

Feb 18 12:06:22 pf41 rtsold[70362]: Received RA specifying route  for interface opt1(vtnet2)

On 2.4.5-p1 the log entry is good:

Feb 19 16:40:37 pg52 rtsold: Received RA specifying route fe80::2ce8:5ff:fee3:e415 for interface opt1(vtnet2)
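
The symptom described here (files that exist but carry no address) can be checked with a few lines of shell. This is a minimal sketch, not pfSense code: the function name `gw_file_state` is made up for illustration, and a temporary directory stands in for /tmp so that nothing on a live system is touched.

```shell
#!/bin/sh
# Hypothetical check: classify a gateway file as missing, empty, or set.
# A file holding only whitespace or a newline counts as empty.
gw_file_state() {
    if [ ! -f "$1" ]; then
        echo missing
    elif [ -z "$(tr -d '[:space:]' < "$1")" ]; then
        echo empty
    else
        echo set
    fi
}

tmpdir=$(mktemp -d)
# A file with only a newline, like the empty files reported on 2.5.0.
echo > "$tmpdir/vtnet2_routerv6"
# A file holding a router address, as logged on 2.4.5-p1.
echo fe80::2ce8:5ff:fee3:e415 > "$tmpdir/vtnet2_defaultgwv6"

gw_file_state "$tmpdir/vtnet2_routerv6"
gw_file_state "$tmpdir/vtnet2_defaultgwv6"
gw_file_state "$tmpdir/vtnet2_nosuchfile"
rm -rf "$tmpdir"
```

On an affected firewall the same function could be pointed at /tmp/{realif}_routerv6 and /tmp/{realif}_defaultgwv6.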

Actions #5

Updated by Anonymous almost 4 years ago

As noted in the thread now, this also affects firewall rules that make use of the Gateway option. Because the IPv6 gateway isn't known (my pfSense system only shows my IPv4 gateway), the rules are now broken. I don't know if that might warrant a boost in priority or not, but the scope of the effects of this seems to be growing.

Also an additional thread...
https://forum.netgate.com/topic/160750/fq_codel-ipv6-floating-rule-error

Actions #6

Updated by Pete C almost 4 years ago

Same issues as noted above.

I was able to get IPv6 working after setting the IPv6 gateway monitoring address to the IPv6 address on the interface.

Actions #7

Updated by Nick B over 3 years ago

I'm also having the same problem. Manually setting the monitor address to the link-local address has worked around the issue; however, as noted above, this breaks CoDel rules.

Actions #8

Updated by Steve Y over 3 years ago

Michael Virgilio wrote:

but routing is working. Without specifying a monitoring address, the status on the dashboard widget showed as "Pending".

After upgrading an SG-2100 to 21.02 I also see the Pending in Status/Gateways but can ping ipv6.google.com from my PC behind it. So at least the GUI part is not just in 2.5. Status/Interfaces does not list an IPv6 gateway:

Gateway IPv4 76.217.***
IPv6 Link Local fe80::2e0:edff:febe:d24d%mvneta0
IPv6 Address 2600:1700:***
Subnet mask IPv6 64
MTU 1500

Diagnostics/Routes shows a default route to "fe80::8a96:4eff:fedd:1e70%mvneta0".

My pfSense is behind an AT&T router/modem.

Actions #9

Updated by Tim Dunn over 3 years ago

Actions #10

Updated by Tim Dunn over 3 years ago

If ManagedConfigFlag is set in rtsold, managedconf_script (-M) will execute instead of otherconf_script (-O).

pfSense previously patched rtsol.c (https://github.com/pfsense/FreeBSD-src/commit/da4e972b334692e031d1f7deea7e4db02d1c0fdc) to add the received RA as a second argument to the script.
It looks like the solution is either to revert the conflicting upstream patch or to modify the arguments to managedconf_script as in the attached patch.

Actions #11

Updated by Greg Shaffer over 3 years ago

I noticed that both /tmp/em0_routerv6 and /tmp/em0_defaultgwv6 were empty while the IPv4 versions had the valid router addresses in them. Both of these are set in the script /var/etc/rtsold_em0_script.sh, which is built by /etc/inc/interfaces.inc. It looks like the parameter is not being passed to the script. I modified the routine in interfaces.inc that builds the rtsold_em0_script (search for "rtsoldscript") to set a hard coded value for both these files and my IPv6 gateway started working! Routing, firewall rules, Policy Based Routing, etc, all work!

My mods:

#echo $2 > /tmp/em0_routerv6
echo "fe80::X:X:X:X" > /tmp/em0_routerv6
#echo $2 > /tmp/em0_defaultgwv6
echo "fe80::X:X:X:X" > /tmp/em0_defaultgwv6

After I made the mods, I "saved" the WAN interface again without making any changes. Hope this helps someone until a real fix is pushed out.

Actions #12

Updated by Greg Shaffer over 3 years ago

UPDATE:

Here is a diff of my changes to /etc/inc/interfaces.inc

Actions #13

Updated by Car F over 3 years ago

Thank you @Greg Shaffer, that worked for me!

Actions #14

Updated by Anonymous over 3 years ago

Greg Shaffer wrote:

#echo $2 > /tmp/em0_routerv6
echo "fe80::X:X:X:X" > /tmp/em0_routerv6
#echo $2 > /tmp/em0_defaultgwv6
echo "fe80::X:X:X:X" > /tmp/em0_defaultgwv6

I just manually created the files /tmp/*_defaultgwv6 and /tmp/*_routerv6 with the valid gateway/router IPv6 address as file content. A few seconds later, IPv6 routing and firewall rules finally started working again. Thank you.

Actions #15

Updated by Pete C over 3 years ago

Greg Shaffer wrote:

UPDATE:

Here is a diff of my changes to /etc/inc/interfaces.inc

Thank you Greg.

This worked for me, and I changed the IPv6 monitoring address back to Google DNS, the way it was in pfSense 2.4.x.

My steps here were:

1 - looked up the link-local gateway address via pfSense / Diagnostics / Routes / IPv6 routes (FE80...)
2 - copied and pasted the address, minus %em1, into the diff file.
3 - rebooted pfSense
4 - put in the Google IPv6 DNS address as the IPv6 gateway monitoring address

All is well now for the time being.

Actions #16

Updated by Greg Shaffer over 3 years ago

Dennis P wrote:

Greg Shaffer wrote:

#echo $2 > /tmp/em0_routerv6
echo "fe80::X:X:X:X" > /tmp/em0_routerv6
#echo $2 > /tmp/em0_defaultgwv6
echo "fe80::X:X:X:X" > /tmp/em0_defaultgwv6

I just manually created the files /tmp/*_defaultgwv6 and /tmp/*_routerv6 with the valid gateway/router IPv6 address as file content. A few seconds later, IPv6 routing and firewall rules finally started working again. Thank you.

I believe both of these files will be rewritten if you make a change to your WAN or you reboot your firewall.

Actions #17

Updated by Anonymous over 3 years ago

Greg Shaffer wrote:

I believe both of these files will be rewritten if you make a change to your WAN or you reboot your firewall.

That's true. It's just my personal workaround until the bug is fixed.

Actions #18

Updated by Mike Loiterman over 3 years ago

Dennis P wrote:

Greg Shaffer wrote:

I believe both of these files will be rewritten if you make a change to your WAN or you reboot your firewall.

That's true. It's just my personal workaround until the bug is fixed.

When you reference /tmp/*_defaultgwv6 and /tmp/*_routerv6, are you actually creating a file called *_routerv6 or are you creating, for example, ix0_routerv6 and ix0_defaultgwv6?

Actions #19

Updated by Anonymous over 3 years ago

Mike Loiterman wrote:

When you reference /tmp/*_defaultgwv6 and /tmp/*_routerv6, are you actually creating a file called *_routerv6 or are you creating, for example, ix0_routerv6 and ix0_defaultgwv6?

The latter, of course. In my case, * is vtnet0.

Actions #20

Updated by Eric B over 3 years ago

Greg Shaffer wrote:

I noticed that both /tmp/em0_routerv6 and /tmp/em0_defaultgwv6 were empty while the IPv4 versions had the valid router addresses in them. Both of these are set in the script /var/etc/rtsold_em0_script.sh, which is built by /etc/inc/interfaces.inc. It looks like the parameter is not being passed to the script. I modified the routine in interfaces.inc that builds the rtsold_em0_script (search for "rtsoldscript") to set a hard coded value for both these files and my IPv6 gateway started working! Routing, firewall rules, Policy Based Routing, etc, all work!

My mods:

#echo $2 > /tmp/em0_routerv6
echo "fe80::X:X:X:X" > /tmp/em0_routerv6
#echo $2 > /tmp/em0_defaultgwv6
echo "fe80::X:X:X:X" > /tmp/em0_defaultgwv6

After I made the mods, I "saved" the WAN interface again without making any changes. Hope this helps someone until a real fix is pushed out.

Thank you Greg - I had noticed that the v6 files under /tmp were also empty but didn't understand exactly what was supposed to be in them. Plus, I wasn't aware what created the rtsold_X_script.sh under /var/etc until your post. I haven't decided to make any changes to any base scripts, but found a quick command to pull in the default IPv6 gateway to echo into those files using:
netstat -rn6 | grep default | awk '{print $2}' | cut -f1 -d%

It's a bit messy but will work in the short term.

Again, Thank you!
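
The one-liner above can be wrapped in a small function so it can be tried out before anything is written under /tmp. A minimal sketch, assuming `netstat -rn6` prints the default route with the gateway in the second column, as in the routing tables quoted in this issue; the function name `default_gw6` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -rn6` style output on stdin and print
# the gateway of the first default route, without the %interface scope suffix.
default_gw6() {
    awk '$1 == "default" { sub(/%.*/, "", $2); print $2; exit }'
}

# Example input modelled on a routing table quoted in this issue.
printf '%s\n' \
    'Destination Gateway Flags Netif Expire' \
    'default fe80::2ca:e5ff:fec9:f022%lagg0.666 UG lagg0.666' \
    '::1 link#7 UH lo0' | default_gw6
```

On the firewall this would be fed from the real command, for example `netstat -rn6 | default_gw6 > /tmp/em0_routerv6` (interface name assumed). As noted earlier in the thread, files written this way are believed to be overwritten on reboot or when the WAN interface is saved.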

Actions #21

Updated by Viktor Gurov over 3 years ago

see also #11187

Actions #22

Updated by Greg Shaffer over 3 years ago

Viktor, any idea when this is going to get some attention? This issue really ripples throughout the system (e.g. Gateway Monitoring, Track Interfaces, Routing, Firewall, Policy Based Routing, Limiters, etc). I've had to remove/disable a significant amount of functionality that just worked in 2.4.5-p1. IPv6 via DHCP6 is broken in both 2.5.0 and 21.02-p1.

Actions #23

Updated by Paul K over 3 years ago

I can confirm this as an issue.

This is, however, a much larger issue than described in the original post. This should really be classified as a show stopper and fixed ASAP. After updating to 2.5 I lost internet access from all internal networks and had to waste a lot of time figuring out what exactly was causing it, so overall my upgrade experience has not been positive this time.

Viktor Gurov and Tim Dunn already explained exactly where the issue lies, but just to reiterate:

  • We have a WAN interface that is configured to use DHCP6 with 'Do not wait for a RA' flag disabled
  • interface_dhcpv6_configure() gets called and generates /var/etc/dhcp6c_wan.conf
  • /var/etc/dhcp6c_wan.conf executes /var/etc/dhcp6c_wan_dhcp6withoutra_script.sh
  • /var/etc/dhcp6c_wan_dhcp6withoutra_script.sh executes rtsold instructing it to execute /var/etc/rtsold_igb0_script.sh
  • /var/etc/rtsold_igb0_script.sh grabs the second command line argument and dumps it into /tmp/igb0_routerv6 and /tmp/igb0_defaultgwv6; the problem is that there is no second argument, because rtsol requires a custom patch to pass it, so we end up with empty files.
  • when get_interface_gateway_v6("wan") is called it returns empty value instead of IPv6 gateway address, because it's looking things up from empty files
  • when return_gateways_array() is called 'gateway' is set to "dynamic" instead of IPv6 gateway address
                [WAN_DHCP6] => Array
                    (
                        [interface] => igb0
                        [gateway] => dynamic                  <-- this is a problem
                        [name] => WAN_DHCP6
                        [weight] => 1
                        [ipprotocol] => inet6
                        [descr] => Interface WAN_DHCP6 Gateway
                        [monitor] => 20xx:xxxx:xxxx::8888
                        [dynamic] => 1
                        [friendlyiface] => wan
                        [friendlyifdescr] => WAN
                        [attribute] => 1
                        [tiername] => Default (IPv6)
                    )
    
  • from this point any code that requires the IPv6 gateway is broken; that includes gateway monitoring, firewall rules that use a gateway, the snort package, etc.
  • /tmp/rules.debug gets generated as below
    # Gateways
    GWWAN_DHCP = " route-to ( igb0 111.222.333.444 ) " 
    GWWAN_DHCP6 = " route-to ( igb0 111.222.333.444 ) "                 <-- this is wrong, should be IPv6 gateway
    
    pass  out  quick  on {  igb0  } inet6 from any  to <negate_networks> tracker 10000002 keep state  dnqueue( 2,1)  label "NEGATE_ROUTE: Negate policy routing for destination" 
    pass  out  quick  on {  igb0  }  $GWWAN_DHCP6 inet6 from any to any tracker 1548386483 keep state  dnqueue( 2,1)  label "USER_RULE: v6-CoDel Limiters" 
    

    which of course throws errors during filter reload and generates an alert.

All in all, this leaves the system in an unusable state after the upgrade.
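
The empty-file step in the chain above is easy to reproduce away from the firewall. A minimal sketch that needs only a POSIX shell; `write_gw` and the temporary directory stand in for the generated rtsold script and /tmp, and are not pfSense code.

```shell
#!/bin/sh
# Stand-in for the unguarded generated script: it writes its second
# argument to a file whether or not a second argument was passed.
# $1 = interface name, $2 = router address (may be missing)
write_gw() {
    echo $2 > "$workdir/${1}_routerv6"
}

workdir=$(mktemp -d)

# Called with the interface name only, as described in the list above.
write_gw igb0
# The file exists but holds only a newline, so readers get an empty gateway.
wc -c < "$workdir/igb0_routerv6"

# Called with the router address as a second argument.
write_gw igb0 fe80::201:aaa:bbbb:cc
cat "$workdir/igb0_routerv6"

rm -rf "$workdir"
```

The first call shows why a lookup that reads the file, such as get_interface_gateway_v6("wan"), would come back empty even though a file is present.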

Actions #24

Updated by Car F over 3 years ago

Paul K wrote:

I can confirm this as an issue.

This is however much larger issue than described in the original post. This should really be classified as a show stopper and fixed ASAP. After updating to 2.5 I lost internet access from all internal networks and had to waste a lot of time figuring out what exactly was causing it, so overall my upgrade experience has not been positive this time.

Couldn't agree more. This is unacceptable for a "stable" release. In Germany there is heavy usage of IPv6-only connections, so you can't use them anymore. Furthermore, if you update remotely you're locked out. I'm surprised that there is no word from the devs on this.

I'm still using the workaround from Greg that seems to work fine.

Actions #25

Updated by Jim Pingle over 3 years ago

  • Assignee set to Jim Pingle
  • Target version set to 2.5.1

Assigning to me; I have a fix.

Actions #26

Updated by Jim Pingle over 3 years ago

  • Subject changed from dpinger not updating/learning IPv6 gateway to Gateway value for DHCP6 interfaces lost after RA events without gateway information, breaks monitoring
  • Category changed from Gateway Monitoring to Interfaces

Updating subject for release notes and to more accurately reflect the nature of the problem.

Actions #27

Updated by Jim Pingle over 3 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

Actions #28

Updated by Anonymous over 3 years ago

I tried applying this as a patch to my 2.5 box... the patch tested properly and applied without issue, but after removing the manually set monitoring IP I've had set, the WAN_DHCP6 gateway is at "dynamic" for both the gateway and monitor address, and reverted to "Pending" status when I went back to my dashboard. The IPv6 gateway also still doesn't show in rules.

I tried releasing/renewing DHCP on my WAN interface, but no change.
I tried re-saving/applying the interface settings, but no change as well.

Are there other things that need to be done for this to update? Is it something that will only properly work with beta snapshots (something I'm trying to avoid)?

Actions #29

Updated by Jim Pingle over 3 years ago

At a minimum you have to Edit/Save/Apply on the affected WAN interface after changing the gateway, otherwise it won't rewrite the RA script with the new code. A reboot would also work.

It shouldn't depend on any other changes, but I haven't tested it on a stock 2.5.0 to say that with 100% certainty.

Actions #30

Updated by Greg Shaffer over 3 years ago

I restored the original interfaces.inc, applied the patch and rebooted my system. Doesn't look like it fixed the issue. My IPv6 gateway still shows as dynamic with "pending" for RTT, RTTsd and Loss. I'm getting alerts that the firewall rules won't load because "no routing address with matching address family found" for the IPv6 CoDel Limiter I have setup.

Actions #32

Updated by Jim Pingle over 3 years ago

Please direct all feedback to threads on the forum category for Plus 21.02.2 / CE 2.5.1 at https://forum.netgate.com/category/83/21-02-2-2-5-1-snapshots

But please upgrade to a snapshot first to test the change completely. This fix is present in the newly published RC build.

Actions #33

Updated by Greg Shaffer over 3 years ago

The 2.5.1-RC did not resolve the gateway issue. Thread started on the forum.

Actions #34

Updated by Flole Systems over 3 years ago

No surprise that didn't fix it, where should that second argument be coming from? It's never passed to the managedconf_script, as already described above. So it is not lost after RA events without gateway information; rtsol is simply not passing it to the managedconf_script.

Actions #35

Updated by Jim Pingle over 3 years ago

  • Status changed from Feedback to In Progress

Flole Systems wrote:

No surprise that didn't fix it, where should that second argument be coming from? It's never passed to the managedconf_script, as already described above. So it is not lost after RA events without gateway information; rtsol is simply not passing it to the managedconf_script.

On a half dozen systems I got the gateway in the first RA response and then later responses had no gateway. So the gateway was clobbered on the second and later responses, leading to identical symptoms. It works for me, since now ALL of my DHCP6 lab systems, most of which didn't have working gateway monitoring before, work. Some never broke before or after this change.

It's the only scenario I've been able to reproduce here locally so far. Other environments must be hitting a different problem than I am.

Actions #36

Updated by Paul K over 3 years ago

I think I might have found the problem.

First of all, I stated incorrectly in my previous post that "/var/etc/dhcp6c_wan_dhcp6withoutra_script.sh" gets executed on my system. The correct sequence is:
  • rtsold is executed from interfaces.inc
  • which in turn executes rtsold_{$wanif}_script.sh
  • which then executes dhcp6c

I applied Jim's patch on my system 2.5.0 (not 2.5.1 RC), manually removed igb0_routerv6/igb0_defaultgwv6 files and ran an update on WAN interface.

Updated /var/etc/rtsold_igb0_script.sh was generated as per patch:

#!/bin/sh
# This shell script launches dhcp6c and configured gateways for this interface.
if [ -n "$2" ]; then
        echo $2 > /tmp/igb0_routerv6
        echo $2 > /tmp/igb0_defaultgwv6
        /usr/bin/logger -t rtsold "Received RA specifying route $2 for interface wan(igb0)" 
fi
if [ ! -f /tmp/dhcp6c_igb0_lock ]; then
        /usr/bin/touch /tmp/dhcp6c_igb0_lock
        if [ -f /var/run/dhcp6c_igb0.pid ]; then
                /bin/pkill -F /var/run/dhcp6c_igb0.pid
                /bin/rm -f /var/run/dhcp6c_igb0.pid
                /bin/sleep 1
        fi
        /usr/local/sbin/dhcp6c -D  -c /var/etc/dhcp6c_wan.conf -p /var/run/dhcp6c_igb0.pid igb0
        /usr/bin/logger -t rtsold "Starting dhcp6 client for interface wan(igb0)" 
else
        /usr/bin/logger -t rtsold "RTSOLD Lock in place - sending SIGHUP to dhcp6c" 
        dhcp6c_pid=$(cat "/var/run/dhcp6c_igb0.pid")
        /bin/kill -1 ${dhcp6c_pid}
fi

However, the igb0_routerv6/igb0_defaultgwv6 files were not created, and "Received RA specifying route $2 for interface wan(igb0)" was not logged in the system logs, so we know that the conditional code was not executed and that no second argument was passed to the script.
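
The effect of the `[ -n "$2" ]` guard can be seen in isolation. This is a sketch only: `guarded_write` is a hypothetical stand-in for the guarded part of the generated script, and `$dir` replaces /tmp.

```shell
#!/bin/sh
# Stand-in for the guarded part of the patched script.
# $1 = interface, $2 = router address (optional)
guarded_write() {
    if [ -n "$2" ]; then
        echo "$2" > "$dir/${1}_routerv6"
    fi
}

dir=$(mktemp -d)

# No second argument at all: the guard skips the write, so no file appears.
guarded_write igb0
[ -f "$dir/igb0_routerv6" ] || echo "no file created"

# An address arrives once, then a later call comes without it:
# the earlier value is kept instead of being overwritten with nothing.
guarded_write igb0 fe80::201:aaa:bbbb:cc
guarded_write igb0
cat "$dir/igb0_routerv6"

rm -rf "$dir"
```

The guard protects a value that was already written, but it cannot supply one when the second argument never arrives, which matches the missing files seen here.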

Here is where I think the problem lies:

in 2.4.5 code that executed rtsold looked like this:

        mwexec("/usr/sbin/rtsold -1 " .
            "-p {$g['varrun_path']}/rtsold_{$wanif}.pid " .
            "-O {$g['varetc_path']}/rtsold_{$wanif}_script.sh " .
            $wanif);

in 2.5.0 it looks like this:

        mwexec("/usr/sbin/rtsold -1 " .
            "-p {$g['varrun_path']}/rtsold_{$wanif}.pid " .
            "-M {$g['varetc_path']}/rtsold_{$wanif}_script.sh " .
            "-O {$g['varetc_path']}/rtsold_{$wanif}_script.sh " .
            $wanif);

We can see that the -M script was added for managed config. Looking at the source of rtsol (usr.sbin/rtsold/rtsol.c), we see that the version used in 2.4.5 did not yet support the -M flag, while the version used in 2.5.0 does.

So I ran rtsold manually with debugging turned on to see what I can find out (this is on 2.5.0):

# /usr/sbin/rtsold -f1 -d -D -p /var/run/rtsold_igb0.pid -M /var/etc/rtsold_igb0_script.sh -O /var/etc/rtsold_igb0_script.sh igb0

rtsold: checking if igb0 is ready...
rtsold: igb0 is ready
rtsold: set timer for igb0 to 0s
rtsold: New timer is 0s
rtsold: timer expiration on igb0, state = 1
rtsold: set timer for igb0 to 4s
rtsold: New timer is 4s
rtsold: received RA from fe80::201:aaa:bbbb:cc on igb0, state is 2
rtsold: ManagedConfigFlag on igb0 is turned on
rtsold: script "/var/etc/rtsold_igb0_script.sh" status 0                            <--
rtsold: OtherConfigFlag on igb0 is turned on
rtsold: Processing RA
rtsold: ndo = 0x7fffffffe220
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe240
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe260
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe280
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: rsid = [igb0:slaac]
rtsold: stop timer for igb0
rtsold: there is no timer

We see that the managed config script is being executed and the other config script is not, which makes sense. What if we run it without the -M flag?

/usr/sbin/rtsold -f1 -d -D -p /var/run/rtsold_igb0.pid -O /var/etc/rtsold_igb0_script.sh igb0
rtsold: checking if igb0 is ready...
rtsold: igb0 is ready
rtsold: set timer for igb0 to 1s
rtsold: New timer is 1s
rtsold: timer expiration on igb0, state = 1
rtsold: set timer for igb0 to 4s
rtsold: New timer is 4s
rtsold: received RA from fe80::201:aaa:bbbb:cc on igb0, state is 2
rtsold: ManagedConfigFlag on igb0 is turned on
rtsold: OtherConfigFlag on igb0 is turned on
rtsold: Processing RA
rtsold: ndo = 0x7fffffffe250
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe270
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe290
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe2b0
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: rsid = [igb0:slaac]
rtsold: stop timer for igb0
rtsold: there is no timer

The other config script is not being executed, hmmm. Looking at rtsol.c line 325, it does not execute the other script if ManagedConfigFlag is set in the response, even if -M is not specified. So basically it never executes the other script if the response comes back with the managed flag.

So what I guess is happening is that people having problems receive an RA response with ManagedConfigFlag turned on, while Jim does not have ManagedConfigFlag turned on in his testing lab, so for him rtsold executes the other script instead of the managed script. rtsold is patched to pass a second argument to the other script (-O), but not to the managed script (-M).

#define    _ARGS_MANAGED    managedconf_script, ifi->ifname              <-- managed script, single argument
#define    _ARGS_OTHER    otherconf_script, ifi->ifname, ntopbuf       <-- other script, two arguments
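
The dispatch described in this comment can be modelled in a few lines of shell. This is a sketch of the behaviour as reported here, not a translation of rtsol.c; `dispatch_ra` and `fake_script` are hypothetical names.

```shell
#!/bin/sh
# Stand-in for rtsold_igb0_script.sh: report how many arguments arrived.
fake_script() {
    echo "args=$# router=${2:-<none>}"
}

# Model of the reported behaviour: when the RA has the Managed flag set,
# only the managed script may run, and it gets the interface name alone.
# Otherwise, with the Other flag set, the other script runs and also
# receives the router address.
# $1 = managed flag (0/1), $2 = other flag (0/1), $3 = ifname,
# $4 = router address, $5 = 1 if a -M script was configured, else 0
dispatch_ra() {
    if [ "$1" -eq 1 ]; then
        [ "$5" -eq 1 ] && fake_script "$3"
        return 0
    fi
    [ "$2" -eq 1 ] && fake_script "$3" "$4"
    return 0
}

dispatch_ra 1 1 igb0 fe80::201:aaa:bbbb:cc 1   # M+O RA, -M given: one argument
dispatch_ra 1 1 igb0 fe80::201:aaa:bbbb:cc 0   # M+O RA, no -M: nothing runs
dispatch_ra 0 1 igb0 fe80::201:aaa:bbbb:cc 1   # O-only RA: two arguments
```

Only the last case hands the script a router address, which would explain why a lab that only sees RAs without the Managed flag never shows the problem.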

Actions #37

Updated by Flole Systems over 3 years ago

Exactly, and that was already described above. That's why I was wondering how this patch was supposed to fix it when all it does is ignore everything if no second argument is present instead of adding that second argument.

Also, in line 5091 of interfaces.inc the -M flag is missing entirely. I think it should be added there as well, but I'm not sure about that.

Actions #38

Updated by Paul K over 3 years ago

Yeah, I guess it was already described. The way I read that post though is that it was patched to pass second argument before, but is no longer patched to do that and not that it still works with -O script, but not with -M script.

Actions #39

Updated by Jim Pingle over 3 years ago

  • % Done changed from 100 to 50

OK, so I did some sniffing and found that the systems I was observing had multiple devices on the segment responding to RA requests. I thought it was pfSense (which is set to Managed+Other) responding, but it was in fact another router which didn't have those flags. When I killed that and did the test above, I got the same behavior others see.

So my fix did correct an issue seen in some networks, but not this main problem. We'll see what we can come up with for this shortly.

Actions #40

Updated by Greg Shaffer over 3 years ago

Running rtsold manually, as Paul K (Thanks!) did, I see the same results.

Actions #41

Updated by Jim Pingle over 3 years ago

  • Assignee changed from Jim Pingle to Renato Botelho

OK I've tested with a patched rtsold on multiple systems and now I'm seeing the correct and expected behavior all around with M+O RA messages. We'll get that into builds shortly.

Thanks for all the info!

Actions #42

Updated by Renato Botelho over 3 years ago

  • Status changed from In Progress to Feedback

I've pushed rtsold fix to FreeBSD-src repository for all branches. It should be fine on next snapshot.

Actions #43

Updated by Jim Pingle over 3 years ago

  • Subject changed from Gateway value for DHCP6 interfaces lost after RA events without gateway information, breaks monitoring to Gateway value for DHCP6 interfaces missing after RA events triggered script without gateway information

Adjusting subject again to reflect both problems that were fixed since they were close, potentially related, but not identical

Actions #44

Updated by Jim Pingle over 3 years ago

  • % Done changed from 50 to 100

The complete set of fixes is in the current RC build, so it's ready for others to test. It works for me, but that's me and my environment, so I'd like some more feedback.

Thanks!

Actions #45

Updated by Patrik Lundquist over 3 years ago

Working for me too now with 2.5.1.r.20210318.0300.

Actions #46

Updated by Jesse Beauclaire over 3 years ago

Hate to ask this here, but I am affected by this issue so it's sort of relevant... Can I update to the RC without killing my ability to stay on the Stable release channel? If yes, how can I grab the RC?
Thanks.

Actions #47

Updated by Jim Pingle over 3 years ago

Jesse Beauclaire wrote:

Hate to ask this here, but I am affected by this issue so it's sort of relevant... Can I update to the RC without killing my ability to stay on the Stable release channel? If yes, how can I grab the RC?

Just pick the RC branch; when it comes time for release, it should lead naturally to the release and not to future snapshots.

https://www.netgate.com/blog/open-call-for-testing-pfsense-plus-and-ce-release-candidates.html

If you have more questions, follow up on the forum: https://forum.netgate.com/category/83/21-02-2-2-5-1-snapshots

Actions #48

Updated by Mike McV over 3 years ago

This (2.5.1.r.20210318.0300) did not resolve it for me.

If I remove my static IPv6 monitor address, gateway monitoring stops working, but the protocol works and follows system routing.

I currently do not have the file /tmp/lagg0.666_defaultgwv6 (my upstream ISP), so the script is not creating this. I disabled the WAN interface and re-enabled it, and tried a full rip-and-replace of the IPv6 protocol on the system, with no change.

Thank you for your help with this.

Info Snips...

2.5.1-RC (amd64)
built on Thu Mar 18 03:04:03 EDT 2021
FreeBSD 12.2-STABLE

WAN_DHCP
73.x.x.x 11.2ms 0.8ms 0.0% Online
WAN_DHCP6~ Pending Pending Pending Unknown

IPv6 Routes
Destination Gateway Flags Use Mtu Netif Expire
default fe80::2ca:e5ff:fec9:f022%lagg0.666 UG 2291 1500 lagg0.666
::1 link#7 UH 817 16384 lo0

C:\Windows\System32>ping ipv6.google.com

Pinging ipv6.l.google.com [2607:f8b0:400a:803::200e] with 32 bytes of data:
Reply from 2607:f8b0:400a:803::200e: time=14ms
Reply from 2607:f8b0:400a:803::200e: time=16ms
Reply from 2607:f8b0:400a:803::200e: time=15ms
Reply from 2607:f8b0:400a:803::200e: time=14ms

Ping statistics for 2607:f8b0:400a:803::200e:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 14ms, Maximum = 16ms, Average = 14ms

Actions #49

Updated by Jim Pingle over 3 years ago

Mike McV wrote:

If I remove my static IPv6 monitor address, gateway monitoring stops working, but the protocol works and follows system routing.

Did you reboot after doing that? Or at least edit/save/apply on the interface set to DHCP6?

WAN_DHCP6~ Pending Pending Pending Unknown
default fe80::2ca:e5ff:fec9:f022%lagg0.666 UG 2291 1500 lagg0.666

That would seem to suggest you are actually getting the gateway, but it's somehow not making it into dpinger. Maybe try clicking the trash can on the WAN_DHCP6 gateway entry on System > Routing, then reboot and see what happens.

I currently do not have the file /tmp/lagg0.666_defaultgwv6 (my upstream ISP), so the script is not creating this. I disabled the WAN interface and re-enabled it, and tried a full rip-and-replace of the IPv6 protocol on the system, with no change.

That doesn't make sense, otherwise you wouldn't have a default entry in the IPv6 routing table for that interface.

Something else may be going on with yours yet that is different from everyone else. Something else to try would be to manually invoke rtsold as others did in previous comments to see what it outputs and executes.

You should start a fresh thread under https://forum.netgate.com/category/83/21-02-2-2-5-1-snapshots and we can discuss it more in detail there.

Actions #50

Updated by Flole Systems over 3 years ago

Flole Systems wrote:

Also, in line 5091 of interfaces.inc the -M flag is missing entirely. I think it should be added there as well, but I'm not sure about that.

If that's an issue then you might be experiencing something related to this. Check if rtsold is running and if it's missing the -M flag, then you know if this could be the problem or not.
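
One way to script that check is to test the rtsold command line for a standalone -M option. A minimal sketch; `has_managed_flag` is a hypothetical helper, and the two sample strings are modelled on the mwexec() calls quoted earlier in this issue rather than taken from a live system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command line string contains a
# standalone -M option. The string is split on whitespace on purpose.
has_managed_flag() {
    for word in $1; do
        [ "$word" = "-M" ] && return 0
    done
    return 1
}

old='/usr/sbin/rtsold -1 -p /var/run/rtsold_igb0.pid -O /var/etc/rtsold_igb0_script.sh igb0'
new='/usr/sbin/rtsold -1 -p /var/run/rtsold_igb0.pid -M /var/etc/rtsold_igb0_script.sh -O /var/etc/rtsold_igb0_script.sh igb0'

has_managed_flag "$old" && echo "old: -M present" || echo "old: -M missing"
has_managed_flag "$new" && echo "new: -M present" || echo "new: -M missing"
```

On a running firewall the string would have to come from the process list, for example from `ps -axww -o command | grep '[r]tsold'`; the exact ps options are an assumption and may need adjusting.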

Actions #51

Updated by Greg Shaffer over 3 years ago

2.5.1-RC-20210318-0300 resolved the IPv6 Gateway issue I was experiencing. Thanks for the fix!

Actions #52

Updated by Mike McV over 3 years ago

Jim Pingle wrote:

If I remove my static IPv6 monitor address, gateway monitoring stops working, but the protocol works and follows system routing.

Did you reboot after doing that? Or at least edit/save/apply on the interface set to DHCP6?

Yes

WAN_DHCP6~ Pending Pending Pending Unknown
default fe80::2ca:e5ff:fec9:f022%lagg0.666 UG 2291 1500 lagg0.666

That would seem to suggest you are actually getting the gateway, but it's somehow not making it into dpinger. Maybe try clicking the trash can on the WAN_DHCP6 gateway entry on System > Routing, then reboot and see what happens.

No Change

I currently do not have the file /tmp/lagg0.666_defaultgwv6 (My upstream ISP) so the script is not creating this. I disabled the WAN IF and re enabled, and tried full rip replace of the IPV6 protocol from the system with no change.

That doesn't make sense otherwise you wouldn't have a default entry in the IPv6 routing table for that interface.

Is there a possibility the scripts are not happy with a tagged LAGG interface? (Outside of my expertise.)

Something else may be going on with yours yet that is different from everyone else. Something else to try would be to manually invoke rtsold as others did in previous comments to see what it outputs and executes.

Still working on this. I am not *nix/BSD fluent, I am a router guy, so I'm working on the syntax.

You should start a fresh thread under https://forum.netgate.com/category/83/21-02-2-2-5-1-snapshots and we can discuss it more in detail there.

Will do

Actions #53

Updated by Jim Pingle over 3 years ago

Mike McV wrote:

Is there a possibility the scripts are not happy with a tagged LAGG interface? (Outside of my expertise.)

No because that's actually one of the scenarios I tested :-)

: cat /tmp/lagg0.4090_routerv6
fe80::208:a2ff:fe09:95b5
: pfSsh.php playback gatewaystatus | grep DHCP6
WAN_DHCP6      fe80::208:a2ff:fe09:95b5%lagg0.4090  694cf5c4              0.161ms  0.066ms  0.0%  online       none
: netstat -rn6 | grep default
default                           fe80::208:a2ff:fe09:95b5%lagg0.4090 UG lagg0.40
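
The first check above can be wrapped in a small helper. This is only a sketch: the function name `cached_router` and the optional directory argument are made up for illustration, and the two file suffixes (`_routerv6`, `_defaultgwv6`) are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): print the router address
# cached for an interface, or "missing" when no cache file is found.
# $1 = interface name (e.g. lagg0.666)
# $2 = directory holding the cache files (defaults to /tmp)
cached_router() {
    ifname=$1
    dir=${2:-/tmp}
    # Both suffixes appear in this thread; try each in turn.
    for suffix in routerv6 defaultgwv6; do
        f="${dir}/${ifname}_${suffix}"
        if [ -s "$f" ]; then
            printf '%s: %s\n' "$suffix" "$(cat "$f")"
            return 0
        fi
    done
    echo "missing"
    return 1
}

# Usage on the firewall (not executed here):
#   cached_router lagg0.666
```

If this prints "missing" while `netstat -rn6` still shows a default route, the gateway is known to the kernel but was never written out for dpinger, which matches the symptom reported above.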

Actions #54

Updated by Mike McV over 3 years ago

I got the rtsold syntax right. Running it from the CLI resolves the issue, but the fix does not survive a reboot.

Re-running it after a reboot restores gateway monitoring.

/tmp does not have the file lagg0.666_defaultgwv6 until the command is manually run.

Output below.

/usr/sbin/rtsold -f1 -d -D -p /var/run/rtsold_lagg0.666.pid -M /var/etc/rtsold_lagg0.666_script.sh -O /var/etc/rtsold_lagg0.666_script.sh lagg0.666
rtsold: checking if lagg0.666 is ready...
rtsold: lagg0.666 is ready
rtsold: set timer for lagg0.666 to 1s
rtsold: New timer is 1s
rtsold: timer expiration on lagg0.666, state = 1
rtsold: set timer for lagg0.666 to 4s
rtsold: New timer is 4s
rtsold: rtmsg type 1, len=240
rtsold: New timer is 4s
rtsold: rtmsg type 1, len=240
rtsold: New timer is 3s
rtsold: received RA from fe80::2ca:e5ff:fec9:f022 on lagg0.666, state is 2
rtsold: ManagedConfigFlag on lagg0.666 is turned on
rtsold: script "/var/etc/rtsold_lagg0.666_script.sh" status 0
rtsold: OtherConfigFlag on lagg0.666 is turned on
rtsold: Processing RA
rtsold: ndo = 0x7fffffffe1d0
rtsold: ndo->nd_opt_type = 1
rtsold: ndo->nd_opt_len = 1
rtsold: ndo = 0x7fffffffe1d8
rtsold: ndo->nd_opt_type = 5
rtsold: ndo->nd_opt_len = 1
rtsold: ndo = 0x7fffffffe1e0
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe200
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe220
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: ndo = 0x7fffffffe240
rtsold: ndo->nd_opt_type = 3
rtsold: ndo->nd_opt_len = 4
rtsold: rsid = [lagg0.666:slaac]
rtsold: stop timer for lagg0.666
rtsold: there is no timer
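
For anyone comparing their own debug output against the log above, the lines that matter are the RA source, the two config flags, and whether a script was executed. A rough sketch of a filter that pulls those out follows. The function name `summarise_rtsold_log` is hypothetical, and the patterns are based only on the message wording seen in this log, so other rtsold versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: summarise rtsold debug output read from stdin.
# Patterns assume the message wording shown in the log in this comment.
summarise_rtsold_log() {
    awk '
        /received RA from/                { src = $5 }   # 5th field is the RA source address
        /ManagedConfigFlag .* turned on/  { m = 1 }
        /OtherConfigFlag .* turned on/    { o = 1 }
        /script ".*" status/              { ran = 1 }    # a -M/-O script was executed
        END {
            printf "ra_from=%s managed=%d other=%d script_ran=%d\n",
                   (src == "" ? "none" : src), m, o, ran
        }'
}

# Usage (not executed here): save the rtsold debug output to a file, then
#   summarise_rtsold_log < rtsold_debug.txt
```

A result with `managed=1` but `script_ran=0` would point at the situation discussed in this issue, where the RA has ManagedConfigFlag set but no script is run.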

Actions #55

Updated by Mike McV over 3 years ago

After quite a bit of digging and capturing, I think I have found the missing link in my scenario. I will also post this in the 2.5.1 forum, as I just tested again on the 2.4 train and this symptom does not happen there.

Fix/Workaround.
Be sure "Do not wait for a RA" is unchecked.

Observations.

If the router solicitation and response sequence fully completes before the DHCP6 process starts, all is fine.

If DHCP6 completes before the solicitation response arrives, and the DHCP6 advertisement does not contain router information (assumption follows), the processes that listen for RAs to drive the gateway monitoring functions stop running.

My ISP only provides address, delegation, and DNS in the DHCP6 advertisement.

@Jim Pingle, if there is anything you want me to look at before I open the new thread, let me know.

Thank You for the help

Actions #56

Updated by Jim Pingle over 3 years ago

That's probably a bit tougher to replicate, then. Like you said, that's one for a new forum thread and likely a separate Redmine issue once the details are hashed out, since this one has already seen two other related issues found and fixed. I wouldn't want to drag this particular Redmine issue out further if it's fixed for every scenario that was broken except the one you're seeing.

Actions #57

Updated by Flole Systems over 3 years ago

I pointed out a possible cause for this 2 times now already and nobody seemed to care, so one last time:

Flole Systems wrote:

Flole Systems wrote:

Also, in line 5091 of interfaces.inc the -M flag is missing entirely. I think it should be added there as well, but I'm not sure about that.

If that's an issue then you might be experiencing something related to this. Check if rtsold is running and if it's missing the -M flag, then you know if this could be the problem or not.

In other words: If you check (using "ps -aux | grep rtsold") and it's missing the option there then this is most likely your problem.
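
A sketch of that check as a reusable filter, in case it helps. The function name `check_rtsold_M` is made up, and the `ww` added to the ps options is only there to keep long command lines from being truncated.

```shell
#!/bin/sh
# Hypothetical helper: read a process listing on stdin and report whether
# rtsold was started with the -M option.
check_rtsold_M() {
    # "[r]tsold " matches the rtsold command itself but not a grep for it.
    procs=$(grep '[r]tsold ' || true)
    if [ -z "$procs" ]; then
        echo "rtsold not running"
        return 0
    fi
    # Keep only rtsold command lines that lack " -M ".
    missing=$(printf '%s\n' "$procs" | grep -v -e ' -M ' || true)
    if [ -n "$missing" ]; then
        echo "rtsold running WITHOUT -M"
    else
        echo "rtsold running with -M"
    fi
}

# Usage on the firewall (not executed here):
#   ps -auxww | check_rtsold_M
```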

Actions #58

Updated by Paul K over 3 years ago

Tested with the new RC build and it is working fine for me now. Thanks for fixing it, Jim and Renato!

Actions #59

Updated by Paul K over 3 years ago

Flole Systems wrote:

I pointed out a possible cause for this 2 times now already and nobody seemed to care, so one last time:

Flole Systems wrote:

Flole Systems wrote:

Also, in line 5091 of interfaces.inc the -M flag is missing entirely. I think it should be added there as well, but I'm not sure about that.

If that's an issue then you might be experiencing something related to this. Check if rtsold is running and if it's missing the -M flag, then you know if this could be the problem or not.

In other words: If you check (using "ps -aux | grep rtsold") and it's missing the option there then this is most likely your problem.

I did look at line 5091 but there was nothing on that line related to rtsold. Anyway, I think you are talking about this line, 5055 in 2.5.0 branch.

$dhcp6cscriptwithoutra .= "/usr/sbin/rtsold -1 -p {$g['varrun_path']}/rtsold_{$wanif}.pid -O {$g['varetc_path']}/rtsold_{$wanif}_script.sh {$wanif}\n";

Jim, since Mike had the "Do not wait for a RA" option checked, rtsold on his system was being launched from the dhcp6c_wan_dhcp6withoutra_script.sh script (the code line above) rather than being executed directly by the PHP code. Adding the -M parameter to that line should solve his problem as well. Remember, my test showed that rtsold does not run the -O script if the RA has ManagedConfigFlag set and no -M script is provided, and the RA Mike receives does have ManagedConfigFlag set.
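
To illustrate, here is roughly what the generated command line would look like with -M added. This is a sketch, not the committed change: the function name `build_rtsold_cmd` is hypothetical, /var/run and /var/etc are assumed to be what `$g['varrun_path']` and `$g['varetc_path']` expand to, and pointing -M at the same script as -O is copied from the command Mike ran manually in comment #54.

```shell
#!/bin/sh
# Hypothetical helper: print the one-shot rtsold command line for an
# interface, with the -M option included alongside -O.
# Paths are assumptions taken from the manual command in comment #54.
build_rtsold_cmd() {
    wanif=$1
    script="/var/etc/rtsold_${wanif}_script.sh"
    printf '/usr/sbin/rtsold -1 -p /var/run/rtsold_%s.pid -M %s -O %s %s\n' \
        "$wanif" "$script" "$script" "$wanif"
}

# Usage (prints the command, does not run it):
#   build_rtsold_cmd lagg0.666
```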

Actions #60

Updated by Flole Systems over 3 years ago

Paul K wrote:

I did look at line 5091 but there was nothing on that line related to rtsold. Anyway, I think you are talking about this line, 5055 in 2.5.0 branch.

Line 5055 in the 2.5.0 branch is now line 5091 in latest master. So yes, I was talking about that line.

Actions #61

Updated by Jim Pingle over 3 years ago

  • Status changed from Feedback to In Progress
  • Assignee changed from Renato Botelho to Jim Pingle

OK I thought it was more subtle than that but you are right, I was able to replicate it by checking that box, and confirming that adding -M there fixed it. I'll push a fix for that momentarily. Thanks!

Actions #62

Updated by Jim Pingle over 3 years ago

  • Status changed from In Progress to Feedback

Actions #63

Updated by Pete C over 3 years ago

Jim Pingle wrote:

Applied in changeset f3488a18e3fc276b58ecc2aeb8f7471da9bd2088.

Will a different patch be available for 2.5.1?

Currently 2.5.1 is working but not showing gateway address.

Actions #64

Updated by Renato Botelho over 3 years ago

Pete C wrote:

Jim Pingle wrote:

Applied in changeset f3488a18e3fc276b58ecc2aeb8f7471da9bd2088.

Will a different patch be available for 2.5.1?

Currently 2.5.1 is working but not showing gateway address.

Same patch was backported to 2.5.1. A new snapshot is building right now.

Actions #65

Updated by Pete C over 3 years ago

Thank you Renato.

f3488a18e3fc276b58ecc2aeb8f7471da9bd2088

Tried the above diff patch on my 2.5.1 build with the RA checkbox thing and it did not change anything.

Actions #66

Updated by Jim Pingle over 3 years ago

Pete C wrote:

Tried the above diff patch on my 2.5.1 build with the RA checkbox thing and it did not change anything.

Start a new thread on the forum at https://forum.netgate.com/category/83/21-02-2-2-5-1-snapshots to diagnose your specific case further. When you post, include all the information from above (your config, log data, what happens when you run the command manually, etc), but don't put that in comments here, put it on a forum post.

Actions #67

Updated by Pete C over 3 years ago

Thank you Jim.

Moderator moved my original upgrade post on the forum to the snapshots section.

All is well now after updating to the new release candidate snapshot: 2.5.1.r.20210318.0300

You guys are great!!!

Actions #68

Updated by Jesse Beauclaire over 3 years ago

RC worked great for me! dpinger works, and I could re-enable my traffic limiters (codel) with great success.

Thanks!!!

Actions #69

Updated by Jim Pingle over 3 years ago

I'll leave this open over the weekend to collect more feedback but I think at this point every problem scenario is solved.

Actions #70

Updated by Mike McV over 3 years ago

All is good on my installation ...

Thank you to everyone for the help.

Actions #71

Updated by Renato Botelho over 3 years ago

  • Status changed from Feedback to Resolved

It seems to be resolved now.
