Bug #12920

open

Gateway behavior differs when the gateway does not exist in the configuration

Added by Marcos M about 2 years ago. Updated 2 months ago.

Status: Feedback
Priority: Normal
Assignee:
Category: Gateway Monitoring
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 24.03
Release Notes: Default
Affected Version: 2.6.0
Affected Architecture:

Description

The gateway status and dpinger behave differently when the respective gateway entry does not exist in the config.xml file. This behavior difference results in failure to fail back after WAN failover.

Test:
  • DHCP WAN
  • Bounce interface physically and with ifconfig.
  • no gw = no gateway entry in config.xml
  • gw = gateway entry exists in config.xml

Netgate 5100

Bouncing the interface with ifconfig produced the same results.

            unplug cable                            plug cable
            gateway status      dpinger status      gateway status      dpinger status
22.01 no gw missing             RUNNING             ONLINE              RUNNING
22.01 gw    pending             stopped             pending             stopped

22.05 no gw missing             stopped             ONLINE              RUNNING
22.05 gw    pending             stopped             ONLINE              RUNNING

Netgate 1100

            unplug cable                            plug cable
            gateway status      dpinger status      gateway status      dpinger status
22.01 no gw missing             RUNNING             ONLINE              RUNNING
22.01 gw    pending             stopped             pending             stopped

22.05 no gw missing             stopped             ONLINE              RUNNING
22.05 gw    pending             stopped             ONLINE              RUNNING

            ifconfig down                           ifconfig up
            gateway status      dpinger status      gateway status      dpinger status
22.01 no gw offline             RUNNING             ONLINE              RUNNING
22.01 gw    offline             RUNNING             ONLINE              RUNNING

22.05 no gw offline             RUNNING             ONLINE              RUNNING
22.05 gw    offline             RUNNING             ONLINE              RUNNING

A missing gateway can have other undesired behavior:
  • The Automatic default gateway detection will choose disabled gateways over an enabled and online gateway which has the missing config.xml entry.
  • dpinger will not start and the gateway status will remain pending after releasing/renewing the WAN DHCP lease.
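
The "gw" / "no gw" distinction used in the tests above can be checked offline against a configuration backup. The following is a minimal, non-authoritative Python sketch. It assumes that saved gateways appear in config.xml as <gateway_item> elements under <gateways>, each with a <name> child; the sample XML and the function name has_gateway_entry are made up for illustration and are not taken from this issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down config.xml. The <gateways>/<gateway_item>/<name>
# layout is an assumption about the file format, not taken from this issue.
SAMPLE_CONFIG = """
<pfsense>
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>dynamic</gateway>
      <name>WAN_DHCP</name>
    </gateway_item>
  </gateways>
</pfsense>
"""


def has_gateway_entry(config_xml: str, gateway_name: str) -> bool:
    """Return True if a saved gateway with this name exists in the config."""
    root = ET.fromstring(config_xml)
    for item in root.findall("./gateways/gateway_item"):
        if item.findtext("name") == gateway_name:
            return True
    return False


if __name__ == "__main__":
    # Classify each gateway name the same way the test tables do.
    for name in ("WAN_DHCP", "WAN_DHCP6"):
        state = "gw" if has_gateway_entry(SAMPLE_CONFIG, name) else "no gw"
        print(f"{name}: {state}")
```

A gateway reported as "no gw" here would be one that exists only as a dynamic gateway and has never been saved to the configuration.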

Files

rm12920-5100-igb0-connected.png (79.2 KB): WAN connected. Ryan Coleman, 12/19/2022 11:19 AM
rm12920-5100-igb0-disconnected.png (51.2 KB): WAN disconnected. Ryan Coleman, 12/19/2022 11:19 AM

Related issues

Related to Regression #14616: dpinger does not start after renewing DHCP (Resolved, Marcos M)
Related to Regression #11570: Gateway monitoring services is not always restarted on interface events, which may prevent a WAN from recovering back to an online state (Closed)

Actions #1

Updated by Marcos M about 2 years ago

Some notes:

It shouldn't be an issue for WAN failover on 22.05 given that dpinger starts back up. However, it's unclear if it should stop at all. This may be related to the issues reported here:
https://forum.netgate.com/topic/169949/dpinger-stops-crashes-after-update-to-2-6-0/

Actions #2

Updated by Marcos M about 2 years ago

  • Description updated (diff)
Actions #3

Updated by Marcos M about 2 years ago

  • Subject changed from Gateway stays pending after link-loss recovery when using static routes to Gateway stays pending after link-loss recovery
  • Description updated (diff)
Actions #4

Updated by Marcos M about 2 years ago

  • Description updated (diff)
Actions #5

Updated by Viktor Gurov about 2 years ago

  • Assignee set to Viktor Gurov
  • Target version set to 2.7.0
  • Plus Target Version set to 22.05
  • Affected Version set to 2.6.0
Actions #6

Updated by Jim Pingle about 2 years ago

  • Status changed from New to Pull Request Review
Actions #7

Updated by Viktor Gurov about 2 years ago

  • Status changed from Pull Request Review to Feedback
  • % Done changed from 0 to 100
Actions #8

Updated by Viktor Gurov about 2 years ago

  • Status changed from Feedback to New
Actions #9

Updated by Jim Pingle about 2 years ago

  • Status changed from New to Pull Request Review
Actions #10

Updated by Marcos M about 2 years ago

Tested fixes on current 22.05 snap on an 1100 and 5100.

The gateway status / dpinger behavior is now the same:
Gateway entry in config:
  • interface down: dpinger process missing; gateway status missing
  • interface up: dpinger process running; gateway status online
No gateway entry in config:
  • interface down: dpinger process missing; gateway status missing
  • interface up: dpinger process running; gateway status online

Edit: typo after copy/paste
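
The comparison in this comment amounts to checking that both configuration cases report the same (gateway status, dpinger status) pair for each interface event. Below is a minimal Python sketch of that check. It uses the values from this comment and, for contrast, the 22.01 cable unplug/plug rows from the description. The data layout and the function name differing_events are illustrative assumptions, not part of any pfSense tooling.

```python
from typing import Dict, List, Tuple

# (gateway status, dpinger status) per interface event, keyed by config case.
Observation = Dict[str, Dict[str, Tuple[str, str]]]

# Values reported in this comment for the patched 22.05 snapshot.
PATCHED_22_05: Observation = {
    "gw": {"down": ("missing", "missing"), "up": ("online", "running")},
    "no gw": {"down": ("missing", "missing"), "up": ("online", "running")},
}

# 22.01 cable unplug/plug rows from the description tables.
RELEASE_22_01: Observation = {
    "gw": {"down": ("pending", "stopped"), "up": ("pending", "stopped")},
    "no gw": {"down": ("missing", "running"), "up": ("online", "running")},
}


def differing_events(obs: Observation) -> List[str]:
    """List the interface events where 'gw' and 'no gw' disagree."""
    return [
        event
        for event in obs["gw"]
        if obs["gw"][event] != obs["no gw"][event]
    ]


if __name__ == "__main__":
    print("patched 22.05:", differing_events(PATCHED_22_05))
    print("22.01:", differing_events(RELEASE_22_01))
```

An empty list means the two cases behave identically, which is the result described here for the patched snapshot.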

Actions #11

Updated by Viktor Gurov about 2 years ago

  • Status changed from Pull Request Review to Feedback
Actions #12

Updated by Jim Pingle about 2 years ago

  • Status changed from Feedback to New

With this change in place, dynamic gateway entries for interfaces such as DHCP are removed entirely when they are down, which is not what we want to happen. They should still be in the list, and have to be for certain things to function properly. I've reverted the change; we can try an alternate approach.

Actions #13

Updated by Jim Pingle about 2 years ago

  • Status changed from New to Feedback
Actions #14

Updated by Jim Pingle about 2 years ago

  • Status changed from Feedback to New
Actions #15

Updated by Marcos M about 2 years ago

  • Subject changed from Gateway stays pending after link-loss recovery to Gateway status behavior differs when the gateway does not exist in config.xml
Actions #16

Updated by Marcos M about 2 years ago

  • Description updated (diff)
Actions #19

Updated by Steve Wheeler about 2 years ago

Seeing what looks to be related whilst testing: https://redmine.pfsense.org/issues/12949

After the WAN interface is re-assigned, dpinger is stopped and does not restart.
For example, here the WAN is reassigned to igb0:

Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Shutting down Router Advertisment daemon cleanly
Mar 22 14:48:43     check_reload_status     398     rc.newwanip starting igb0
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: calling interface_dhcpv6_configure.
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Accept router advertisements on interface igb0
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Starting DHCP6 client for interfaces igb0 in DHCP6 without RA mode
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Starting rtsold process on wan(igb0)
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: rc.newwanip: Info: starting on igb0.
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: rc.newwanip: on (IP address: 172.21.16.182) (interface: []) (real interface: igb0).
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: rc.newwanip called with empty interface.
Mar 22 14:48:44     check_reload_status     398     Reloading filter
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 172.21.16.182 - Restarting packages.
Mar 22 14:48:44     check_reload_status     398     Starting packages
Mar 22 14:48:45     php-fpm     369     /interfaces_assign.php: Default gateway setting Interface WAN_DHCP Gateway as default.
Mar 22 14:48:45     php-fpm     369     /interfaces_assign.php: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
Mar 22 14:48:45     check_reload_status     398     Restarting IPsec tunnels
Mar 22 14:48:45     php-fpm     368     /rc.start_packages: Restarting/Starting all packages.
Mar 22 14:48:48     check_reload_status     398     updating dyndns wan
Mar 22 14:48:48     check_reload_status     398     Reloading filter
Mar 22 14:48:48     php-fpm     369     /interfaces_assign.php: Configuration Change: admin@172.21.16.243 (Local Database): Interfaces assignment settings changed
Mar 22 14:48:48     check_reload_status     398     Syncing firewall
Mar 22 14:48:48     php-fpm     369     /interfaces_assign.php: Creating rrd update script
Mar 22 14:48:48     kernel         arprequest: cannot find matching address 

The gateway log shows:

Mar 22 14:48:01     dpinger     14600     send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 bind_addr 172.21.16.183 identifier "WAN_DHCP " 
Mar 22 14:48:41     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     exiting on signal 15 

Tested:
2.7.0-DEVELOPMENT (amd64)
built on Tue Mar 22 06:20:34 UTC 2022
With the MR679 patch
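
The symptom in the gateway log above (dpinger exits on signal 15 and no new startup line follows) can be spotted mechanically. The following is a rough Python sketch, assuming the log layout shown in this comment (timestamp, process name, PID, message) and assuming that a message beginning with "send_interval" marks a fresh dpinger start. The function name last_unrecovered_exit is hypothetical. As a side note, errno 65 on FreeBSD is EHOSTUNREACH (no route to host).

```python
import re
from typing import Optional

# Matches the layout shown above: timestamp, "dpinger", PID, message.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s+dpinger\s+(\d+)\s+(.*)$")

# Lines copied from the gateway log in this comment.
SAMPLE_LOG = """\
Mar 22 14:48:01     dpinger     14600     send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 bind_addr 172.21.16.183 identifier "WAN_DHCP "
Mar 22 14:48:43     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     exiting on signal 15
"""


def last_unrecovered_exit(log_text: str) -> Optional[str]:
    """Return the timestamp of the last dpinger exit that is not followed
    by a new startup line, or None if dpinger was restarted (or never exited)."""
    stopped_at = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, _pid, message = match.groups()
        if message.startswith("send_interval"):
            stopped_at = None  # a new dpinger instance logged its settings
        elif message.startswith("exiting on signal"):
            stopped_at = timestamp
    return stopped_at


if __name__ == "__main__":
    print(last_unrecovered_exit(SAMPLE_LOG))
```

For the sample above, the function returns the timestamp of the final "exiting on signal 15" line, since no later startup line appears.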

Actions #20

Updated by Jim Pingle almost 2 years ago

  • Plus Target Version changed from 22.05 to 22.09
Actions #22

Updated by Jim Pingle almost 2 years ago

  • Status changed from New to Pull Request Review
Actions #23

Updated by Marcos M almost 2 years ago

  • Description updated (diff)

Updating original post with results from 22.05 BETA.

Now the gateway returns to online in every case. However, there are still cases in which the gateway is missing, which should not happen.

Actions #24

Updated by Marcos M almost 2 years ago

  • Description updated (diff)
Actions #25

Updated by Marcos M almost 2 years ago

  • Subject changed from Gateway status behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in config.xml
  • Description updated (diff)

Updating OP with new symptoms.

Actions #26

Updated by Marcos M almost 2 years ago

  • Description updated (diff)
Actions #27

Updated by Jim Pingle almost 2 years ago

  • Plus Target Version changed from 22.09 to 22.11
Actions #28

Updated by Jim Pingle over 1 year ago

  • Plus Target Version changed from 22.11 to 23.01
Actions #29

Updated by Jim Pingle over 1 year ago

  • Assignee deleted (Viktor Gurov)
Actions #30

Updated by Jim Pingle over 1 year ago

  • Status changed from Pull Request Review to Feedback

The last MR was merged a while ago. If there are still problems here we need a detailed list of incorrect behaviors, what they should be, and how to reproduce them.

Actions #31

Updated by Jim Pingle over 1 year ago

  • Status changed from Feedback to Resolved

Closing for lack of feedback either way here. I haven't noticed any gateway issues like this in a while and I've done quite a bit of testing with gateway events when working on other issues.

Actions #32

Updated by Ryan Coleman over 1 year ago

Marcos M wrote:

The gateway status and dpinger behave differently when the respective gateway entry does not exist in the config.xml file. This behavior difference results in failure to fail back after WAN failover.

Test:
  • DHCP WAN
  • Bounce interface physically and with ifconfig.
  • no gw = no gateway entry in config.xml
  • gw = gateway entry exists in config.xml

Netgate 5100

ifconfig produced same results.
[...]

Netgate 1100

[...]

A missing gateway can have other undesired behavior:
  • The Automatic default gateway detection will choose disabled gateways over an enabled and online gateway which has the missing config.xml entry.
  • dpinger will not start and the gateway status will remain pending after releasing/renewing the WAN DHCP lease.

Verified this is the case on 5100 running the 23.01-BETA nightly from 12/17 with a default installation.
23.01-BETA (amd64) built on Sat Dec 17 14:33:51 UTC 2022

Cable connected/disconnected screenshots attached.

Actions #33

Updated by Jim Pingle over 1 year ago

  • Assignee set to Jim Pingle
  • Plus Target Version changed from 23.01 to 23.05

Let's take our time with this and make sure it gets a thorough and proper analysis and correction for the next release. As it is, we're not worse off than we were on the last release at least, and if it affects someone there is a viable workaround: they can edit/save the gateway so it is populated in the config.

Actions #34

Updated by Jim Pingle about 1 year ago

  • Plus Target Version changed from 23.05 to 23.09
Actions #35

Updated by Jim Pingle 10 months ago

  • Target version changed from 2.7.0 to CE-Next
Actions #36

Updated by Marcos M 7 months ago

Actions #37

Updated by Jim Pingle 7 months ago

  • Plus Target Version changed from 23.09 to 24.01
Actions #38

Updated by Jim Pingle 7 months ago

  • Plus Target Version changed from 24.01 to 24.03
Actions #39

Updated by Marcos M 6 months ago

  • Related to Regression #11570: Gateway monitoring services is not always restarted on interface events, which may prevent a WAN from recovering back to an online state added
Actions #40

Updated by Marcos M 3 months ago

  • Status changed from Confirmed to Pull Request Review
  • Assignee changed from Jim Pingle to Marcos M
  • Target version changed from CE-Next to 2.8.0

https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1124

This change makes sure gateways are added to the config.

Actions #41

Updated by Marcos M 3 months ago

  • Status changed from Pull Request Review to Feedback
Actions #42

Updated by Jim Pingle 2 months ago

  • Subject changed from Gateway behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in the configuration