Bug #12920

open

Gateway behavior differs when the gateway does not exist in config.xml

Added by Marcos Mendoza 4 months ago. Updated about 14 hours ago.

Status: Pull Request Review
Priority: Normal
Assignee: Viktor Gurov
Category: Gateway Monitoring
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 22.11
Release Notes: Default
Affected Version: 2.6.0
Affected Architecture:

Description

Gateway status and dpinger behave differently depending on whether the respective gateway has an entry in the config.xml file. This difference in behavior results in a failure to fail back after WAN failover.

Test:
  • DHCP WAN
  • Bounce interface physically and with ifconfig.
  • no gw = no gateway entry in config.xml
  • gw = gateway entry exists in config.xml

Netgate 5100

Bouncing the interface with ifconfig produced the same results.

            unplug cable                            plug cable
            gateway status      dpinger status      gateway status      dpinger status
22.01 no gw missing             RUNNING             ONLINE              RUNNING
22.01 gw    pending             stopped             pending             stopped

22.05 no gw missing             stopped             ONLINE              RUNNING
22.05 gw    pending             stopped             ONLINE              RUNNING
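
The Netgate 5100 results above can be encoded as data to make the two questions explicit: which cases fail to recover after the cable is plugged back in, and whether the two configurations behave differently within a release. This is only an illustrative sketch; the `Result` type and helper names are hypothetical, and the values are copied from the table.

```python
from typing import NamedTuple


class Result(NamedTuple):
    version: str
    has_entry: bool        # True = "gw", False = "no gw"
    unplug_gateway: str
    unplug_dpinger: str
    plug_gateway: str
    plug_dpinger: str


# Values copied from the Netgate 5100 table in this report.
RESULTS = [
    Result("22.01", False, "missing", "RUNNING", "ONLINE", "RUNNING"),
    Result("22.01", True, "pending", "stopped", "pending", "stopped"),
    Result("22.05", False, "missing", "stopped", "ONLINE", "RUNNING"),
    Result("22.05", True, "pending", "stopped", "ONLINE", "RUNNING"),
]


def failed_to_recover(r: Result) -> bool:
    """A case recovers only if the gateway is ONLINE and dpinger is RUNNING."""
    return r.plug_gateway != "ONLINE" or r.plug_dpinger != "RUNNING"


def differs_by_entry(results: list, version: str) -> bool:
    """True if the gw and no gw rows of one version show different states."""
    states = {r[2:] for r in results if r.version == version}
    return len(states) > 1


if __name__ == "__main__":
    for r in RESULTS:
        if failed_to_recover(r):
            label = "gw" if r.has_entry else "no gw"
            print(f"{r.version} {label}: did not recover after replug")
    for version in ("22.01", "22.05"):
        print(version, "differs by entry:", differs_by_entry(RESULTS, version))
```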

Netgate 1100

            unplug cable                            plug cable
            gateway status      dpinger status      gateway status      dpinger status
22.01 no gw missing             RUNNING             ONLINE              RUNNING
22.01 gw    pending             stopped             pending             stopped

22.05 no gw missing             stopped             ONLINE              RUNNING
22.05 gw    pending             stopped             ONLINE              RUNNING

            ifconfig down                           ifconfig up
            gateway status      dpinger status      gateway status      dpinger status
22.01 no gw offline             RUNNING             ONLINE              RUNNING
22.01 gw    offline             RUNNING             ONLINE              RUNNING

22.05 no gw offline             RUNNING             ONLINE              RUNNING
22.05 gw    offline             RUNNING             ONLINE              RUNNING
A missing gateway entry can also cause other undesired behavior:
  • Automatic default gateway detection chooses a disabled gateway over an enabled, online gateway that has no config.xml entry.
  • dpinger does not start and the gateway status remains pending after releasing/renewing the WAN DHCP lease.
Actions #1

Updated by Marcos Mendoza 4 months ago

Some notes:

It shouldn't be an issue for WAN failover on 22.05 given that dpinger starts back up. However, it's unclear if it should stop at all. This may be related to the issues reported here:
https://forum.netgate.com/topic/169949/dpinger-stops-crashes-after-update-to-2-6-0/

Actions #2

Updated by Marcos Mendoza 4 months ago

  • Description updated (diff)
Actions #3

Updated by Marcos Mendoza 4 months ago

  • Subject changed from Gateway stays pending after link-loss recovery when using static routes to Gateway stays pending after link-loss recovery
  • Description updated (diff)
Actions #4

Updated by Marcos Mendoza 4 months ago

  • Description updated (diff)
Actions #5

Updated by Viktor Gurov 4 months ago

  • Assignee set to Viktor Gurov
  • Target version set to 2.7.0
  • Plus Target Version set to 22.05
  • Affected Version set to 2.6.0
Actions #6

Updated by Jim Pingle 4 months ago

  • Status changed from New to Pull Request Review
Actions #7

Updated by Viktor Gurov 4 months ago

  • Status changed from Pull Request Review to Feedback
  • % Done changed from 0 to 100
Actions #8

Updated by Viktor Gurov 4 months ago

  • Status changed from Feedback to New
Actions #9

Updated by Jim Pingle 4 months ago

  • Status changed from New to Pull Request Review
Actions #10

Updated by Marcos Mendoza 4 months ago

Tested fixes on current 22.05 snap on an 1100 and 5100.

The gateway status / dpinger behavior is now the same:
Gateway entry in config:
  • interface down: dpinger process missing; gateway status missing
  • interface up: dpinger process running; gateway status online
No gateway entry in config:
  • interface down: dpinger process missing; gateway status missing
  • interface up: dpinger process running; gateway status online

Edit: typo after copy/paste

Actions #11

Updated by Viktor Gurov 4 months ago

  • Status changed from Pull Request Review to Feedback
Actions #12

Updated by Jim Pingle 4 months ago

  • Status changed from Feedback to New

With this change in place, dynamic gateway entries for interfaces such as DHCP are removed entirely when they are down, which is not what we want to happen. They should still be in the list, and have to be for certain things to function properly. I've reverted the change; we can try an alternate approach.

Actions #13

Updated by Jim Pingle 4 months ago

  • Status changed from New to Feedback
Actions #14

Updated by Jim Pingle 4 months ago

  • Status changed from Feedback to New
Actions #15

Updated by Marcos Mendoza 4 months ago

  • Subject changed from Gateway stays pending after link-loss recovery to Gateway status behavior differs when the gateway does not exist in config.xml
Actions #16

Updated by Marcos Mendoza 4 months ago

  • Description updated (diff)
Actions #19

Updated by Steve Wheeler 3 months ago

Seeing what looks to be related whilst testing: https://redmine.pfsense.org/issues/12949

After the WAN interface is re-assigned, dpinger is stopped and does not restart.
For example, here the WAN is reassigned to igb0:

Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Shutting down Router Advertisment daemon cleanly
Mar 22 14:48:43     check_reload_status     398     rc.newwanip starting igb0
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: calling interface_dhcpv6_configure.
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Accept router advertisements on interface igb0
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Starting DHCP6 client for interfaces igb0 in DHCP6 without RA mode
Mar 22 14:48:43     php-fpm     369     /interfaces_assign.php: Starting rtsold process on wan(igb0)
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: rc.newwanip: Info: starting on igb0.
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: rc.newwanip: on (IP address: 172.21.16.182) (interface: []) (real interface: igb0).
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: rc.newwanip called with empty interface.
Mar 22 14:48:44     check_reload_status     398     Reloading filter
Mar 22 14:48:44     php-fpm     368     /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 172.21.16.182 - Restarting packages.
Mar 22 14:48:44     check_reload_status     398     Starting packages
Mar 22 14:48:45     php-fpm     369     /interfaces_assign.php: Default gateway setting Interface WAN_DHCP Gateway as default.
Mar 22 14:48:45     php-fpm     369     /interfaces_assign.php: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
Mar 22 14:48:45     check_reload_status     398     Restarting IPsec tunnels
Mar 22 14:48:45     php-fpm     368     /rc.start_packages: Restarting/Starting all packages.
Mar 22 14:48:48     check_reload_status     398     updating dyndns wan
Mar 22 14:48:48     check_reload_status     398     Reloading filter
Mar 22 14:48:48     php-fpm     369     /interfaces_assign.php: Configuration Change: admin@172.21.16.243 (Local Database): Interfaces assignment settings changed
Mar 22 14:48:48     check_reload_status     398     Syncing firewall
Mar 22 14:48:48     php-fpm     369     /interfaces_assign.php: Creating rrd update script
Mar 22 14:48:48     kernel         arprequest: cannot find matching address 

The gateway log shows:

Mar 22 14:48:01     dpinger     14600     send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 bind_addr 172.21.16.183 identifier "WAN_DHCP " 
Mar 22 14:48:41     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     exiting on signal 15 
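
The excerpt above can be summarized mechanically: count the `sendto` errors and check whether the last dpinger event is a start or an exit. The sketch below is a hypothetical helper, not part of pfSense or dpinger. It assumes that a message beginning with `send_interval` marks a dpinger start, which is inferred from this single excerpt. On FreeBSD, errno 65 is EHOSTUNREACH ("No route to host") and signal 15 is SIGTERM.

```python
import re
from collections import Counter

# Log lines copied verbatim from the gateway log in this comment.
LOG = """\
Mar 22 14:48:01     dpinger     14600     send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 bind_addr 172.21.16.183 identifier "WAN_DHCP "
Mar 22 14:48:41     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43     dpinger     14600     exiting on signal 15
"""

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+dpinger\s+(?P<pid>\d+)\s+(?P<msg>.*)$"
)
ERROR_RE = re.compile(r"sendto error: (\d+)")


def summarize(log_text: str):
    """Return (running, errors): last known state and sendto error counts."""
    running = False
    errors = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        if msg.startswith("send_interval"):
            running = True      # assumed startup banner
        elif "exiting on signal" in msg:
            running = False     # dpinger was told to stop
        else:
            err = ERROR_RE.search(msg)
            if err:
                errors[int(err.group(1))] += 1
    return running, errors


if __name__ == "__main__":
    running, errors = summarize(LOG)
    print("dpinger running at end of log:", running)
    print("sendto errors by errno:", dict(errors))
```

For this excerpt the helper reports that dpinger is not running at the end of the log, which matches the observation that it stops and does not restart.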

Tested:
2.7.0-DEVELOPMENT (amd64)
built on Tue Mar 22 06:20:34 UTC 2022
With the MR679 patch

Actions #20

Updated by Jim Pingle about 1 month ago

  • Plus Target Version changed from 22.05 to 22.09
Actions #22

Updated by Jim Pingle about 1 month ago

  • Status changed from New to Pull Request Review
Actions #23

Updated by Marcos Mendoza about 1 month ago

  • Description updated (diff)

Updating original post with results from 22.05 BETA.

Now the gateway returns to online in every case. However, there are still cases in which the gateway is missing, which should not happen.

Actions #24

Updated by Marcos Mendoza about 1 month ago

  • Description updated (diff)
Actions #25

Updated by Marcos Mendoza 17 days ago

  • Subject changed from Gateway status behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in config.xml
  • Description updated (diff)

Updating OP with new symptoms.

Actions #26

Updated by Marcos Mendoza 17 days ago

  • Description updated (diff)
Actions #27

Updated by Jim Pingle about 14 hours ago

  • Plus Target Version changed from 22.09 to 22.11