Bug #12920
closed
Gateway behavior differs when the gateway does not exist in the configuration
Added by Marcos M over 2 years ago.
Updated 7 months ago.
Category: Gateway Monitoring
Plus Target Version: 24.03
Description
The gateway status and dpinger behave differently when the respective gateway entry does not exist in the config.xml file. This behavior difference results in failure to fail back after WAN failover.
Test:
- DHCP WAN
- Bounce interface physically and with ifconfig.
no gw = no gateway entry in config.xml
gw = gateway entry exists in config.xml
Netgate 5100
ifconfig produced the same results.
              unplug cable                      plug cable
              gateway status   dpinger status   gateway status   dpinger status
22.01 no gw   missing          RUNNING          ONLINE           RUNNING
22.01 gw      pending          stopped          pending          stopped
22.05 no gw   missing          stopped          ONLINE           RUNNING
22.05 gw      pending          stopped          ONLINE           RUNNING
Netgate 1100
              unplug cable                      plug cable
              gateway status   dpinger status   gateway status   dpinger status
22.01 no gw   missing          RUNNING          ONLINE           RUNNING
22.01 gw      pending          stopped          pending          stopped
22.05 no gw   missing          stopped          ONLINE           RUNNING
22.05 gw      pending          stopped          ONLINE           RUNNING

              ifconfig down                     ifconfig up
              gateway status   dpinger status   gateway status   dpinger status
22.01 no gw   offline          RUNNING          ONLINE           RUNNING
22.01 gw      offline          RUNNING          ONLINE           RUNNING
22.05 no gw   offline          RUNNING          ONLINE           RUNNING
22.05 gw      offline          RUNNING          ONLINE           RUNNING
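The results above can be compared mechanically. The following is a minimal sketch that transcribes the reported statuses (lowercased) and lists the versions where the outcome depends on whether the gateway entry exists in config.xml, along with the cases where the gateway does not come back online. The dictionary layout and function names are illustrative only and are not part of pfSense. The cable results are entered once because they were identical on the 5100 and the 1100.

```python
# Observed results transcribed from the tables above (statuses lowercased).
# Key:   (version, gateway entry exists in config.xml)
# Value: (gateway status when down, dpinger when down,
#         gateway status when up,   dpinger when up)
CABLE = {
    ("22.01", False): ("missing", "running", "online", "running"),
    ("22.01", True): ("pending", "stopped", "pending", "stopped"),
    ("22.05", False): ("missing", "stopped", "online", "running"),
    ("22.05", True): ("pending", "stopped", "online", "running"),
}
IFCONFIG_1100 = {
    ("22.01", False): ("offline", "running", "online", "running"),
    ("22.01", True): ("offline", "running", "online", "running"),
    ("22.05", False): ("offline", "running", "online", "running"),
    ("22.05", True): ("offline", "running", "online", "running"),
}


def config_dependent_versions(results):
    """Versions whose outcome changes with the presence of the config entry."""
    versions = sorted({version for version, _ in results})
    return [v for v in versions if results[(v, False)] != results[(v, True)]]


def not_recovered(results):
    """Cases where the gateway is not back online after the link returns."""
    return sorted(key for key, row in results.items() if row[2] != "online")


print(config_dependent_versions(CABLE))          # ['22.01', '22.05']
print(config_dependent_versions(IFCONFIG_1100))  # []
print(not_recovered(CABLE))                      # [('22.01', True)]
```

Read this way, the cable bounce is the only method where the config entry changes the outcome, and 22.01 with a gateway entry is the only reported case that stays pending after the link returns.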
A missing gateway can have other undesired behavior:
- The Automatic default gateway detection will choose disabled gateways over an enabled and online gateway which has the missing config.xml entry.
- dpinger will not start and the gateway status will remain pending after releasing/renewing the WAN DHCP lease.
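To make the first point concrete, here is a minimal sketch of the selection behavior one would expect instead: disabled gateways are skipped, and whether the gateway has a config.xml entry plays no part in the choice. The Gateway class, its fields, and pick_default are hypothetical names for illustration and do not reflect pfSense's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    # Hypothetical model, not pfSense's internal representation.
    name: str
    disabled: bool
    status: str      # e.g. "online", "pending", "offline"
    in_config: bool  # True if config.xml has an entry for this gateway


def pick_default(gateways: List[Gateway]) -> Optional[Gateway]:
    """Return the first enabled, online gateway; in_config is not consulted."""
    for gw in gateways:
        if not gw.disabled and gw.status == "online":
            return gw
    return None


candidates = [
    Gateway("GW_DISABLED", disabled=True, status="online", in_config=True),
    Gateway("WAN_DHCP", disabled=False, status="online", in_config=False),
]
chosen = pick_default(candidates)
print(chosen.name if chosen else None)  # WAN_DHCP
```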
- Description updated (diff)
- Subject changed from Gateway stays pending after link-loss recovery when using static routes to Gateway stays pending after link-loss recovery
- Description updated (diff)
- Description updated (diff)
- Assignee set to Viktor Gurov
- Target version set to 2.7.0
- Plus Target Version set to 22.05
- Affected Version set to 2.6.0
- Status changed from New to Pull Request Review
- Status changed from Pull Request Review to Feedback
- % Done changed from 0 to 100
- Status changed from Feedback to New
- Status changed from New to Pull Request Review
Tested fixes on current 22.05 snap on an 1100 and 5100.
The gateway status / dpinger behavior is now the same:
Gateway entry in config:
- interface down: dpinger process missing; gateway status missing
- interface up: dpinger process running; gateway status online
No gateway entry in config:
- interface down: dpinger process missing; gateway status missing
- interface up: dpinger process running; gateway status online
Edit: typo after copy/paste
- Status changed from Pull Request Review to Feedback
- Status changed from Feedback to New
With this in place it removes dynamic gateway entries for interfaces such as DHCP entirely when they are down, which is not what we want to happen. They should still be in the list, and have to be for certain things to function properly. I've reverted the change; we can try an alternate approach.
- Status changed from New to Feedback
- Status changed from Feedback to New
- Subject changed from Gateway stays pending after link-loss recovery to Gateway status behavior differs when the gateway does not exist in config.xml
- Description updated (diff)
Seeing what looks to be related whilst testing: https://redmine.pfsense.org/issues/12949
After the WAN interface is re-assigned dpinger is stopped and does not restart.
For example here the WAN is reassigned to igb0:
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Shutting down Router Advertisment daemon cleanly
Mar 22 14:48:43 check_reload_status 398 rc.newwanip starting igb0
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: calling interface_dhcpv6_configure.
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Accept router advertisements on interface igb0
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Starting DHCP6 client for interfaces igb0 in DHCP6 without RA mode
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Starting rtsold process on wan(igb0)
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: rc.newwanip: Info: starting on igb0.
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: rc.newwanip: on (IP address: 172.21.16.182) (interface: []) (real interface: igb0).
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: rc.newwanip called with empty interface.
Mar 22 14:48:44 check_reload_status 398 Reloading filter
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 172.21.16.182 - Restarting packages.
Mar 22 14:48:44 check_reload_status 398 Starting packages
Mar 22 14:48:45 php-fpm 369 /interfaces_assign.php: Default gateway setting Interface WAN_DHCP Gateway as default.
Mar 22 14:48:45 php-fpm 369 /interfaces_assign.php: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
Mar 22 14:48:45 check_reload_status 398 Restarting IPsec tunnels
Mar 22 14:48:45 php-fpm 368 /rc.start_packages: Restarting/Starting all packages.
Mar 22 14:48:48 check_reload_status 398 updating dyndns wan
Mar 22 14:48:48 check_reload_status 398 Reloading filter
Mar 22 14:48:48 php-fpm 369 /interfaces_assign.php: Configuration Change: admin@172.21.16.243 (Local Database): Interfaces assignment settings changed
Mar 22 14:48:48 check_reload_status 398 Syncing firewall
Mar 22 14:48:48 php-fpm 369 /interfaces_assign.php: Creating rrd update script
Mar 22 14:48:48 kernel arprequest: cannot find matching address
The gateway log shows:
Mar 22 14:48:01 dpinger 14600 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 bind_addr 172.21.16.183 identifier "WAN_DHCP "
Mar 22 14:48:41 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43 dpinger 14600 exiting on signal 15
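The condition in this log (dpinger exits on signal 15, i.e. SIGTERM, and no new instance follows) can be checked for with a small script. This is a rough sketch that assumes the line layout shown above and treats a line beginning with send_interval as the marker of a dpinger start; that marker, the regular expression, and the function name are assumptions made for illustration.

```python
import re

# Matches lines in the layout shown above:
# "Mar 22 14:48:43 dpinger 14600 <message>"
LINE = re.compile(r"^\w{3}\s+\d+ [\d:]{8} dpinger (\d+) (.*)$")


def dpinger_left_stopped(lines):
    """True if the last dpinger start/exit event seen in the log is an exit."""
    stopped = False
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        message = match.group(2)
        if message.startswith("send_interval"):
            stopped = False  # parameter dump, assumed to mark a start
        elif message.startswith("exiting on signal"):
            stopped = True
    return stopped


# Lines copied from the gateway log above.
sample = [
    'Mar 22 14:48:01 dpinger 14600 send_interval 500ms loss_interval 2000ms '
    'time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms '
    'latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 '
    'bind_addr 172.21.16.183 identifier "WAN_DHCP "',
    "Mar 22 14:48:43 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65",
    "Mar 22 14:48:43 dpinger 14600 exiting on signal 15",
]
print(dpinger_left_stopped(sample))  # True
```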
Tested:
2.7.0-DEVELOPMENT (amd64)
built on Tue Mar 22 06:20:34 UTC 2022
With the MR679 patch
- Plus Target Version changed from 22.05 to 22.09
- Status changed from New to Pull Request Review
- Description updated (diff)
Updating original post with results from 22.05 BETA.
Now the gateway returns to online in every case. However, there are still cases in which the gateway is missing, which should not happen.
- Description updated (diff)
- Subject changed from Gateway status behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in config.xml
- Description updated (diff)
Updating OP with new symptoms.
- Description updated (diff)
- Plus Target Version changed from 22.09 to 22.11
- Plus Target Version changed from 22.11 to 23.01
- Assignee deleted (Viktor Gurov)
- Status changed from Pull Request Review to Feedback
The last MR was merged a while ago. If there are still problems here we need a detailed list of incorrect behaviors, what they should be, and how to reproduce them.
- Status changed from Feedback to Resolved
Closing for lack of feedback either way here. I haven't noticed any gateway issues like this in a while and I've done quite a bit of testing with gateway events when working on other issues.
Marcos M wrote:
The gateway status and dpinger behave differently when the respective gateway entry does not exist in the config.xml file. This behavior difference results in failure to fail back after WAN failover.
Test:
- DHCP WAN
- Bounce interface physically and with ifconfig.
no gw = no gateway entry in config.xml
gw = gateway entry exists in config.xml
Netgate 5100
ifconfig produced the same results.
[...]
Netgate 1100
[...]
A missing gateway can have other undesired behavior:
- The Automatic default gateway detection will choose disabled gateways over an enabled and online gateway which has the missing config.xml entry.
- dpinger will not start and the gateway status will remain pending after releasing/renewing the WAN DHCP lease.
Verified this is the case on 5100 running the 23.01-BETA nightly from 12/17 with a default installation.
23.01-BETA (amd64) built on Sat Dec 17 14:33:51 UTC 2022
Cable connected/disconnected screenshots attached.
- Assignee set to Jim Pingle
- Plus Target Version changed from 23.01 to 23.05
Let's take our time with this and make sure it gets a thorough and proper analysis and correction for the next release. As it is, we're not worse off than we were on the last release at least, and if it affects someone there is a viable workaround: They can edit/save the gateway so it is populated in the config.
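To see whether the workaround is needed on a given system, one could check whether the gateway is already populated in the config. The sketch below assumes gateway entries are stored as gateway_item elements with a name child under a gateways element at the top level of config.xml; that layout, the sample fragments, and the function name are assumptions for illustration and should be verified against a real config.xml.

```python
import xml.etree.ElementTree as ET


def gateway_in_config(config_xml: str, name: str) -> bool:
    """True if a gateway item with the given name exists in the config text."""
    root = ET.fromstring(config_xml)
    # Assumed layout: <pfsense><gateways><gateway_item><name>...</name>
    for item in root.findall("./gateways/gateway_item"):
        if item.findtext("name") == name:
            return True
    return False


# Hypothetical, heavily trimmed config fragments for illustration only.
without_entry = "<pfsense><gateways/></pfsense>"
with_entry = (
    "<pfsense><gateways><gateway_item>"
    "<interface>wan</interface><gateway>dynamic</gateway><name>WAN_DHCP</name>"
    "</gateway_item></gateways></pfsense>"
)
print(gateway_in_config(without_entry, "WAN_DHCP"))  # False
print(gateway_in_config(with_entry, "WAN_DHCP"))     # True
```

If the check returns False for a dynamic gateway, editing and saving that gateway in the GUI is the workaround described above.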
- Plus Target Version changed from 23.05 to 23.09
- Target version changed from 2.7.0 to CE-Next
- Plus Target Version changed from 23.09 to 24.01
- Plus Target Version changed from 24.01 to 24.03
- Related to Regression #11570: Gateway monitoring services is not always restarted on interface events, which may prevent a WAN from recovering back to an online state added
- Status changed from Confirmed to Pull Request Review
- Assignee changed from Jim Pingle to Marcos M
- Target version changed from CE-Next to 2.8.0
- Status changed from Pull Request Review to Feedback
- Subject changed from Gateway behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in the configuration
- Status changed from Feedback to Resolved