Bug #12920
Closed
Gateway behavior differs when the gateway does not exist in the configuration
100%
Description
The gateway status and dpinger behave differently when the respective gateway entry does not exist in the config.xml file. This behavior difference results in failure to fail back after WAN failover.
Test:
- DHCP WAN
- Bounce interface physically and with ifconfig.
no gw = no gateway entry in config.xml
gw = gateway entry exists in config.xml
Netgate 5100
ifconfig produced same results.
| Version | Config | Unplug cable: gateway status | Unplug cable: dpinger status | Plug cable: gateway status | Plug cable: dpinger status |
| 22.01 | no gw | missing | RUNNING | ONLINE | RUNNING |
| 22.01 | gw | pending | stopped | pending | stopped |
| 22.05 | no gw | missing | stopped | ONLINE | RUNNING |
| 22.05 | gw | pending | stopped | ONLINE | RUNNING |
Netgate 1100
| Version | Config | Unplug cable: gateway status | Unplug cable: dpinger status | Plug cable: gateway status | Plug cable: dpinger status |
| 22.01 | no gw | missing | RUNNING | ONLINE | RUNNING |
| 22.01 | gw | pending | stopped | pending | stopped |
| 22.05 | no gw | missing | stopped | ONLINE | RUNNING |
| 22.05 | gw | pending | stopped | ONLINE | RUNNING |
| Version | Config | ifconfig down: gateway status | ifconfig down: dpinger status | ifconfig up: gateway status | ifconfig up: dpinger status |
| 22.01 | no gw | offline | RUNNING | ONLINE | RUNNING |
| 22.01 | gw | offline | RUNNING | ONLINE | RUNNING |
| 22.05 | no gw | offline | RUNNING | ONLINE | RUNNING |
| 22.05 | gw | offline | RUNNING | ONLINE | RUNNING |
A missing gateway can have other undesired behavior:
- The Automatic default gateway detection will choose disabled gateways over an enabled and online gateway which has the missing config.xml entry.
- dpinger will not start and the gateway status will remain pending after releasing/renewing the WAN DHCP lease.
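To tell which of the two cases above ("gw" or "no gw") a given gateway falls into, the saved configuration can be inspected for a matching entry. The following is a minimal sketch, not pfSense code. It assumes gateway entries are stored as gateways/gateway_item elements with a name child in config.xml; the sample XML, `configured_gateways`, and `classify` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical, trimmed-down config.xml. The <gateways>/<gateway_item>/<name>
# layout is an assumption about how gateway entries are stored.
SAMPLE_CONFIG = """<pfsense>
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>dynamic</gateway>
      <name>WAN_DHCP</name>
    </gateway_item>
  </gateways>
</pfsense>"""


def configured_gateways(config_xml: str) -> Set[str]:
    """Return the names of all gateways that have an entry in the config."""
    root = ET.fromstring(config_xml)
    names = set()
    for item in root.findall("./gateways/gateway_item"):
        name = item.findtext("name", default="").strip()
        if name:
            names.add(name)
    return names


def classify(gateway_name: str, config_xml: str) -> str:
    """Label a gateway with the terms used in this report: 'gw' or 'no gw'."""
    return "gw" if gateway_name in configured_gateways(config_xml) else "no gw"


if __name__ == "__main__":
    for gw in ("WAN_DHCP", "WAN_DHCP6"):
        print(gw, "->", classify(gw, SAMPLE_CONFIG))
```

In this sample, WAN_DHCP has an entry and WAN_DHCP6 does not, so the second gateway would be in the "no gw" state described in this report.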
Updated by Marcos M over 2 years ago
Some notes:
It shouldn't be an issue for WAN failover on 22.05 given that dpinger starts back up. However, it's unclear if it should stop at all. This may be related to the issues reported here:
https://forum.netgate.com/topic/169949/dpinger-stops-crashes-after-update-to-2-6-0/
Updated by Marcos M over 2 years ago
- Subject changed from Gateway stays pending after link-loss recovery when using static routes to Gateway stays pending after link-loss recovery
- Description updated (diff)
Updated by Viktor Gurov over 2 years ago
- Assignee set to Viktor Gurov
- Target version set to 2.7.0
- Plus Target Version set to 22.05
- Affected Version set to 2.6.0
Updated by Jim Pingle over 2 years ago
- Status changed from New to Pull Request Review
Updated by Viktor Gurov over 2 years ago
- Status changed from Pull Request Review to Feedback
- % Done changed from 0 to 100
Applied in changeset e7954a79ce0d386706dcde2e039ef57875ecee0a.
Updated by Viktor Gurov over 2 years ago
- Status changed from Feedback to New
Updated by Jim Pingle over 2 years ago
- Status changed from New to Pull Request Review
Updated by Marcos M over 2 years ago
Tested fixes on current 22.05 snap on an 1100 and 5100.
The gateway status / dpinger behavior is now the same:
Gateway entry in config:
- interface down: dpinger process missing; gateway status missing
- interface up: dpinger process running; gateway status online
No gateway entry in config:
- interface down: dpinger process missing; gateway status missing
- interface up: dpinger process running; gateway status online
Edit: typo after copy/paste
Updated by Viktor Gurov over 2 years ago
- Status changed from Pull Request Review to Feedback
Applied in changeset c07c5cf5f2387cb2b9efdf25545bafebfa414f00.
Updated by Jim Pingle over 2 years ago
- Status changed from Feedback to New
With this change in place, dynamic gateway entries for interfaces such as DHCP are removed entirely when they are down, which is not what we want to happen. They should still be in the list, and have to be for certain things to function properly. I've reverted the change; we can try an alternate approach.
Updated by Jim Pingle over 2 years ago
- Status changed from New to Feedback
Applied in changeset d250c083dffa1e1d429f871f2081644dfa9d2f62.
Updated by Marcos M over 2 years ago
- Subject changed from Gateway stays pending after link-loss recovery to Gateway status behavior differs when the gateway does not exist in config.xml
Updated by Steve Wheeler over 2 years ago
Seeing what looks to be related whilst testing: https://redmine.pfsense.org/issues/12949
After the WAN interface is re-assigned dpinger is stopped and does not restart.
For example here the WAN is reassigned to igb0:
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Shutting down Router Advertisment daemon cleanly
Mar 22 14:48:43 check_reload_status 398 rc.newwanip starting igb0
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: calling interface_dhcpv6_configure.
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Accept router advertisements on interface igb0
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Starting DHCP6 client for interfaces igb0 in DHCP6 without RA mode
Mar 22 14:48:43 php-fpm 369 /interfaces_assign.php: Starting rtsold process on wan(igb0)
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: rc.newwanip: Info: starting on igb0.
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: rc.newwanip: on (IP address: 172.21.16.182) (interface: []) (real interface: igb0).
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: rc.newwanip called with empty interface.
Mar 22 14:48:44 check_reload_status 398 Reloading filter
Mar 22 14:48:44 php-fpm 368 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 172.21.16.182 - Restarting packages.
Mar 22 14:48:44 check_reload_status 398 Starting packages
Mar 22 14:48:45 php-fpm 369 /interfaces_assign.php: Default gateway setting Interface WAN_DHCP Gateway as default.
Mar 22 14:48:45 php-fpm 369 /interfaces_assign.php: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
Mar 22 14:48:45 check_reload_status 398 Restarting IPsec tunnels
Mar 22 14:48:45 php-fpm 368 /rc.start_packages: Restarting/Starting all packages.
Mar 22 14:48:48 check_reload_status 398 updating dyndns wan
Mar 22 14:48:48 check_reload_status 398 Reloading filter
Mar 22 14:48:48 php-fpm 369 /interfaces_assign.php: Configuration Change: admin@172.21.16.243 (Local Database): Interfaces assignment settings changed
Mar 22 14:48:48 check_reload_status 398 Syncing firewall
Mar 22 14:48:48 php-fpm 369 /interfaces_assign.php: Creating rrd update script
Mar 22 14:48:48 kernel arprequest: cannot find matching address
The gateway log shows:
Mar 22 14:48:01 dpinger 14600 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 bind_addr 172.21.16.183 identifier "WAN_DHCP "
Mar 22 14:48:41 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:42 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65
Mar 22 14:48:43 dpinger 14600 exiting on signal 15
Tested:
2.7.0-DEVELOPMENT (amd64)
built on Tue Mar 22 06:20:34 UTC 2022
With the MR679 patch
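The "stopped and does not restart" condition can be spotted mechanically in a gateway log like the one above. The sketch below is only an illustration and assumes the line format seen in that excerpt (a startup line beginning with send_interval, "sendto error" lines, and an "exiting on signal" line). `summarize_dpinger` and `left_stopped` are hypothetical helper names, not part of pfSense or dpinger.

```python
import re

# Gateway log lines copied from the excerpt in this comment.
GATEWAY_LOG = [
    'Mar 22 14:48:01 dpinger 14600 send_interval 500ms loss_interval 2000ms '
    'time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms '
    'latency_alarm 500ms loss_alarm 20% dest_addr 172.21.16.1 '
    'bind_addr 172.21.16.183 identifier "WAN_DHCP "',
    "Mar 22 14:48:41 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65",
    "Mar 22 14:48:42 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65",
    "Mar 22 14:48:42 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65",
    "Mar 22 14:48:43 dpinger 14600 WAN_DHCP 172.21.16.1: sendto error: 65",
    "Mar 22 14:48:43 dpinger 14600 exiting on signal 15",
]

DPINGER_RE = re.compile(r"dpinger (?P<pid>\d+) (?P<msg>.*)$")
EXIT_RE = re.compile(r"exiting on signal (\d+)")


def summarize_dpinger(lines):
    """Collect per-PID startup, send-error, and exit information."""
    summary = {}
    for line in lines:
        match = DPINGER_RE.search(line)
        if match is None:
            continue  # not a dpinger line
        entry = summary.setdefault(
            match["pid"],
            {"started": False, "send_errors": 0, "exit_signal": None},
        )
        msg = match["msg"]
        exit_match = EXIT_RE.search(msg)
        if msg.startswith("send_interval"):
            entry["started"] = True  # parameter dump logged at startup
        elif "sendto error" in msg:
            entry["send_errors"] += 1
        elif exit_match:
            entry["exit_signal"] = int(exit_match.group(1))
    return summary


def left_stopped(lines):
    """True when the last dpinger line is an exit, i.e. no restart was logged."""
    dpinger_lines = [line for line in lines if DPINGER_RE.search(line)]
    return bool(dpinger_lines) and EXIT_RE.search(dpinger_lines[-1]) is not None


if __name__ == "__main__":
    print(summarize_dpinger(GATEWAY_LOG))
    print("left stopped:", left_stopped(GATEWAY_LOG))
```

For the excerpt above, PID 14600 logs four send errors and then exits on signal 15 (SIGTERM), with no later startup line, which matches the symptom described.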
Updated by Jim Pingle over 2 years ago
- Plus Target Version changed from 22.05 to 22.09
Updated by Viktor Gurov over 2 years ago
Updated by Jim Pingle over 2 years ago
- Status changed from New to Pull Request Review
Updated by Marcos M over 2 years ago
- Description updated (diff)
Updating original post with results from 22.05 BETA.
Now the gateway returns to online in every case. However, there are still cases in which the gateway is missing, which should not happen.
Updated by Marcos M over 2 years ago
- Subject changed from Gateway status behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in config.xml
- Description updated (diff)
Updating OP with new symptoms.
Updated by Jim Pingle over 2 years ago
- Plus Target Version changed from 22.09 to 22.11
Updated by Jim Pingle about 2 years ago
- Plus Target Version changed from 22.11 to 23.01
Updated by Jim Pingle about 2 years ago
- Status changed from Pull Request Review to Feedback
The last MR was merged a while ago. If there are still problems here we need a detailed list of incorrect behaviors, what they should be, and how to reproduce them.
Updated by Jim Pingle almost 2 years ago
- Status changed from Feedback to Resolved
Closing for lack of feedback either way here. I haven't noticed any gateway issues like this in a while and I've done quite a bit of testing with gateway events when working on other issues.
Updated by Ryan Coleman almost 2 years ago
- File rm12920-5100-igb0-connected.png added
- File rm12920-5100-igb0-disconnected.png added
- Status changed from Resolved to Confirmed
Marcos M wrote:
The gateway status and dpinger behave differently when the respective gateway entry does not exist in the config.xml file. This behavior difference results in failure to fail back after WAN failover.
Test:
- DHCP WAN
- Bounce interface physically and with ifconfig.
no gw = no gateway entry in config.xml
gw = gateway entry exists in config.xml
Netgate 5100
ifconfig produced same results.
[...]
Netgate 1100
[...]
A missing gateway can have other undesired behavior:
- The Automatic default gateway detection will choose disabled gateways over an enabled and online gateway which has the missing config.xml entry.
- dpinger will not start and the gateway status will remain pending after releasing/renewing the WAN DHCP lease.
Verified this is the case on 5100 running the 23.01-BETA nightly from 12/17 with a default installation.
23.01-BETA (amd64) built on Sat Dec 17 14:33:51 UTC 2022
Cable connected/disconnected screenshots attached.
Updated by Jim Pingle almost 2 years ago
- Assignee set to Jim Pingle
- Plus Target Version changed from 23.01 to 23.05
Let's take our time with this and make sure it gets a thorough and proper analysis and correction for the next release. As it is, we're at least not worse off than we were on the last release, and if it affects someone there is a viable workaround: they can edit/save the gateway so it is populated in the config.
Updated by Jim Pingle over 1 year ago
- Plus Target Version changed from 23.05 to 23.09
Updated by Jim Pingle over 1 year ago
- Target version changed from 2.7.0 to CE-Next
Updated by Marcos M about 1 year ago
- Related to Regression #14616: dpinger does not start after renewing DHCP added
Updated by Jim Pingle about 1 year ago
- Plus Target Version changed from 23.09 to 24.01
Updated by Jim Pingle about 1 year ago
- Plus Target Version changed from 24.01 to 24.03
Updated by Marcos M about 1 year ago
- Related to Regression #11570: Gateway monitoring services is not always restarted on interface events, which may prevent a WAN from recovering back to an online state added
Updated by Marcos M 10 months ago
- Status changed from Confirmed to Pull Request Review
- Assignee changed from Jim Pingle to Marcos M
- Target version changed from CE-Next to 2.8.0
https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1124
This change makes sure gateways are added to the config.
Updated by Marcos M 10 months ago
- Status changed from Pull Request Review to Feedback
Applied in changeset 17e64d8dc879e2282a95291621f4192f841f6cc5.
Updated by Jim Pingle 10 months ago
- Subject changed from Gateway behavior differs when the gateway does not exist in config.xml to Gateway behavior differs when the gateway does not exist in the configuration