Project

General

Profile

Actions

Bug #9450

closed

Multiwan gateway group fail-over not working as expected (possible race condition)

Added by nasir ahmed over 5 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Category:
Multi-WAN
Target version:
Start date:
04/03/2019
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
Release Notes:
Default
Affected Version:
2.4.x
Affected Architecture:
All

Description

Multiwan gateway group fail-over not working as expected. After a link state change is triggered by dpinger (rc.gateway_alarm is called) due to a higher priority link recovery, the rc.filter_configure_sync script fails to add the recovered gateway back to the gateway group because of a race condition.

The race condition:
rc.filter_configure_sync calls get_gwgroup_members_inner() function in gwlb.inc, which manages gateway inclusion/exclusion. But this function relies on get_dpinger_status() which reads live gateway status form the dpinger instance monitoring the triggering gateway, and By the time get_dpinger_status() reads the current values, they may have fluctuated back to the gateway down range.
This, prevents get_gwgroup_members_inner() from reactivation the gateway. In contrast, dpinger waits the "Alert interval" period which defaults to 1000 millisecond to check again for alarm conditions. By that time the loss and delay average values may move again out of the alarm range and dpinger may not trigger another gateway down alarm.

This bug results in pfsense and dpinger maintaining unmatched internal states for that particular gateway. In the described scenario the gateway status will be UP in dpinger and DOWN in pfsense. These unmatched states will be maintained until a new gateway event is triggered or a filter reload is called for any other reason.

As a solution I would suggest that the dpinger trigger values, should be written to a file by rc.gateway_alarm and kept for at least "Alert interval" long, and then deleted. Further more, get_gwgroup_members_inner() should read gateway loss and delay values from this file as long as it exists.

Actions #1

Updated by Jim Pingle over 5 years ago

  • Category set to Multi-WAN
  • Priority changed from High to Normal
  • Target version set to 2.5.0
  • Affected Version set to 2.4.x
  • Affected Architecture All added
  • Affected Architecture deleted ()
Actions #2

Updated by Chris Linstruth about 5 years ago

Adding these log snippets. They are groups of dpinger gateway logs followed by the system logs for the corresponding periods.
I marked the actual pfSense gateway group changes with an asterisk. Note how, in this case, the gateway remained down until the administrator logged in and manually did something to make the gateway group be reevaluated. Editing and saving a gateway I believe despite having recovered and stabilized hours ago.

Jun 27 14:49:07 firewall-name dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 203.0.113.1  bind_addr 203.0.113.200  identifier "OPT1_DHCP " 
Jul  1 01:59:15 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 1328830us stddev 3086974us loss 0%
Jul  1 02:01:11 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 43794us stddev 199505us loss 17%

Jul  1 01:59:15 firewall-name rc.gateway_alarm[70843]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:1328.830ms RTTsd:3086.974ms Loss:0%)
Jul  1 01:59:15 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 01:59:15 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 01:59:15 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 01:59:15 firewall-name check_reload_status: Reloading filter
*Jul  1 01:59:16 firewall-name php-fpm: /rc.openvpn: MONITOR: WANGW is down, omitting from routing group Primary 198.51.100.109|198.51.100.110|WANGW|1328.911ms|3086.939ms|0.0%|down*
Jul  1 01:59:16 firewall-name php-fpm: /rc.openvpn: Gateway, switch to: OPT1_DHCP
Jul  1 01:59:16 firewall-name php-fpm: /rc.openvpn: Default gateway setting Interface OPT1_DHCP Gateway as default.
Jul  1 01:59:16 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 01:59:16 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:01:11 firewall-name rc.gateway_alarm[69387]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:43.794ms RTTsd:199.505ms Loss:17%)
Jul  1 02:01:11 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:01:11 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:01:11 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:01:11 firewall-name check_reload_status: Reloading filter
Jul  1 02:01:12 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:01:12 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:01:14 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 46483us stddev 205767us loss 22%
Jul  1 02:02:36 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 3168568us stddev 1678463us loss 90%
Jul  1 02:03:30 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 1884347us stddev 2092328us loss 5%
Jul  1 02:04:25 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 2895us stddev 520us loss 0%

Jul  1 02:01:15 firewall-name rc.gateway_alarm[1952]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:46.483ms RTTsd:205.767ms Loss:22%)
Jul  1 02:01:15 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:01:15 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:01:15 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:01:15 firewall-name check_reload_status: Reloading filter
Jul  1 02:01:16 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:01:16 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:02:36 firewall-name rc.gateway_alarm[66874]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:3168.568ms RTTsd:1678.463ms Loss:90%)
Jul  1 02:02:36 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:02:36 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:02:36 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:02:36 firewall-name check_reload_status: Reloading filter
Jul  1 02:02:37 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:02:37 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:03:30 firewall-name rc.gateway_alarm[33286]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:1884.347ms RTTsd:2092.328ms Loss:5%)
Jul  1 02:03:30 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:03:30 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:03:30 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:03:30 firewall-name check_reload_status: Reloading filter
Jul  1 02:03:31 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:03:31 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:04:25 firewall-name rc.gateway_alarm[2150]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:2.895ms RTTsd:.520ms Loss:0%)
Jul  1 02:04:25 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:04:25 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:04:25 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:04:25 firewall-name check_reload_status: Reloading filter
*Jul  1 02:04:26 firewall-name php-fpm: /rc.openvpn: 352MONITOR: WANGW is available now, adding to routing group Primary 198.51.100.109|198.51.100.110|WANGW|2.93ms|0.589ms|0.0%|none*
Jul  1 02:04:26 firewall-name php-fpm: /rc.openvpn: Gateway, switch to: WANGW
Jul  1 02:04:26 firewall-name php-fpm: /rc.openvpn: Default gateway setting WAN Gateway as default.
Jul  1 02:04:26 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:04:26 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:05:19 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 547071us stddev 1647180us loss 0%
Jul  1 02:06:24 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 11531us stddev 55680us loss 21%
Jul  1 02:07:44 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 3116322us stddev 1833237us loss 90%
Jul  1 02:08:38 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 821647us stddev 1789602us loss 5%
Jul  1 02:08:54 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 160206us stddev 666241us loss 0%

Jul  1 02:05:19 firewall-name rc.gateway_alarm[68226]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:547.071ms RTTsd:1647.180ms Loss:0%)
Jul  1 02:05:19 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:05:19 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:05:19 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:05:19 firewall-name check_reload_status: Reloading filter
*Jul  1 02:05:20 firewall-name php-fpm: /rc.openvpn: MONITOR: WANGW is down, omitting from routing group Primary 198.51.100.109|198.51.100.110|WANGW|547.077ms|1647.178ms|0.0%|down*
Jul  1 02:05:20 firewall-name php-fpm: /rc.openvpn: Gateway, switch to: OPT1_DHCP
Jul  1 02:05:20 firewall-name php-fpm: /rc.openvpn: Default gateway setting Interface OPT1_DHCP Gateway as default.
Jul  1 02:05:20 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:05:20 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:06:24 firewall-name rc.gateway_alarm[31317]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:11.531ms RTTsd:55.680ms Loss:21%)
Jul  1 02:06:24 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:06:24 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:06:24 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:06:24 firewall-name check_reload_status: Reloading filter
Jul  1 02:06:25 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:06:25 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:07:44 firewall-name rc.gateway_alarm[97435]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:3116.322ms RTTsd:1833.237ms Loss:90%)
Jul  1 02:07:44 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:07:44 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:07:44 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:07:44 firewall-name check_reload_status: Reloading filter
Jul  1 02:07:45 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:07:45 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:08:38 firewall-name rc.gateway_alarm[63258]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:821.647ms RTTsd:1789.602ms Loss:5%)
Jul  1 02:08:38 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:08:38 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:08:38 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:08:38 firewall-name check_reload_status: Reloading filter
Jul  1 02:08:39 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:08:39 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:08:54 firewall-name rc.gateway_alarm[94197]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:160.206ms RTTsd:666.241ms Loss:0%)
Jul  1 02:08:54 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:08:54 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:08:54 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:08:54 firewall-name check_reload_status: Reloading filter
*Jul  1 02:08:55 firewall-name php-fpm: /rc.openvpn: 352MONITOR: WANGW is available now, adding to routing group Primary 198.51.100.109|198.51.100.110|WANGW|94.413ms|446.542ms|0.0%|none*
Jul  1 02:08:56 firewall-name php-fpm: /rc.openvpn: Gateway, switch to: WANGW
Jul  1 02:08:56 firewall-name php-fpm: /rc.openvpn: Default gateway setting WAN Gateway as default.
Jul  1 02:08:56 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:08:56 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:12:18 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 501264us stddev 1540741us loss 0%
Jul  1 02:12:37 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 499349us stddev 1534363us loss 0%

Jul  1 02:12:18 firewall-name rc.gateway_alarm[67803]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:501.264ms RTTsd:1540.741ms Loss:0%)
Jul  1 02:12:18 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:12:18 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:12:18 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:12:18 firewall-name check_reload_status: Reloading filter
*Jul  1 02:12:19 firewall-name php-fpm[351]: /rc.openvpn: MONITOR: WANGW has high latency, omitting from routing group Primary 198.51.100.109|198.51.100.110|WANGW|497.108ms|1534.978ms|0.0%|delay*
Jul  1 02:12:19 firewall-name php-fpm[351]: /rc.openvpn: Gateway, switch to: OPT1_DHCP
Jul  1 02:12:19 firewall-name php-fpm[351]: /rc.openvpn: Default gateway setting Interface OPT1_DHCP Gateway as default.
Jul  1 02:12:19 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:12:19 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:12:37 firewall-name rc.gateway_alarm[99477]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:499.349ms RTTsd:1534.363ms Loss:0%)
Jul  1 02:12:37 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:12:37 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:12:37 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:12:37 firewall-name check_reload_status: Reloading filter
Jul  1 02:12:38 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:12:38 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:12:42 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 502376us stddev 1533719us loss 0%
Jul  1 02:12:54 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 8749us stddev 39389us loss 10%

Jul  1 02:12:42 firewall-name rc.gateway_alarm[32265]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:502.376ms RTTsd:1533.719ms Loss:0%)
Jul  1 02:12:42 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:12:42 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:12:42 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:12:42 firewall-name check_reload_status: Reloading filter
Jul  1 02:12:44 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:12:44 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:12:54 firewall-name rc.gateway_alarm[64010]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:8.749ms RTTsd:39.389ms Loss:10%)
Jul  1 02:12:54 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:12:54 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:12:54 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:12:54 firewall-name check_reload_status: Reloading filter
Jul  1 02:12:55 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:12:55 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:13:00 firewall-name rc.gateway_alarm[94382]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:9.583ms RTTsd:41.807ms Loss:21%)

Jul  1 02:13:00 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 9583us stddev 41807us loss 21%
Jul  1 02:14:36 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 2507259us stddev 1635716us loss 90%
Jul  1 02:15:30 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 441793us stddev 966826us loss 5%
Jul  1 02:15:34 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 380640us stddev 848533us loss 0%

Jul  1 02:13:00 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:13:00 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:13:00 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:13:00 firewall-name check_reload_status: Reloading filter
Jul  1 02:13:01 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:13:01 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:14:36 firewall-name rc.gateway_alarm[93227]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:2507.259ms RTTsd:1635.716ms Loss:90%)
Jul  1 02:14:36 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:14:36 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:14:36 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:14:36 firewall-name check_reload_status: Reloading filter
Jul  1 02:14:37 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:14:37 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:15:30 firewall-name rc.gateway_alarm[61727]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:441.793ms RTTsd:966.826ms Loss:5%)
Jul  1 02:15:30 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:15:30 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:15:30 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:15:30 firewall-name check_reload_status: Reloading filter
Jul  1 02:15:31 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:15:31 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:15:34 firewall-name rc.gateway_alarm[92708]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:380.640ms RTTsd:848.533ms Loss:0%)
Jul  1 02:15:34 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:15:34 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:15:34 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:15:34 firewall-name check_reload_status: Reloading filter
Jul  1 02:15:35 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:15:35 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:16:56 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 729582us stddev 1662029us loss 0%
Jul  1 02:18:05 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 448842us stddev 1388060us loss 6%

Jul  1 02:16:56 firewall-name rc.gateway_alarm[55853]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:729.582ms RTTsd:1662.029ms Loss:0%)
Jul  1 02:16:56 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:16:56 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:16:56 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:16:56 firewall-name check_reload_status: Reloading filter
Jul  1 02:16:57 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:16:57 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:18:05 firewall-name rc.gateway_alarm[23620]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:448.842ms RTTsd:1388.060ms Loss:6%)
Jul  1 02:18:05 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:18:05 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:18:05 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:18:05 firewall-name check_reload_status: Reloading filter

Jul  1 02:18:06 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 515366us stddev 1517537us loss 7%
Jul  1 02:18:22 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 391842us stddev 1286779us loss 0%

Jul  1 02:18:06 firewall-name rc.gateway_alarm[26820]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:515.366ms RTTsd:1517.537ms Loss:7%)
Jul  1 02:18:06 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:18:06 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:18:06 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:18:06 firewall-name check_reload_status: Reloading filter
Jul  1 02:18:06 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:18:06 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:18:08 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:18:08 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:18:22 firewall-name rc.gateway_alarm[83385]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:391.842ms RTTsd:1286.779ms Loss:0%)
Jul  1 02:18:22 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:18:22 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:18:22 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:18:22 firewall-name check_reload_status: Reloading filter
Jul  1 02:18:23 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:18:23 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:19:02 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 833425us stddev 1822007us loss 0%
Jul  1 02:19:17 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 445433us stddev 1416569us loss 0%

Jul  1 02:19:02 firewall-name rc.gateway_alarm[47953]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:833.425ms RTTsd:1822.007ms Loss:0%)
Jul  1 02:19:02 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:19:02 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:19:02 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:19:02 firewall-name check_reload_status: Reloading filter
Jul  1 02:19:03 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:19:03 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:19:17 firewall-name rc.gateway_alarm[79365]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:445.433ms RTTsd:1416.569ms Loss:0%)
Jul  1 02:19:17 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:19:17 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:19:17 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:19:17 firewall-name check_reload_status: Reloading filter
Jul  1 02:19:18 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:19:18 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 02:21:29 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 13472us stddev 62844us loss 22%
Jul  1 02:22:32 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 2175721us stddev 1371078us loss 92%
Jul  1 02:23:27 firewall-name dpinger: WANGW 198.51.100.109: Alarm latency 528310us stddev 908015us loss 6%
Jul  1 02:23:41 firewall-name dpinger: WANGW 198.51.100.109: Clear latency 288361us stddev 628682us loss 0%

Jul  1 02:21:29 firewall-name rc.gateway_alarm[76265]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:13.472ms RTTsd:62.844ms Loss:22%)
Jul  1 02:21:29 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:21:29 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:21:29 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:21:29 firewall-name check_reload_status: Reloading filter
Jul  1 02:21:30 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:21:30 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:22:32 firewall-name rc.gateway_alarm[44123]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:2175.721ms RTTsd:1371.078ms Loss:92%)
Jul  1 02:22:32 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:22:32 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:22:32 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:22:32 firewall-name check_reload_status: Reloading filter
Jul  1 02:22:33 firewall-name php-fpm[351]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:22:33 firewall-name php-fpm[351]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:23:27 firewall-name rc.gateway_alarm[13090]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:1 RTT:528.310ms RTTsd:908.015ms Loss:6%)
Jul  1 02:23:27 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:23:27 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:23:27 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:23:27 firewall-name check_reload_status: Reloading filter
Jul  1 02:23:29 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:23:29 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
Jul  1 02:23:41 firewall-name rc.gateway_alarm[63462]: >>> Gateway alarm: WANGW (Addr:198.51.100.109 Alarm:0 RTT:288.361ms RTTsd:628.682ms Loss:0%)
Jul  1 02:23:41 firewall-name check_reload_status: updating dyndns WANGW
Jul  1 02:23:41 firewall-name check_reload_status: Restarting ipsec tunnels
Jul  1 02:23:41 firewall-name check_reload_status: Restarting OpenVPN tunnels/interfaces
Jul  1 02:23:41 firewall-name check_reload_status: Reloading filter
Jul  1 02:23:42 firewall-name php-fpm: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Jul  1 02:23:42 firewall-name php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.

Jul  1 12:47:39 firewall-name dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 198.51.100.109  bind_addr 198.51.100.110  identifier "WANGW " 
Jul  1 12:47:39 firewall-name dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 203.0.113.1  bind_addr 203.0.113.200  identifier "OPT1_DHCP " 

Jul  1 12:46:59 firewall-name php-fpm: /index.php: Successful login for user 'admin' from: 10.1.1.2 (Local Database)
*Jul  1 12:47:05 firewall-name php-fpm[351]: /system_gateways.php: 351MONITOR: WANGW is available now, adding to routing group Primary 198.51.100.109|198.51.100.110|WANGW|19.264ms|86.126ms|0.0%|none*
Jul  1 12:47:30 firewall-name check_reload_status: Syncing firewall
Actions #3

Updated by Jim Pingle over 4 years ago

  • Status changed from New to Pull Request Review
Actions #4

Updated by Jörn Greszki about 4 years ago

Dear gents

is the behavior I describe

https://forum.netgate.com/topic/156890/dpinger-broken-or-dashboard-broken-or-my-brain-is-broken

related to your findings?

Actions #5

Updated by Renato Botelho almost 4 years ago

  • Status changed from Pull Request Review to Feedback
  • Assignee set to Renato Botelho

PR has been merged. Thanks!

Actions #6

Updated by Vladimir Voskoboynikov almost 4 years ago

  • % Done changed from 0 to 100
Actions #7

Updated by Renato Botelho almost 4 years ago

  • Status changed from Feedback to In Progress

Some problems reported at https://forum.netgate.com/topic/157633/wan-gw-monitor/2 after this was merged. I'll work on a fix

Actions #8

Updated by Renato Botelho almost 4 years ago

  • Status changed from In Progress to Feedback
Actions #9

Updated by Renato Botelho almost 4 years ago

  • Status changed from Feedback to Resolved

As mentioned on forums, it works

Actions #10

Updated by Anonymous almost 4 years ago

It now appears that gateways that are forced down in the gateway configs do not appear to be offlined.

Previously, such a gateway would show status as "Forced" or "Administrative" Down in the gateway status.

If gateway is configured with both "Disable Monitoring" & "Force State: Mark Gateway as Down" checkboxes selected, it will show as "Online (unmonitored)" on Gateway status page. If only "Force State" is checked, will show as "Pending" in gateway status. In both cases the gateway groups status will show this gateway as "Gathering data".

Actions #11

Updated by Renato Botelho almost 4 years ago

  • Status changed from Resolved to In Progress
Actions #12

Updated by Renato Botelho almost 4 years ago

  • Status changed from In Progress to Feedback

I've reverted the cache changes and it should be fine now

Actions #13

Updated by Jörn Greszki almost 4 years ago

I get for unknown reasons sometimes partial or full loss, but this is not the issue.

Nov 2 10:37:56 dpinger 16236 WAN_PHY1_IGB0GW 8.8.4.4: Alarm latency 0us stddev 0us loss 100%

Problem is that this status remains until any change to the gateway group is made - then it works immediately.

Tested with 2.5.0.a.20201101.1850

Full story: https://forum.netgate.com/topic/156890/dpinger-broken-or-dashboard-broken-or-my-brain-is-broken

Actions #14

Updated by Dee D almost 4 years ago

I’m not getting desired behavior but I don’t know if it’s caused by changes to this

output of /tmp/rules.debug after reset

GWCABLE_DHCP = " route-to ( hn0 24.52.xxx.97 ) "
GWDSL_PPPOE = " route-to ( pppoe0 206.248.xxx.132 ) "
GWCable_failover = " route-to { ( hn0 24.52.xxx.97 ) ( hn0 24.52.xxx.97 ) } "
GWDSL_Failover = " route-to { ( pppoe0 206.248.xxx.132 ) ( pppoe0 206.248.xxx.132 ) } "
GWLoad_Balance = " "

Cable_Failover is CABLE_DHCP tier 1, DSL_PPOE tier 2
DSL_Failover is DSL_PPPOE tier 1, CABLE_DHCP tier 2
Load_balance is both tier 1

Is the output above correct? It didn’t used to look like that (but then it didn’t work then either)

DSL gets taken off for packet loss:

8:40:30 MONITOR: DSL_PPPOE has packet loss, omitting from routing group Cable_failover
8.8.4.4|206.248.xxx.113|DSL_PPPOE|16.47ms|0.303ms|22%|down|highloss
8:48:12 MONITOR: DSL_PPPOE is available now, adding to routing group Cable_failover
8.8.4.4|206.248.xxx.113|DSL_PPPOE|16.301ms|0.061ms|0.0%|online|none

output from /tmp/rules/debug following the event:

GWCABLE_DHCP = " route-to ( hn0 24.52.xxx.97) "
GWDSL_PPPOE = " "
GWCable_failover = " route-to { ( hn0 24.52.xxx.97 ) ( hn0 24.52.xxx.97 ) } "
GWDSL_Failover = " route-to { ( hn0 24.52.xxx.97 ) ( hn0 24.52.xxx.97 ) } "
GWLoad_Balance = " "

DSL doesn’t work anymore until manual intervention

Actions #15

Updated by Max Leighton over 3 years ago

  • Status changed from Feedback to Resolved

I can not reproduce this in 2.5 under the same conditions that cause it in 2.4.5p1. Dee D's response sounds like the behavior we see in 2.4.5p1 when this is triggered, but it's not happening in the latest build. I will mark this resolved because I can consistently reproduce this in 2.4.5p1 but no longer in 2.5

Actions #16

Updated by Dee D over 3 years ago

Yet every time my dsl goes offline e.g. for packet loss and comes back according to the notification, when I cat /tmp/rules.debug the wrong (secondary) gateway is still in the failover group. There were a lot of potential explanations for what was happening during 2.5 dev but none of the resolved bugs fixed it?

Actions

Also available in: Atom PDF