Bug #3815
closed
Gateway monitoring broken
Description
Cheers,
Gateway monitoring seems utterly broken ATM. We get barrages of log messages along these lines:
Aug 19 15:12:51 php: rc.newipsecdns: Gateways status could not be determined, considering all as up/active. (Group: WANGW_FAILOVER)
Aug 19 15:12:51 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51 php: rc.newipsecdns: Forcefully reloading IPsec racoon daemon
Aug 19 15:12:46 php: rc.newipsecdns: Gateways status could not be determined, considering all as up/active. (Group: WANGW_FAILOVER)
Aug 19 15:12:46 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46 php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46 php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
Aug 19 15:12:33 php: rc.filter_configure_sync: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:33 php: rc.filter_configure_sync: Default gateway down setting WANGW2 as default!
Aug 19 15:12:31 php: rc.dyndns.update: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:31 php: rc.dyndns.update: Default gateway down setting WANGW2 as default!
Aug 19 15:12:31 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW1.
Yes, WANGW1 is actually down as a testing measure. But that should not mean that pfSense/apinger is allowed to start acting crazy and reassign the default gateway every other millisecond.
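To put a number on the barrage, the quoted log can be tallied per second and per rc. script. The following is only a rough sketch: the line layout in the regular expression is inferred from the lines quoted in this ticket, and the names LINE_RE and count_default_switches are made up for illustration and are not part of pfSense.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log quoted in this ticket:
#   "Aug 19 15:12:51 php: rc.newipsecdns: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) php: "
    r"(?P<script>rc\.[\w.]+): (?P<msg>.*)$"
)


def count_default_switches(lines):
    """Count 'Default gateway down ...' messages per (timestamp, script)."""
    switches = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("msg").startswith("Default gateway down"):
            switches[(match.group("ts"), match.group("script"))] += 1
    return switches


if __name__ == "__main__":
    # A few lines copied from the log above; feed in the full log to get
    # the real totals.
    sample = [
        "Aug 19 15:12:51 php: rc.newipsecdns: MONITOR: WANGW1 is down, "
        "removing from routing group WANGW_FAILOVER",
        "Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting "
        "WANGW2 as default!",
        "Aug 19 15:12:51 php: rc.newipsecdns: Default gateway down setting "
        "WANGW2 as default!",
        "Aug 19 15:12:33 php: rc.filter_configure_sync: Default gateway down "
        "setting WANGW2 as default!",
    ]
    for (ts, script), n in count_default_switches(sample).items():
        print(f"{ts} {script}: {n} default gateway switch(es)")
```

Several switches logged by the same script within one second, as with rc.newipsecdns here, is the pattern this report is about.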
Because we're trying to test a failover solution for implementation, we have the following options set:
- Reinitiate IPsec on gateway state change
- Allow changing of default gateway
The OpenVPN connection is pretty relaxed concerning this issue - probably because it's not directly running on the failover group, but rather using NAT via the LAN interface - but the IPsec connection (which goes the direct route) is utterly unusable, as it restarts every other minute at best.
The rc. scripts that complain about this change from minute to minute, but mostly it seems to be something from the IPsec subsystem.
Especially the "considering all as active" bit is wildly irritating, as it seems to just give up after five attempts at deciding WANGW1 is down and then sets it as up again.
Updated by Tobias Wolter about 10 years ago
That is, 2.1.4-release, to be exact.
Updated by Chris Buechler over 9 years ago
- Status changed from New to Feedback
It's definitely not as simple as gateway monitoring being broken, as it works fine in general. Might be some edge case here.
Tobias: have you been able to re-test on 2.2.2 release?
Updated by Tobias Wolter over 9 years ago
Customer's still rather keen on 2.1. I can possibly set up a similar setup soon and see whether it still behaves similarly with 2.2.
It's biting us again with a new setup we're currently doing, rendering multi-WAN pretty much useless. I could throw some redacted config files your way, with a bit of history on what we already tried.
Updated by → luckman212 over 9 years ago
That might all be for naught - I saw over at #4081 that in 2.3 apinger is being forklifted out.
Updated by Chris Buechler over 9 years ago
Tobias: if you have a 2.2.2 (or newer) config that'll replicate, I'd definitely like to check it out. Email to cmb at pfsense dot org with a link here/reference this ticket #.
Updated by Chris Buechler over 8 years ago
- Category changed from Gateways to Gateway Monitoring
- Status changed from Feedback to Resolved
- Target version set to 2.3
- Affected Version set to All
Resolved in 2.3 by replacing apinger.