Bug #3815

Gateway monitoring broken

Added by Tobias Wolter almost 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Gateway monitoring
Target version:
Start date: 08/19/2014
Due date:
% Done: 0%
Estimated time:
Affected Version: All
Affected Architecture:
Description

Cheers,

Gateway monitoring seems utterly broken ATM. We get barrages of log messages along these lines:

Aug 19 15:12:51     php: rc.newipsecdns: Gateways status could not be determined, considering all as up/active. (Group: WANGW_FAILOVER)
Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:51     php: rc.newipsecdns: Forcefully reloading IPsec racoon daemon
Aug 19 15:12:46     php: rc.newipsecdns: Gateways status could not be determined, considering all as up/active. (Group: WANGW_FAILOVER)
Aug 19 15:12:46     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:46     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!
Aug 19 15:12:46     php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
Aug 19 15:12:33     php: rc.filter_configure_sync: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:33     php: rc.filter_configure_sync: Default gateway down setting WANGW2 as default!
Aug 19 15:12:31     php: rc.dyndns.update: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER
Aug 19 15:12:31     php: rc.dyndns.update: Default gateway down setting WANGW2 as default!
Aug 19 15:12:31     php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW1.

Yes, WANGW1 is actually down as a testing measure. But that should not mean that pfSense/apinger is allowed to start acting crazy and reassign the default gateway every other millisecond.

Because we're trying to test a failover solution for implementation, we have the following options set:

  • Reinitiate IPsec on gateway state change
  • Allow changing of default gateway

The OpenVPN connection is pretty relaxed concerning this issue - probably because it's not directly running on the failover group, but rather using NAT via the LAN interface - but the IPsec connection (which goes the direct route) is utterly unusable, as it restarts every other minute at best.

Which rc.* scripts complain about this changes by the minute, but it mostly seems to be something from the IPsec subsystem.

Especially the "considering all as up/active" bit is wildly irritating, as it just seems to give up after five attempts of thinking WANGW1 is down and then sets it as up again.
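To put a number on the flapping, a rough sketch like the following could tally how often each rc.* script reports a default gateway switch per logged second. It assumes the log line shape seen in the excerpt above; the names `LINE_RE`, `tally_gateway_switches` and `SAMPLE` are made up for illustration and are not part of pfSense.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   "Aug 19 15:12:51     php: rc.newipsecdns: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+php:\s+"
    r"(?P<script>rc\.[\w.]+):\s+(?P<msg>.*)$"
)


def tally_gateway_switches(lines):
    """Count 'Default gateway down' messages per (timestamp, script)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("msg").startswith("Default gateway down"):
            counts[(match.group("ts"), match.group("script"))] += 1
    return counts


# A few lines copied from the log excerpt in this report.
SAMPLE = [
    "Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER",
    "Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!",
    "Aug 19 15:12:51     php: rc.newipsecdns: MONITOR: WANGW1 is down, removing from routing group WANGW_FAILOVER",
    "Aug 19 15:12:51     php: rc.newipsecdns: Default gateway down setting WANGW2 as default!",
    "Aug 19 15:12:33     php: rc.filter_configure_sync: Default gateway down setting WANGW2 as default!",
]

if __name__ == "__main__":
    for (ts, script), n in sorted(tally_gateway_switches(SAMPLE).items()):
        print(f"{ts}  {script}: {n} default gateway switch(es)")
```

Run against the full system log, more than one switch per script within the same second would point at the repeated re-evaluation described above rather than at a single failover event.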

History

#1 Updated by Tobias Wolter almost 5 years ago

+ Affected version is head.

#2 Updated by Tobias Wolter almost 5 years ago

That is, 2.1.4-release, to be exact.

#3 Updated by Chris Buechler about 4 years ago

  • Status changed from New to Feedback

It's definitely not as simple as gateway monitoring being broken, as it works fine in general. Might be some edge case here.

Tobias: have you been able to re-test on 2.2.2 release?

#4 Updated by Tobias Wolter about 4 years ago

Customer's still rather keen on 2.1, but I can possibly set up a similar environment soon and see whether it still behaves the same way with 2.2.

It's biting us again with a new setup we're currently doing, rendering multi-WAN pretty much useless. I could throw some redacted config files your way, with a bit of history on what we already tried.

#5 Updated by Luke Hamburg about 4 years ago

That might all be for naught - I saw over at #4081 that in 2.3 apinger is being forklifted out.

#6 Updated by Chris Buechler about 4 years ago

Tobias: if you have a 2.2.2 (or newer) config that'll replicate, I'd definitely like to check it out. Email to cmb at pfsense dot org with a link here/reference this ticket #.

#7 Updated by Chris Buechler over 3 years ago

  • Category changed from Gateways to Gateway monitoring
  • Status changed from Feedback to Resolved
  • Target version set to 2.3
  • Affected Version set to All

resolved in 2.3 by replacing apinger
