Project

General

Profile

Bug #8914

Gateway switch events cause a huge amount of log spew

Added by Luke Hamburg 4 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Category:
Logging
Target version:
Start date:
09/18/2018
Due date:
% Done:

0%

Estimated time:
Affected Version:
2.4.4
Affected Architecture:

Description

When a gateway failure/defgw switch event occurs, there seem to be some code paths that get iterated over dozens of times in a very short span of time. This floods system.log making it hard to read. Specifically, these 2 phrases are repeated over and over:

Gateways status could not be determined, considering all as up/active
Keep current gateway, its already part of the group members

This is part of an effort to try to make the logs readable again, while still allowing verbose logging if needed via a new hidden config option from the developer shell:

$config['system']['gw-debug'] = true;
write_config("Enable Developer Spew");
exec
exit

History

#1 Updated by Renato Botelho 4 months ago

  • Status changed from New to Feedback
  • Assignee set to Renato Botelho
  • Target version set to 2.4.4-GS

PR merged

#2 Updated by Paighton Bisconer 4 months ago

Replicated in 2.4.4-RELEASE

Steps to replicate:
Configure Gateway Group with 2 WANs
Start Tier 1 down, Tier 2 up
Connect Tier 1, wait for link to configure, disconnect Tier 1.

Oct 3 00:47:41    php-fpm    97158    /rc.openvpn: MONITOR: WAN_DHCP is down, omitting from routing group df 192.168.34.194|192.168.34.204|WAN_DHCP|0.224ms|0.035ms|25%|down
Oct 3 00:47:41    php-fpm    97158    /rc.openvpn: Gateways status could not be determined, considering all as up/active. (Group: df)
Oct 3 00:47:42    php-fpm    97158    /rc.openvpn: Gateway, switch to:
Oct 3 00:47:42    php-fpm    337    /rc.dyndns.update: Gateways status could not be determined, considering all as up/active. (Group: df)
Oct 3 00:47:42    php-fpm    61881    /rc.filter_configure_sync: Gateways status could not be determined, considering all as up/active. (Group: df)
Oct 3 00:47:42    php-fpm    337    /rc.dyndns.update: Gateway, switch to:
Oct 3 00:47:42    php-fpm    337    /rc.dyndns.update: Gateways status could not be determined, considering all as up/active. (Group: df)
Oct 3 00:47:42    php-fpm    61881    /rc.filter_configure_sync: Gateway, switch to:
Oct 3 00:47:42    php-fpm    61881    /rc.filter_configure_sync: Gateways status could not be determined, considering all as up/active. (Group: df)

Had different results each time, in each of 3 tests there were 2 entries, 5 entries, and 7 entries for that message.

Oct 3 00:47:04    php-fpm    73667    /rc.newwanip: rc.newwanip: Info: starting on igb0.
Oct 3 00:47:04    php-fpm    73667    /rc.newwanip: rc.newwanip: on (IP address: 192.168.34.204) (interface: WAN[wan]) (real interface: igb0).
Oct 3 00:47:04    php-fpm    73667    /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: df)
Oct 3 00:47:04    php-fpm    73667    /rc.newwanip: Keep current gateway, its already part of the group members.
Oct 3 00:47:04    php-fpm    73667    /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: df)
Oct 3 00:47:05    php-fpm    73667    /rc.newwanip: 73667MONITOR: WAN_DHCP is available now, adding to routing group df 192.168.34.194|192.168.34.204|WAN_DHCP|0.263ms|0.028ms|0.0%|none
Oct 3 00:47:05    php-fpm    73667    /rc.newwanip: Keep current gateway, its already part of the group members.
Oct 3 00:47:05    php-fpm    97158    /rc.linkup: Keep current gateway, its already part of the group members.

Also seeing keep current gateway

#3 Updated by Steve Beaver 4 months ago

  • Target version changed from 2.4.4-GS to 2.4.4_1

#4 Updated by Steve Beaver 4 months ago

  • Target version changed from 2.4.4_1 to 2.4.4-GS

#5 Updated by Danilo Zrenjanin 4 months ago

Tried to replicate the issue in 2.4.4-RELEASE.

Under logs, I was getting only "Keep current gateway, its already part of the group members" but only a couple of such logs. There wasn't any flood.

Oct 5 22:43:00     php-fpm     353     /rc.openvpn: Keep current gateway, its already part of the group members.
Oct 5 22:43:00     php-fpm     353     /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Oct 5 22:43:00     php-fpm     90203     /rc.dyndns.update: Keep current gateway, its already part of the group members.
Oct 5 22:43:00     php-fpm     30383     /rc.filter_configure_sync: Keep current gateway, its already part of the group members.
Oct 5 22:43:00     php-fpm     40961     /rc.newwanip: rc.newwanip: Info: starting on ix0.
Oct 5 22:43:00     php-fpm     40961     /rc.newwanip: rc.newwanip: on (IP address: 192.168.1.222) (interface: WAN2[opt1]) (real interface: ix0).

#6 Updated by Flole Systems 4 months ago

Could you please check if you are also experiencing latency spikes on LAN Interface during a reconnect (run a ping to pfsense LAN IP and reconnect? You might be having an issue I am trying to troubleshoot in the Forums for a few days.

#7 Updated by Luiz Souza 2 months ago

  • Target version changed from 2.4.4-GS to 2.4.4_1

#8 Updated by Chris Macmahon 2 months ago

Not sure what the acceptable level of log spam is:

Nov 22 09:56:56 check_reload_status Reloading filter
Nov 22 09:56:55 php-fpm 333 /system_gateways.php: Default gateway setting Interface WAN_DHCP Gateway as default.
Nov 22 09:56:53 php-fpm 31885 /system_gateways.php: 31885MONITOR: WAN_DHCP is available now, adding to routing group break_the_internet 192.168.91.1|192.168.91.195|WAN_DHCP|0.581ms|1.056ms|0.0%|none

Nov 22 09:56:52 check_reload_status Syncing firewall
Nov 22 09:56:04 check_reload_status Updating all dyndns
Nov 22 09:56:04 check_reload_status Reloading filter
Nov 22 09:56:03 php-fpm 333 /system_gateways.php: Default gateway setting Interface OPT1_DHCP Gateway as default.
Nov 22 09:56:02 php-fpm 31885 /system_gateways.php: MONITOR: WAN_DHCP is down, omitting from routing group break_the_internet 192.168.91.1|192.168.91.195|WAN_DHCP|0.409ms|0.59ms|0.0%|force_down

That is a Gateway down and up event cycle.

#9 Updated by Vladimir Lind 2 months ago

Tested on VM HA cluster - CE 2.4.5-DEVELOPMENT (amd64) built on Tue Nov 20 16:55:31 EST 2018:

No "Gateways status could not be determined" events in syslog on gateway failure's (disabled tier1 WAN on primary node then enabled it back) - ran icmp from host on lan behind HA cluster - no latency to LAN or 8.8.8.8 during gw up/down events. Just 1 lost icmp packet on HA failover to secondary as it should be.

Not seeing any issues.

#10 Updated by Renato Botelho 2 months ago

  • Status changed from Feedback to Resolved

Also available in: Atom PDF