Bug #6311

closed

pfSense 2.3 locking up

Added by Markus Strangl about 8 years ago. Updated almost 8 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: Unknown
Target version: -
Start date: 05/04/2016
% Done: 0%

Description

Hi pfSense team,
Since upgrading from 2.2.6 to 2.3, we have had a series of weird lock-ups on our pfSense clusters.
For no apparent reason, all services suddenly seem to die, i.e. no traffic is routed anymore, the WebGUI isn't reachable, and in all but one case even SSH didn't reply anymore.
The machine doesn't panic or reboot, though, so we had to pull the plug to reset it. As a consequence, there are no crash reports available.
The system seems to be "just alive enough" that no watchdog timer is triggering, and CARP does not initiate a failover.
The only thing shown in the system logs at the time of the lockup is this:
system.log:
Apr 28 22:37:56 kws-fw01 check_reload_status: updating dyndns GW_WAN
Apr 28 22:37:56 kws-fw01 check_reload_status: Restarting ipsec tunnels
Apr 28 22:37:56 kws-fw01 check_reload_status: Restarting OpenVPN tunnels/interfaces
Apr 28 22:37:56 kws-fw01 check_reload_status: Reloading filter
Apr 28 22:37:58 kws-fw01 php-fpm: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use GW_WAN.
Apr 28 22:37:58 kws-fw01 xinetd[26075]: Starting reconfiguration
Apr 28 22:37:58 kws-fw01 xinetd[26075]: Swapping defaults
Apr 28 22:37:58 kws-fw01 xinetd[26075]: readjusting service 6969-udp
Apr 28 22:37:58 kws-fw01 xinetd[26075]: Reconfigured: new=0 old=1 dropped=0 (services)
- this is the time when our monitoring systems start reporting the infrastructure behind the pfSense going dead -
Apr 28 22:48:13 kws-fw01 sshd[7986]: Accepted keyboard-interactive/pam for admin from 80.120.61.1 port 29853 ssh2
Apr 28 22:49:21 kws-fw01 reboot: rebooted by admin

There are no further entries in any of the log files between the dropout time and the reboot, so I'm not sure whether this is due to no traffic being handled anymore or to the syslog daemon locking up as well.
The machines worked fine with 2.2.6 and have been thoroughly stress tested, so I'm pretty sure no hardware issue is involved.
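
The silent window between the last entry before the dropout and the first entry after it can be measured with a short script. This is only a minimal sketch in Python, assuming classic BSD-style syslog lines that start with "Mon DD HH:MM:SS" and carry no year; the helper names, the 5-minute threshold, and the default year are illustrative choices, not anything pfSense ships. The sample lines are copied from the system.log excerpt above.

```python
from datetime import datetime, timedelta

# Assumption: lines begin with a 15-character "Mon DD HH:MM:SS" timestamp.
TIMESTAMP_LEN = 15


def parse_timestamp(line, year=2016):
    """Return the timestamp at the start of a syslog line, or None if absent."""
    try:
        # Syslog omits the year, so one is supplied by the caller.
        return datetime.strptime(f"{year} {line[:TIMESTAMP_LEN]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (previous, current, gap) for consecutive entries further apart than threshold."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # skip annotations and blank lines
        if previous is not None and current - previous > threshold:
            yield previous, current, current - previous
        previous = current


# Sample taken from the excerpt in this report.
sample = [
    "Apr 28 22:37:58 kws-fw01 xinetd[26075]: Reconfigured: new=0 old=1 dropped=0 (services)",
    "- this is the time when our monitoring systems start reporting the infrastructure going dead -",
    "Apr 28 22:48:13 kws-fw01 sshd[7986]: Accepted keyboard-interactive/pam for admin from 80.120.61.1 port 29853 ssh2",
]

for start, end, gap in find_gaps(sample):
    print(f"no log entries between {start:%H:%M:%S} and {end:%H:%M:%S} ({gap})")
```

Running the same function over every log file on the box would show whether all logs go quiet at the same moment, which would point at the syslog daemon (or the whole system) stalling rather than just a lack of traffic.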

System info:
SuperMicro Intel Westmere rack boxes, two in each HA cluster with CARP and pfsync, Intel X540 10G network cards (ix driver)
